[engine] Implement symlink handling

Review Request #3691 — Created April 14, 2016 and submitted

3121, 3190
gmalmquist, kwlzn, nhoward_tw, patricklaw, peiyu

In order to properly invalidate for watchman events, we need to record that we walked symlinks in the product graph. This review adds symlink handling to engine.exp.fs, and improves the type safety of filesystem operations by requiring that a PathGlobs object matches exactly one of Files, Dirs or Links.

  • Replace fs.Path with fs.Stat subclasses: fs.{File,Dir,Link}
  • Add explicit support to PathGlobs for matching directories. A PathGlobs object matches either Files or Dirs, but not both.
  • Resolve symlinks by recursively requesting Files/Dirs for a Path projected from a ReadLink
  • Replace RecursiveSubDirectories with use of a PathGlobs object recursively matching Dirs.
  • Add ProjectTree.{lstat, readlink, listdir}
  • Split test_fs.py from test_path_globs.py, and prepare to test ScmProjectTree (see #3189)
  • Include tests in test_fs.py to validate the actual filesystem events that occurred.


  1. Thanks Stu, review looks great, fs operations coming to be better and better.

  2. I'm glad my efforts to make argument to be relpath always was useful.

  3. src/python/pants/engine/exp/fs.py (Diff revision 1)

    What is a reason to introduce this (Files/Dirs/Links) types, do you need them usually separately or together?

    1. This was based on the observation that callers almost always want to consume either files, or directories. There are very few cases where you want to consume both (precisely reproducing a directory structure, is the primary one).

      Additionally, having collections like this is currently necessary to represent filtering operations (similar to a scala Option): converting a Stat to a File may fail, because the Stat may be a directory. This way, returning the collection type indicates that the conversion may return 0 or more File objects (ie, it's Stat->Files).

      I opened https://github.com/pantsbuild/pants/issues/3169 about improving the collections situation to avoid needing the boilerplate collections classes.

  4. src/python/pants/engine/exp/nodes.py (Diff revision 1)

    This makes me feel like FilesystemNode is unnesessary except invalidation part.

    1. The primary thing is that we want the ProjectTree to be private to Node implementations... only classes which understand invalidation should have access to it. So user code should never receive one.

      But I don't think this is the final API, by any means. For one thing, ScmProjectTree shouldn't ever need to invalidate for a particular git-sha, so having it invalidated with watchman is odd.

  5. src/python/pants/engine/exp/storage.py (Diff revision 1)

    256GB > 2^32, I think it's ok, but still.

    1. This is gibibytes (GB). =P

Review request changed

Status: Closed (submitted)

Change Summary:

Merged as f83e2b0272e2ff1c464e453c1f74d9335e18fcc3