[pantsd] Map filesystem events to ProductGraph invalidation.

Review Request #3629 — Created March 30, 2016 and submitted

Submitter: kwlzn
Repository: pants
Branch: kwlzn/pantsd/fs_event_invalidation
Bugs: 3109
Groups: pants-reviews
People: peiyu, stuhood
  • Add ProductGraph.invalidate_files() and supporting pantsd wiring/cleanup.
  • Add an incremental invalidation test case that exposes a bug in invalidation for more complex graphs, along with the subsequent repairs.
  • Fix a spurious (9, 'Bad file descriptor') traceback during PailgunService teardown.
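The (9, 'Bad file descriptor') error is errno.EBADF, which typically surfaces when a socket or file descriptor is used after it has already been closed, for example by another thread during shutdown. A minimal sketch of tolerating only that errno on a teardown path is below. The helper name call_ignoring_ebadf is hypothetical; this is not the actual PailgunService change, just one common shape for such a fix.

```python
import errno
import os


def call_ignoring_ebadf(fn, *args, **kwargs):
    """Call fn, swallowing only EBADF (errno 9); any other error propagates.

    Hypothetical helper for teardown paths where the underlying file
    descriptor may already have been closed elsewhere.
    """
    try:
        return fn(*args, **kwargs)
    except (OSError, IOError) as e:
        if e.errno == errno.EBADF:
            return None
        raise


if __name__ == "__main__":
    read_fd, write_fd = os.pipe()
    os.close(read_fd)
    # Reading from an already-closed fd raises EBADF, which is swallowed here.
    print(call_ignoring_ebadf(os.read, read_fd, 1))
    os.close(write_fd)
```

Narrowing the except clause to EBADF keeps genuine I/O failures visible instead of silencing every socket error during shutdown.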

https://travis-ci.org/pantsbuild/pants/builds/119452884

Plus a good amount of manual testing. A smaller example:

[illuminati pants (kwlzn/pantsd/fs_event_invalidation)]$ ./pants run src/python/pants/engine/exp/legacy:fsnodes -q -- 3rdparty:guava -ldebug --pantsd-fs-event-detection --enable-pantsd
executed 33 nodes in 78 scheduling iterations. there have been 117 total steps for 53 total nodes.
FilesystemNode(subject=Path(path=u'3rdparty'), product=<class 'pants.engine.exp.fs.DirectoryListing'>, variants=None)
FilesystemNode(subject=Path(path=u'3rdparty/BUILD'), product=<class 'pants.engine.exp.fs.FileContent'>, variants=None)
FilesystemNode(subject=SingleAddress(directory=u'3rdparty', name=u'guava'), product=<class 'pants.engine.exp.fs.DirectoryListing'>, variants=None)
DEBUG] acquiring lock: <OwnerPrintingPIDLockFile: u'/Users/kwilson/dev/pants/.pantsd.startup' -- u'/Users/kwilson/dev/pants/.pantsd.startup'>
DEBUG] launching pantsd
DEBUG] Selected watchman binary bootstrapped to: /Users/kwilson/.cache/pants/bin/watchman/mac/10.11/4.5.0/watchman
INFO] launching watchman
DEBUG] watchman cmd is: /Users/kwilson/.cache/pants/bin/watchman/mac/10.11/4.5.0/watchman get-pid --no-save-state --statefile=/Users/kwilson/dev/pants/.pants.d/watchman/watchman.state --sockname=/Users/kwilson/dev/pants/.pants.d/watchman/watchman.sock --logfile=/Users/kwilson/dev/pants/.pants.d/watchman/watchman.log --log-level 2
INFO] watchman is running, pid=2790 socket=/Users/kwilson/dev/pants/.pants.d/watchman/watchman.sock
DEBUG] purging metadata directory: /Users/kwilson/dev/pants/.pids/pantsd
DEBUG] released lock: <OwnerPrintingPIDLockFile: u'/Users/kwilson/dev/pants/.pantsd.startup' -- u'/Users/kwilson/dev/pants/.pantsd.startup'>
DEBUG] pantsd is running at pid 2801
Cache stats: hits=0, misses=111, total=111

[illuminati pants (kwlzn/pantsd/fs_event_invalidation)]$ tail -F .pants.d/pantsd/pantsd.log 
D0330 00:27:13.697853 2801 pants_daemon.py:120] logging initialized
I0330 00:27:13.698453 2801 pants_daemon.py:174] pantsd starting, log level is DEBUG
I0330 00:27:13.700072 2801 pants_daemon.py:135] setting up service <pants.pantsd.service.pailgun_service.PailgunService object at 0x116f76bd0>
I0330 00:27:13.700334 2801 pants_daemon.py:135] setting up service <pants.pantsd.service.fs_event_service.FSEventService object at 0x116f76d50>
I0330 00:27:13.700664 2801 pants_daemon.py:135] setting up service <pants.pantsd.service.scheduler_service.SchedulerService object at 0x116f76e90>
I0330 00:27:13.700999 2801 pants_daemon.py:148] starting service <pants.pantsd.service.scheduler_service.SchedulerService object at 0x116f76e90>
I0330 00:27:13.701394 2801 pants_daemon.py:148] starting service <pants.pantsd.service.pailgun_service.PailgunService object at 0x116f76bd0>
I0330 00:27:13.701663 2801 pailgun_service.py:52] starting pailgun server on port 63401
I0330 00:27:13.701925 2801 pants_daemon.py:148] starting service <pants.pantsd.service.fs_event_service.FSEventService object at 0x116f76d50>
D0330 00:27:14.026108 2801 watchman.py:141] watchman command_list is: [[u'subscribe', '/Users/kwilson/dev/pants', u'all_files', {'fields': [u'name'], 'expression': [u'allof', [u'anyof', [u'type', u'f'], [u'type', u'l']], [u'not', [u'dirname', u'dist', [u'depth', u'eq', 0]]], [u'not', [u'pcre', u'^\\..*', u'wholename']], [u'not', [u'match', u'*.pyc']]]}]]
I0330 00:27:14.035569 2801 watchman.py:147] confirmed watchman subscription: {'subscribe': 'all_files', 'version': '4.5.0', 'clock': 'c:1459322828:2790:1:4'}
I0330 00:27:14.037446 2801 scheduler_service.py:42] enqueuing 4983 changes for subscription all_files
D0330 00:27:14.076474 2801 scheduler_service.py:69] processing 4983 files for subscription all_files (first_event=True)
D0330 00:27:15.040733 2801 fs_event_service.py:134] callback ID 1 for all_files succeeded

(in work window)

[illuminati pants (kwlzn/pantsd/fs_event_invalidation)]$ touch 3rdparty/BUILD 
[illuminati pants (kwlzn/pantsd/fs_event_invalidation)]$

(in tail window)

I0330 00:28:33.734401 2801 scheduler_service.py:42] enqueuing 1 changes for subscription all_files
D0330 00:28:33.735138 2801 fs_event_service.py:134] callback ID 2 for all_files succeeded
D0330 00:28:33.760174 2801 scheduler_service.py:69] processing 1 files for subscription all_files (first_event=False)
D0330 00:28:33.760674 2801 scheduler_service.py:46] handling change event for: ['3rdparty/BUILD']
D0330 00:28:33.761754 2801 scheduler.py:225] generated invalidation subjects: set([DescendantAddresses(directory=u'3rdparty'), Path(path=u'3rdparty/BUILD'), Path(path=u'3rdparty')])
D0330 00:28:33.763807 2801 scheduler.py:207] invalidating node: FilesystemNode(subject=Path(path=u'3rdparty'), product=<class 'pants.engine.exp.fs.DirectoryListing'>, variants=None)
D0330 00:28:33.764301 2801 scheduler.py:207] invalidating node: SelectNode(subject=Path(path=u'3rdparty'), product=<class 'pants.engine.exp.fs.DirectoryListing'>, variants=None, variant_key=None)
D0330 00:28:33.764655 2801 scheduler.py:207] invalidating node: TaskNode(subject=Path(path=u'3rdparty'), product=<class 'pants.engine.exp.graph.BuildFilePaths'>, variants=None, func=filter_buildfile_paths, clause=(Literal(subject=AddressMapper(parser=<class 'pants.engine.exp.legacy.parser.LegacyPythonCallbacksParser'>, symbol_table=<class 'pants.engine.exp.legacy.commands.LegacyTable'>, build_pattern=^BUILD(\.[a-zA-Z0-9_-]+)?$), product=<class 'pants.engine.exp.mapper.AddressMapper'>), Subject(product=<class 'pants.engine.exp.fs.DirectoryListing'>, optional=False))
D0330 00:28:33.765217 2801 scheduler.py:207] invalidating node: SelectNode(subject=Path(path=u'3rdparty'), product=<class 'pants.engine.exp.graph.BuildFilePaths'>, variants=None, variant_key=None)
D0330 00:28:33.765485 2801 scheduler.py:207] invalidating node: ProjectionNode(subject=Path(path=u'3rdparty'), product=<class 'pants.engine.exp.fs.FilesContent'>, variants=None, projected_subject=<class 'pants.engine.exp.fs.Paths'>, fields=(u'paths',), input_product=<class 'pants.engine.exp.graph.BuildFilePaths'>)
D0330 00:28:33.765773 2801 scheduler.py:207] invalidating node: TaskNode(subject=Path(path=u'3rdparty'), product=<class 'pants.engine.exp.mapper.AddressFamily'>, variants=None, func=parse_address_family, clause=(Literal(subject=AddressMapper(parser=<class 'pants.engine.exp.legacy.parser.LegacyPythonCallbacksParser'>, symbol_table=<class 'pants.engine.exp.legacy.commands.LegacyTable'>, build_pattern=^BUILD(\.[a-zA-Z0-9_-]+)?$), product=<class 'pants.engine.exp.mapper.AddressMapper'>), Subject(product=<class 'pants.engine.exp.fs.Path'>, optional=False), Projection(product=<class 'pants.engine.exp.fs.FilesContent'>, projected_subject=<class 'pants.engine.exp.fs.Paths'>, fields=(u'paths',), input_product=<class 'pants.engine.exp.graph.BuildFilePaths'>))
D0330 00:28:33.766099 2801 scheduler.py:207] invalidating node: SelectNode(subject=Path(path=u'3rdparty'), product=<class 'pants.engine.exp.mapper.AddressFamily'>, variants=None, variant_key=None)
D0330 00:28:33.766308 2801 scheduler.py:207] invalidating node: ProjectionNode(subject=3rdparty:guava, product=<class 'pants.engine.exp.mapper.AddressFamily'>, variants=None, projected_subject=<class 'pants.engine.exp.fs.Path'>, fields=(u'spec_path',), input_product=<class 'pants.build_graph.address.Address'>)
D0330 00:28:33.766575 2801 scheduler.py:207] invalidating node: TaskNode(subject=3rdparty:guava, product=<class 'pants.engine.exp.graph.UnhydratedStruct'>, variants=None, func=resolve_unhydrated_struct, clause=(Projection(product=<class 'pants.engine.exp.mapper.AddressFamily'>, projected_subject=<class 'pants.engine.exp.fs.Path'>, fields=(u'spec_path',), input_product=<class 'pants.build_graph.address.Address'>), Subject(product=<class 'pants.build_graph.address.Address'>, optional=False))
D0330 00:28:33.766828 2801 scheduler.py:207] invalidating node: SelectNode(subject=3rdparty:guava, product=<class 'pants.engine.exp.graph.UnhydratedStruct'>, variants=None, variant_key=None)
D0330 00:28:33.767044 2801 scheduler.py:207] invalidating node: DependenciesNode(subject=3rdparty:guava, product=<class 'pants.engine.exp.struct.Struct'>, variants=None, dep_product=<class 'pants.engine.exp.graph.UnhydratedStruct'>, field=None)
D0330 00:28:33.767314 2801 scheduler.py:207] invalidating node: TaskNode(subject=3rdparty:guava, product=<class 'pants.engine.exp.struct.Struct'>, variants=None, func=hydrate_struct, clause=(Subject(product=<class 'pants.engine.exp.graph.UnhydratedStruct'>, optional=False), Dependencies(product=<class 'pants.engine.exp.struct.Struct'>, deps_product=<class 'pants.engine.exp.graph.UnhydratedStruct'>, field=None))
D0330 00:28:33.767618 2801 scheduler.py:207] invalidating node: SelectNode(subject=3rdparty:guava, product=<class 'pants.engine.exp.struct.Struct'>, variants=None, variant_key=None)
D0330 00:28:33.767805 2801 scheduler.py:207] invalidating node: TaskNode(subject=3rdparty:guava, product=<class 'pants.engine.exp.legacy.parser.TargetAdaptor'>, variants=None, func=identity, clause=(Subject(product=<class 'pants.engine.exp.struct.Struct'>, optional=False),)
D0330 00:28:33.768012 2801 scheduler.py:207] invalidating node: SelectNode(subject=3rdparty:guava, product=<class 'pants.engine.exp.legacy.parser.TargetAdaptor'>, variants=None, variant_key=None)
D0330 00:28:33.768198 2801 scheduler.py:207] invalidating node: DependenciesNode(subject=3rdparty:guava, product=<class 'pants.engine.exp.legacy.graph.LegacyBuildGraphNode'>, variants=None, dep_product=<class 'pants.engine.exp.legacy.parser.TargetAdaptor'>, field=None)
D0330 00:28:33.768424 2801 scheduler.py:207] invalidating node: TaskNode(subject=3rdparty:guava, product=<class 'pants.engine.exp.legacy.graph.LegacyBuildGraphNode'>, variants=None, func=reify_legacy_graph, clause=(Subject(product=<class 'pants.engine.exp.legacy.parser.TargetAdaptor'>, optional=False), Dependencies(product=<class 'pants.engine.exp.legacy.graph.LegacyBuildGraphNode'>, deps_product=<class 'pants.engine.exp.legacy.parser.TargetAdaptor'>, field=None))
D0330 00:28:33.768640 2801 scheduler.py:207] invalidating node: SelectNode(subject=3rdparty:guava, product=<class 'pants.engine.exp.legacy.graph.LegacyBuildGraphNode'>, variants=None, variant_key=None)
D0330 00:28:33.768821 2801 scheduler.py:207] invalidating node: DependenciesNode(subject=SingleAddress(directory=u'3rdparty', name=u'guava'), product=<class 'pants.engine.exp.legacy.graph.LegacyBuildGraphNode'>, variants=None, dep_product=<class 'pants.engine.exp.addressable.Addresses'>, field=None)
D0330 00:28:33.769066 2801 scheduler.py:207] invalidating node: ProjectionNode(subject=SingleAddress(directory=u'3rdparty', name=u'guava'), product=<class 'pants.engine.exp.mapper.AddressFamily'>, variants=None, projected_subject=<class 'pants.engine.exp.fs.Path'>, fields=(u'directory',), input_product=<class 'pants.base.specs.SingleAddress'>)
D0330 00:28:33.769264 2801 scheduler.py:207] invalidating node: TaskNode(subject=SingleAddress(directory=u'3rdparty', name=u'guava'), product=<class 'pants.engine.exp.addressable.Addresses'>, variants=None, func=address_from_address_family, clause=(Projection(product=<class 'pants.engine.exp.mapper.AddressFamily'>, projected_subject=<class 'pants.engine.exp.fs.Path'>, fields=(u'directory',), input_product=<class 'pants.base.specs.SingleAddress'>), Subject(product=<class 'pants.base.specs.SingleAddress'>, optional=False))
D0330 00:28:33.769476 2801 scheduler.py:207] invalidating node: SelectNode(subject=SingleAddress(directory=u'3rdparty', name=u'guava'), product=<class 'pants.engine.exp.addressable.Addresses'>, variants=None, variant_key=None)
D0330 00:28:33.769666 2801 scheduler.py:207] invalidating node: FilesystemNode(subject=Path(path=u'3rdparty/BUILD'), product=<class 'pants.engine.exp.fs.FileContent'>, variants=None)
D0330 00:28:33.769844 2801 scheduler.py:207] invalidating node: SelectNode(subject=Path(path=u'3rdparty/BUILD'), product=<class 'pants.engine.exp.fs.FileContent'>, variants=None, variant_key=None)
D0330 00:28:33.770029 2801 scheduler.py:207] invalidating node: DependenciesNode(subject=Paths(dependencies=(Path(path=u'3rdparty/BUILD'),)), product=<class 'pants.engine.exp.fs.FileContent'>, variants=None, dep_product=<class 'pants.engine.exp.fs.Paths'>, field=None)
D0330 00:28:33.770227 2801 scheduler.py:207] invalidating node: TaskNode(subject=Paths(dependencies=(Path(path=u'3rdparty/BUILD'),)), product=<class 'pants.engine.exp.fs.FilesContent'>, variants=None, func=files_content, clause=(Dependencies(product=<class 'pants.engine.exp.fs.FileContent'>, deps_product=<class 'pants.engine.exp.fs.Paths'>, field=None),)
D0330 00:28:33.770437 2801 scheduler.py:207] invalidating node: SelectNode(subject=Paths(dependencies=(Path(path=u'3rdparty/BUILD'),)), product=<class 'pants.engine.exp.fs.FilesContent'>, variants=None, variant_key=None)
I0330 00:28:33.770628 2801 scheduler.py:211] invalidated 27 nodes
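The log suggests the pattern discussed in the review: FilesystemNodes whose subjects match a changed path are invalidated directly, and every node that depends on them, directly or transitively, is invalidated as well. A minimal sketch of that walk over a plain dependents map follows; the node names and the invalidate_transitively function are hypothetical, not ProductGraph's implementation.

```python
from collections import deque


def invalidate_transitively(dependents, roots):
    """Return every node reachable from `roots` via the `dependents` map.

    `dependents` maps a node to the nodes that consumed its result.
    `roots` are the directly invalidated nodes (e.g. FilesystemNodes whose
    subject matched a changed path). Each node is visited at most once, so
    shared dependents and cycles are handled.
    """
    invalidated = set()
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        if node in invalidated:
            continue
        invalidated.add(node)
        queue.extend(dependents.get(node, ()))
    return invalidated


if __name__ == "__main__":
    # Hypothetical toy graph loosely modeled on the log above.
    dependents = {
        "fs:3rdparty": ["buildfile_paths:3rdparty"],
        "fs:3rdparty/BUILD": ["files_content:3rdparty"],
        "buildfile_paths:3rdparty": ["files_content:3rdparty"],
        "files_content:3rdparty": ["address_family:3rdparty"],
        "address_family:3rdparty": ["struct:3rdparty:guava"],
        "fs:src": ["address_family:src"],  # unrelated; stays valid
    }
    nodes = invalidate_transitively(dependents, ["fs:3rdparty", "fs:3rdparty/BUILD"])
    print("invalidated %d nodes" % len(nodes))
```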

tear it down:

[illuminati pants (kwlzn/pantsd/fs_event_invalidation)]$ ./pants clean-all -ldebug --pantsd-fs-event-detection
DEBUG] Executing: git --git-dir=/Users/kwilson/dev/pants/.git --work-tree=/Users/kwilson/dev/pants rev-parse --abbrev-ref HEAD
INFO] Detected git repository at /Users/kwilson/dev/pants on branch kwlzn/pantsd/fs_event_invalidation
DEBUG] Executing: git --git-dir=/Users/kwilson/dev/pants/.git --work-tree=/Users/kwilson/dev/pants rev-parse HEAD
DEBUG] Executing: git --git-dir=/Users/kwilson/dev/pants/.git --work-tree=/Users/kwilson/dev/pants rev-parse --abbrev-ref HEAD

00:32:08 00:00 [main]
               (To run a reporting server: ./pants server)
00:32:08 00:00   [setup]
00:32:08 00:00     [parse]
               Executing tasks in goals: clean-all -> kill-pantsd
00:32:08 00:00   [clean-all]
00:32:08 00:00     [ng-killall]
00:32:08 00:00     [clean-all]
00:32:08 00:00     [kill-pantsd]DEBUG] terminating pantsd
DEBUG] sending signal 15 to pid 2801
DEBUG] successfully terminated pid 2801
DEBUG] purging metadata directory: /Users/kwilson/dev/pants/.pids/pantsd
DEBUG] Selected watchman binary bootstrapped to: /Users/kwilson/.cache/pants/bin/watchman/mac/10.11/4.5.0/watchman
DEBUG] terminating watchman
DEBUG] sending signal 15 to pid 2790
DEBUG] successfully terminated pid 2790
DEBUG] purging metadata directory: /Users/kwilson/dev/pants/.pids/watchman

00:32:09 00:01   [complete]
               SUCCESS
  1. Thanks Kris!

  2. src/python/pants/engine/exp/legacy/commands.py (Diff revision 1)

    _open_scheduler should probably return a scheduler and engine instead, and then _open_graph could be implemented in terms of that method?

  3. src/python/pants/engine/exp/legacy/commands.py (Diff revision 1)

    Don't need to do it here, but I've been thinking that it would be super useful to move the visualizer into the scheduler, as a method that you could call at any time to dump a graph.

    1. Also, moving/porting the viz-fs endpoint over from examples would be neat... it supports visualizing recursive cat (i.e., FileContents) and ls (i.e., Paths).

    2. good idea - can follow up with an RB for that.

  4. Not keys anymore.

  5. I don't think DescendantAddresses needs to be invalidated... only objects consumed by FilesystemNode need to be directly invalidated, afaik: everything else should be transitively invalidated.

    To make sure this isn't fragile to additional types, it's important that only types consumed by FilesystemNode are matched... which might mean invalidating based on the dependency of the node?

    1. will take a closer look at this one.

    2. filed #3117 for this and added a TODO as discussed.

  6. We need to match changed directories as well in order to invalidate directory listings.

    1. the case of DirectoryListing invalidation is already covered by the Path(os.path.dirname(filename)) that happens in the invalidation subject generation.

      FWIW I did try turning on matching for directories in Watchman, but this had two key problems that led me to go the above route:

      1) it was SUPER aggressive on events: just opening 3rdparty/BUILD in vim, without saving, is enough to immediately invalidate the entire 3rdparty dir's listing. Keying this off of actual file events mapped to their parent dirs seemed considerably more efficient.

      2) it did not cover the base watch-project dir (the build_root), which would leave us with no DirectoryListing invalidation for it unless we moved the watch-project dir one level down, which could be problematic. The dirname(filename) approach above yields a Path(u'') invalidation subject for the build_root when files there change, because os.path.dirname('file') == ''.

      added a quick comment so this stands out more.
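The subject generation described in that reply (each changed file's Path plus Path(os.path.dirname(filename)) for its parent's DirectoryListing, which becomes Path(u'') for files at the build root) can be sketched as follows. Path here is a local namedtuple stand-in for pants.engine.exp.fs.Path, and generate_invalidation_subjects is a hypothetical name; the real generator also emitted other subjects (the log above includes DescendantAddresses).

```python
import posixpath
from collections import namedtuple

# Local stand-in for pants.engine.exp.fs.Path (assumption: a single `path` field).
Path = namedtuple("Path", ["path"])


def generate_invalidation_subjects(filenames):
    """Yield the Path subjects to invalidate for a batch of changed files.

    Watchman is subscribed to files only, so each file's parent directory is
    derived here to invalidate that directory's DirectoryListing. For a file
    at the build root, posixpath.dirname() returns '', giving Path('').
    """
    seen = set()
    for filename in filenames:
        for subject in (Path(filename), Path(posixpath.dirname(filename))):
            if subject not in seen:  # several files may share a parent dir
                seen.add(subject)
                yield subject


if __name__ == "__main__":
    for subject in generate_invalidation_subjects(["3rdparty/BUILD", "BUILD"]):
        print(subject)
```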

  1. Thanks Kris!
    I really like how natural this invalidation looks in the new engine.

    Is ProductGraph a good place for the invalidate_files method? I think ProductGraph already exposes good enough methods for invalidation, and more specific invalidations should happen in places specific to a particular node type, in this case somewhere near fs.py. What do you think?

    1. I agree that it feels awkward to have invalidate_files, but I don't know of a better place for it. invalidate_native_nodes might make me slightly less uncomfortable, but it's unclear how to make the parameters more generic.

    2. we could do something along the lines of:

      from {...}.fs import invalidate_filenames
      ...
      scheduler.invalidate(invalidate_filenames(filenames))
      

      if this makes more sense to folks. I tend to prefer grouping similar methods at the class level over having loose util functions around like this, but I'm open to the idea if we collectively feel this is cleaner.

    3. If anything, it's a Node level API. But I don't think it's worth it at this juncture... native nodes / dynamic invalidation aren't supposed to be public APIs, so bending over too far to make them generic is probably not helpful.

    4. sgtm.

    5. For me this code relates more to src/python/pants/pantsd/service/scheduler_service.py than to the scheduler itself. Or, as Kris mentioned, {...}.fs would be even better.
      I don't like having loose util functions either, but grouping functions at the namespace level is OK with me.

      But we'll see how it goes.
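For comparison, the module-level alternative floated in that thread (scheduler.invalidate(invalidate_filenames(filenames))) could look roughly like the sketch below: the fs-specific function only decides which nodes match, and the graph API stays generic. FilesystemNode, TaskNode, ToyGraph, and the predicate-based invalidate() signature are all hypothetical stand-ins; the thread does not settle the real signature.

```python
from collections import namedtuple

# Hypothetical stand-ins; the real node types live in pants.engine.exp.
FilesystemNode = namedtuple("FilesystemNode", ["subject_path", "product"])
TaskNode = namedtuple("TaskNode", ["subject_path", "func"])


def invalidate_filenames(filenames):
    """Return a predicate matching FilesystemNodes affected by `filenames`.

    Would live next to the fs types, so the graph needn't know about files.
    Matches each changed file and its parent directory (for listings);
    a root-level file's parent is '' (the build root).
    """
    paths = set()
    for filename in filenames:
        paths.add(filename)
        paths.add(filename.rpartition("/")[0])
    return lambda node: isinstance(node, FilesystemNode) and node.subject_path in paths


class ToyGraph(object):
    """Minimal generic graph: invalidate() only needs a node predicate."""

    def __init__(self, nodes):
        self.nodes = set(nodes)

    def invalidate(self, predicate):
        # Drops direct matches only; a real graph would also invalidate
        # their transitive dependents.
        matched = {node for node in self.nodes if predicate(node)}
        self.nodes -= matched
        return matched
```

The trade-off is the one raised above: the graph keeps a single generic entry point, at the cost of a free-standing helper in the fs module.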

  1. Thanks!

  2. src/python/pants/engine/exp/scheduler.py (Diff revision 3)

    This will only include completed nodes... to catch both in-progress and completed nodes you'd want len(self._dependents).

  3. The comment explaining the reason for this filter should probably live here (as well). Speaking of which, I don't see the comment elsewhere?

    1. the diff got a little messed up when I rebased, but the comment I added was inline in _generate_fsnode_subjects:

        yield Path(parent_dir_path)  # Invalidate the parent dirs DirectoryListing.
      

      added a larger clarifying comment relating to Watchman here tho - good call.
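The counting comment in this review distinguishes a structure that only holds completed node states from one that also tracks nodes still in progress. A minimal sketch, assuming a graph that records a node in _dependents as soon as it takes part in an edge but in _node_results only once it completes (ToyProductGraph, _node_results, and the method names are hypothetical):

```python
class ToyProductGraph(object):
    """Hypothetical graph tracking in-progress and completed nodes separately."""

    def __init__(self):
        # Every node seen so far, completed or not -> set of its dependents.
        self._dependents = {}
        # Only completed nodes have an entry here.
        self._node_results = {}

    def add_edge(self, node, dependency):
        """Record that `node` depends on `dependency`; both are now tracked."""
        self._dependents.setdefault(node, set())
        self._dependents.setdefault(dependency, set()).add(node)

    def complete(self, node, result):
        self._dependents.setdefault(node, set())
        self._node_results[node] = result

    def completed_count(self):
        # Misses nodes that are still in progress.
        return len(self._node_results)

    def tracked_count(self):
        # Counts in-progress and completed nodes, per the review comment.
        return len(self._dependents)
```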

Review request changed

Status: Closed (submitted)

Change Summary:

thanks Stu and Timur! submitted @ 160089ce09b27fa3243a792f40087148fca6ee5f
