[pantsd] Build graph caching via v2 engine integration.

Review Request #3798 — Created April 30, 2016 and submitted

kwlzn
pants
3227, 3307
3800
pants-reviews
benjyw, jsirois, peiyu, stuhood

This implements the initial round of build graph caching in pailgun-based runs via integration of the v2 engine.

A sibling review will follow to land in lockstep with this RB to move all dependent objects across the pants.engine.exp boundary and excise any exp-aligned naming.

  • Migrate the v2 engine setup guts of commands.py into goal_runner.py bundled as EngineInitializer to bootstrap PantsDaemonLauncher and the setup of SchedulerService without forming a graph dependency cycle. Integrate this in the current commands.py and tests for coverage.
  • Plumb a path for BuildGraph reuse in pailgun runs (requires --enable-pantsd).
  • Add syncronous, scheduler-backed BuildGraph construction in-daemon (pre-fork) with fallback to legacy graph usage on exception.
  • Factor out a global state scrubber util for use in the pantsd startup and pailgun runs to resolve TODO.
  • Inline the import of LocalPantsRunner in the mainline runner path for a ~100ms speedup in the thin client case.
  • Add .pantsd.startup to .gitignore since the new interprocess lock does not clean up lockfiles.
  • Misc cleanups and optimizations.

https://travis-ci.org/pantsbuild/pants/builds/126936108

+ considerable local testing:

[window B] setup tail:

[illuminati pants (kwlzn/engine/closed_loop_runner)]$ tail -F .pants.d/pantsd/pantsd.log

[window A] env var export + throwaway run to launch pantsd:

[illuminati pants (kwlzn/engine/closed_loop_runner)]$ export PANTS_ENABLE_PANTSD=true
[illuminati pants (kwlzn/engine/closed_loop_runner)]$ export PANTS_PANTSD_FS_EVENT_DETECTION=true
[illuminati pants (kwlzn/engine/closed_loop_runner)]$ ./pants -ldebug help-advanced > /dev/null
DEBUG] acquiring lock: <pants.process.lock.OwnerPrintingInterProcessFileLock object at 0x10af64c10>
DEBUG] launching pantsd
DEBUG] Selected watchman binary bootstrapped to: /Users/kwilson/.cache/pants/bin/watchman/mac/10.11/4.5.0/watchman
DEBUG] launching watchman
DEBUG] watchman cmd is: /Users/kwilson/.cache/pants/bin/watchman/mac/10.11/4.5.0/watchman get-pid --no-save-state --statefile=/Users/kwilson/dev/pants/.pants.d/watchman/watchman.state --sockname=/Users/kwilson/dev/pants/.pants.d/watchman/watchman.sock --logfile=/Users/kwilson/dev/pants/.pants.d/watchman/watchman.log --log-level 2
DEBUG] ensuring creation of directory: /Users/kwilson/dev/pants/.pants.d/watchman
DEBUG] watchman is running, pid=71075 socket=/Users/kwilson/dev/pants/.pants.d/watchman/watchman.sock
DEBUG] purging metadata directory: /Users/kwilson/dev/pants/.pids/pantsd
DEBUG] forking <pants.pantsd.pants_daemon.PantsDaemon object at 0x10af64c90>
DEBUG] released lock: <pants.process.lock.OwnerPrintingInterProcessFileLock object at 0x10af64c10>
DEBUG] pantsd is running at pid 71081
[illuminati pants (kwlzn/engine/closed_loop_runner)]$

[window B] output:

D0430 14:51:10.843097 71081 pants_daemon.py:120] logging initialized
I0430 14:51:10.843727 71081 pants_daemon.py:163] pantsd starting, log level is DEBUG
I0430 14:51:10.845850 71081 pants_daemon.py:124] setting up service <pants.pantsd.service.fs_event_service.FSEventService object at 0x10aeb9d50>
I0430 14:51:10.846252 71081 pants_daemon.py:124] setting up service <pants.pantsd.service.scheduler_service.SchedulerService object at 0x10aeeb2d0>
I0430 14:51:10.846585 71081 pants_daemon.py:124] setting up service <pants.pantsd.service.pailgun_service.PailgunService object at 0x410b360290>
I0430 14:51:10.846873 71081 pants_daemon.py:137] starting service <pants.pantsd.service.pailgun_service.PailgunService object at 0x410b360290>
I0430 14:51:10.847280 71081 pailgun_service.py:64] starting pailgun server on port 53442
I0430 14:51:10.847786 71081 pants_daemon.py:137] starting service <pants.pantsd.service.fs_event_service.FSEventService object at 0x10aeb9d50>
I0430 14:51:10.848162 71081 pants_daemon.py:137] starting service <pants.pantsd.service.scheduler_service.SchedulerService object at 0x10aeeb2d0>
D0430 14:51:11.041810 71081 watchman.py:147] watchman command_list is: [[u'subscribe', '/Users/kwilson/dev/pants', u'all_files', {'fields': [u'name'], 'expression': [u'allof', [u'anyof', [u'type', u'f'], [u'type', u'l']], [u'not', [u'dirname', u'dist', [u'depth', u'eq', 0]]], [u'not', [u'pcre', u'^\\..*', u'wholename']], [u'not', [u'match', u'*.pyc']]]}]]
I0430 14:51:11.047462 71081 watchman.py:153] confirmed watchman subscription: {'subscribe': 'all_files', 'version': '4.5.0', 'clock': 'c:1462053065:71075:1:4'}
I0430 14:51:11.049132 71081 scheduler_service.py:54] enqueuing 4552 changes for subscription all_files
D0430 14:51:11.069974 71081 scheduler_service.py:81] processing 4552 files for subscription all_files (first_event=True)
D0430 14:51:16.052555 71081 fs_event_service.py:146] callback ID 1 for all_files succeeded

[window A] initial cold run to prime the resident scheduler:

[illuminati pants (kwlzn/engine/closed_loop_runner)]$ time ./pants list 3rdparty:: > /dev/null

real  0m1.927s
user  0m0.129s
sys 0m0.067s

[window B] pantsd log output:

I0430 14:53:04.627880 71081 pailgun_server.py:72] handling pailgun request: `./pants list 3rdparty::`
D0430 14:53:04.628139 71081 pailgun_server.py:73] pailgun request environment: ...
D0430 14:53:04.628923 71081 pailgun_service.py:51] requesting BuildGraph from <pants.pantsd.service.scheduler_service.SchedulerService object at 0x10aeeb2d0>
D0430 14:53:04.629442 71081 scheduler_service.py:92] execution commandline: [u'./pants', 'list', '3rdparty::']
D0430 14:53:04.676603 71081 scheduler_service.py:94] parsed spec_roots: [DescendantAddresses(directory='3rdparty')]
D0430 14:53:04.677037 71081 graph.py:193] Injecting to <pants.engine.exp.legacy.graph.ExpGraph object at 0x1090639d0>: [DescendantAddresses(directory='3rdparty')]
D0430 14:53:06.114860 71081 scheduler.py:614] visited 1219 nodes in 344 scheduling iterations. there have been 4022 total steps for 1987 total nodes.
D0430 14:53:06.123577 71081 scheduler_service.py:97] engine cache stats: hits=0, misses=3976, total=3976
D0430 14:53:06.123779 71081 scheduler_service.py:98] build_graph is: <pants.engine.exp.legacy.graph.ExpGraph object at 0x1090639d0>
D0430 14:53:06.124270 71081 process_manager.py:178] purging metadata directory: /Users/kwilson/dev/pants/.pids/pantsd-run-2016-04-30T14_53_06_124079
D0430 14:53:06.124444 71081 process_manager.py:402] forking <pants.bin.daemon_pants_runner.DaemonPantsRunner object at 0x410b360450>

[window A] warm run:

[illuminati pants (kwlzn/engine/closed_loop_runner)]$ time ./pants list 3rdparty:: > /dev/null

real  0m0.609s
user  0m0.130s
sys 0m0.066s

[window B] pantsd log output:

I0430 14:53:40.948565 71081 pailgun_server.py:72] handling pailgun request: `./pants list 3rdparty::`
D0430 14:53:40.949038 71081 pailgun_server.py:73] pailgun request environment: ...
D0430 14:53:40.949594 71081 pailgun_service.py:51] requesting BuildGraph from <pants.pantsd.service.scheduler_service.SchedulerService object at 0x10aeeb2d0>
D0430 14:53:40.950035 71081 scheduler_service.py:92] execution commandline: [u'./pants', 'list', '3rdparty::']
D0430 14:53:40.992217 71081 scheduler_service.py:94] parsed spec_roots: [DescendantAddresses(directory='3rdparty')]
D0430 14:53:41.005194 71081 graph.py:193] Injecting to <pants.engine.exp.legacy.graph.ExpGraph object at 0x410b6d41d0>: [DescendantAddresses(directory='3rdparty')]
D0430 14:53:41.015544 71081 scheduler.py:614] visited 1219 nodes in 0 scheduling iterations. there have been 4022 total steps for 1987 total nodes.
D0430 14:53:41.022319 71081 scheduler_service.py:97] engine cache stats: hits=0, misses=3976, total=3976
D0430 14:53:41.022509 71081 scheduler_service.py:98] build_graph is: <pants.engine.exp.legacy.graph.ExpGraph object at 0x410b6d41d0>
D0430 14:53:41.022916 71081 process_manager.py:178] purging metadata directory: /Users/kwilson/dev/pants/.pids/pantsd-run-2016-04-30T14_53_41_022726
D0430 14:53:41.023076 71081 process_manager.py:402] forking <pants.bin.daemon_pants_runner.DaemonPantsRunner object at 0x10af3d210>

[window A] testing invalidation:

[illuminati pants (kwlzn/engine/closed_loop_runner)]$ find 3rdparty -name BUILD -exec touch {} \;
[illuminati pants (kwlzn/engine/closed_loop_runner)]$

[window B] pantsd log output:

...
I0430 14:57:12.536189 71081 scheduler.py:180] invalidated 901 of 1086 nodes
...

[window A] post-invalidation re-run:

[illuminati pants (kwlzn/engine/closed_loop_runner)]$ time ./pants list 3rdparty:: > /dev/null

real  0m1.138s
user  0m0.128s
sys 0m0.066s

[window B] pantsd log output:

I0430 14:59:42.952877 71081 pailgun_server.py:72] handling pailgun request: `./pants list 3rdparty::`
D0430 14:59:42.953142 71081 pailgun_server.py:73] pailgun request environment: ...
D0430 14:59:42.953516 71081 pailgun_service.py:51] requesting BuildGraph from <pants.pantsd.service.scheduler_service.SchedulerService object at 0x10aeeb2d0>
D0430 14:59:42.953715 71081 scheduler_service.py:92] execution commandline: [u'./pants', 'list', '3rdparty::']
D0430 14:59:42.998142 71081 scheduler_service.py:94] parsed spec_roots: [DescendantAddresses(directory='3rdparty')]
D0430 14:59:43.001378 71081 graph.py:193] Injecting to <pants.engine.exp.legacy.graph.ExpGraph object at 0x410b4ff490>: [DescendantAddresses(directory='3rdparty')]
D0430 14:59:43.551987 71081 scheduler.py:614] visited 1219 nodes in 172 scheduling iterations. there have been 6172 total steps for 1987 total nodes.
D0430 14:59:43.562912 71081 scheduler_service.py:97] engine cache stats: hits=2127, misses=3976, total=6103
D0430 14:59:43.563154 71081 scheduler_service.py:98] build_graph is: <pants.engine.exp.legacy.graph.ExpGraph object at 0x410b4ff490>
D0430 14:59:43.563565 71081 process_manager.py:178] purging metadata directory: /Users/kwilson/dev/pants/.pids/pantsd-run-2016-04-30T14_59_43_563377
D0430 14:59:43.563733 71081 process_manager.py:402] forking <pants.bin.daemon_pants_runner.DaemonPantsRunner object at 0x10a7bf650>
  • 0
  • 0
  • 5
  • 4
  • 9
Description From Last Updated
KW
KW
KW
ST
  1. 
      
  2. Not clear what "primed" means in this case.

    1. removed that wording.

  3. src/python/pants/bin/goal_runner.py (Diff revision 2)
     
     
     

    Might be good to define these two classes in their own file within this directory. There is a lot going on here already.

    1. yeah - I initially went the opposite direction here (e.g. more monolithic) effectively due to a known cycle between GoalRunner->EngineInitializer->OptionsInitializer->GoalRunner that this colocation simply resolved.

      I've gone ahead and factored all of the Initializers out at the expense of having to inline an import to break up the cycle - with a TODO to revisit.

  4. src/python/pants/bin/goal_runner.py (Diff revision 2)
     
     
     

    Ideally, this should come from the pants-ignore option that Yujie introduced.

  5. src/python/pants/bin/goal_runner.py (Diff revision 2)
     
     

    This should probably indicate that it is for testing/non-daemon runs only.

  6. src/python/pants/bin/goal_runner.py (Diff revision 2)
     
     

    When calling methods with fully defaulted params, should probably include the param names explicitly.

  7. ditto

    1. removed this wording.

  8. It doesn't seem like this needs to happen inside the scheduler service. Is there any reason not to have the client do it?

    1. makes sense - moved this into the Pailgun's runner factory.

  9. src/python/pants/pantsd/service/scheduler_service.py (Diff revision 2)
     
     
     
     
     

    Only inject_specs_closure really needs to occur under the lock... the rest should be able to move out.

    Also, graph.get_target doesn't invoke the engine, so that step isn't necessary.

    1. ah, for some reason I assumed graph.get_target was necessary due to a lazy path somewhere - but you're right. both fixed.

  10. Storage also has a life cycle. Unfortunately, many of the unit tests are not obeying it right now... but we should here is possible.

    1. I think engine.close() takes care of storage

        def close(self):
          """Shutdown this engine instance, releasing resources it was using."""
          self._storage.close()
          self._cache.close()
      
    2. Ok, thanks. It probably shouldn't, as that is asymmetrical... the Storage is opened independently of the Engine, so the Engine shouldn't be closing it.

    3. agree this should probably become symmetrical - but will punt on that for the purposes of this review given this is handled in current state.

  11. src/python/pants/pantsd/util.py (Diff revision 2)
     
     
     

    The description indicates that this resolves a TODO? Would be good to mention that here.

    1. the TODO in question was: https://github.com/pantsbuild/pants/blob/master/src/python/pants/pantsd/pants_daemon.py#L124-L125

      but was removed as part of resolving it - not sure why we'd mention that here?

  12. 
      
JS
  1. 
      
  2. src/python/pants/bin/goal_runner.py (Diff revision 2)
     
     
    It looks like your rubric may be public for methods that are used by other modules ... including tests.  Tests aside, I think this is _private.  A wider agreement should probably be reached and this review not be blocked, but it would be great to see new types generally (almost always) have documented public members, with all other members _private/_visible_for_testing/_module_private.
    1. in this case, the reason for making this public is due to way its referenced in usage by pants_daemon_launcher.py - i.e. this is directly accessed outside of its module to pass into PailgunService for the purposes of cmdline parsing in pailgun runs. this is organized as-is currently primarily to avoid having outbound dependencies from pailgun_service.py to things like the OptionsInitializer for cycle avoidance - otherwise I'd agree that it'd make more sense as a private method on something like PantsDaemonRunner. I do plan to circle back to further refine some of this tho - so expect iterative improvements.

      open to other ideas as well in the meantime.

  3. src/python/pants/bin/goal_runner.py (Diff revision 2)
     
     
    This one needs param doc.
  4. src/python/pants/bin/goal_runner.py (Diff revision 2)
     
     

    It would be great if this read instead:

    pantsd_launcher = PantsDaemonLauncher.Factory.create(EngineInitializer)
    ...
    

    IOW, if use of PantsDaemonLauncher as a plain old object was possible and dependence on options was hidden behind a factory dedicated to them (and registering them) and the launcher could be created in whole at once and not mutated after the fact.

    1. totally on board with this idea - moved to a factory.

  5. If the pailgun is active then this seems more severe - warning at least.  Right now this just debug logs and otherwise silently falls back to a legacy graph IIUC.
    1. good call, fixed.

  6. I think you missed %s discipline here for faster noop when log > debug.
  7. Bit of a smell.  Might be nice to circle back and figure out a name for this since EngineInitializer and SchedulerService speak the same language in spirit here.
    1. I see what you mean. added a TODO for this.

  8. 
      
PE
  1. 
      
  2. document the param

  3. different from the rest of pants code how logger is declared, is there a reason? like to inject a different logger?

    1. much of the pantsd code uses class-local loggers due to early issues around logging, global state and forking. I should probably revisit this as a cleanup exercise and only use this style in the places it's strictly necessary.

  4. 
      
KW
KW
KW
ST
  1. Thanks Kris!

    1. My shipit stands, but one comment: it looks like ./pants server is not reporting any runs that are launched from the daemon.

    2. ...oh, correction! It is, but it reports them under the original cmdline args that forked the daemon.

    3. ah, will take a look.

  2. xx

  3. 
      
KW
KW
PA
  1. 
      
  2. What's the point of the all() here? Seems like a smell. If it's intended to have side effects, a loop is preferable.

    1. the point is to fully exercise the generator in the most efficient way - sort of like doing list(generator) without the overhead of constructing the list and storing the yielded values.

      afaict, its pretty idiomatic and is the known 'fastest' method for exhausting generators in this fashion:

      [illuminati pants (kwlzn/engine/closed_loop_runner)]$ python -m timeit -n 1000 "all(x for x in xrange(10000))"
      1000 loops, best of 3: 0.621 usec per loop
      [illuminati pants (kwlzn/engine/closed_loop_runner)]$ python -m timeit -n 1000 "for x in xrange(10000): pass"
      1000 loops, best of 3: 125 usec per loop
      

      what's the smell?

    2. As discussed in chat, the for loop is in fact faster, in addition to being clearer imo

    3. fixed.

  3. Do we have a style policy on using %s formatting within the context of logging statements?

    1. I don't think anything has been formalized - our styleguide is actually quite bare.

      this logging pattern in general tho certainly seems to deserve an exception for perf, imho - tho it might be good to implement our own logging helper that could lazily evaluate the .format string style for consistency. fwiw, I directly observed at least one case where we were needlessly formatting massively large strings in debug log entries to our own detriment - it would probably serve us well to try to optimize this pattern unilaterally across the codebase.

    2. sgtm

  4. Strongly prefer keyword-only argument passing here, given the number of args.

    1. I suppose - tho worth noting that liberal use of **kwargs adds a cost of 1 dict creation for every call, which is generally why I tend to avoid it where possible. shouldn't matter here tho - fixed.

  5. Seems like this should be returning a namedtuple or something more structured. A 4-tuple pattern match is tough on the eyes.

    1. resolved the TODO here.

    1. fixed, but I genuinely don't understand why this one was flagged as an issue - it's 5 params and fairly simple as-is. forcing the use of kwargs in this case is actually hard on my eyes!

      whats your heuristic for when to jump to kwargs? or better yet, could you doc some guidance in the style guide so it can be globally enforced? seems like something that could greatly benefit from broader consensus.

    1. fixed, see above.

  6. The ** vs or binding is hard on the eyes. How about options = options or {} as the first line of this fn

  7. 
      
KW
BE
  1. 
      
  2. pants.ini (Diff revision 7)
     
     

    Which reminds me that it's annoying that build-support has both checked-in files and generated stuff. We should sort that out at some point.

  3. Why all the switches from format to %s?

    1. The , operator vs % operator is important here.  This is _not_ string formatting now, its log formatting later.  3 people have commented on this now!
  4. 
      
JS
  1. Ship It!
  2. 
      
KW
ST
  1. Ship It!
  2. 
      
KW
Review request changed

Status: Closed (submitted)

Change Summary:

thanks folks! this is submitted @ e6c510fd3110d4c00004481d06e70e956d1415cf

Loading...