Initial round of pantsd + new engine + watchman integration.

Review Request #3524 — Created March 2, 2016 and submitted

benjyw, ity, patricklaw, peiyu, stuhood
  • Implement a new SchedulerService for managing the online scheduler instance.
  • Revive FSEventService and friends for Watchman integration.
  • Stop resetting Subsystem options, which prevents uninitialized-subsystem errors when pantsd uses the WatchmanLauncher subsystem; the options get overwritten on every run anyhow.
  • Make use of FSEventService and SchedulerService optionable - defaulting to off.
  • Implement a test target in exp to launch pantsd with a copy of the scheduler and listen for file events (currently all this does is log the events - graph invalidation coming soon!).
  • Extend the kill-pantsd goal to also shut down Watchman when so configured.

CI is away @

while running this in a separate window:

$ tail -F .pants.d/pantsd/pantsd.log .pants.d/watchman/watchman.log

spin up an unconnected pantsd instance using the test target w/ options to enable fs-event-detection:

[illuminati pants (kwlzn/pantsd/engine_integration)]$ ./pants run src/python/pants/engine/exp/legacy:pantsd -q -- src/python/pants/pantsd:: -ldebug --pantsd-fs-event-detection
*** pantsd launched ***
DEBUG] acquiring lock: <OwnerPrintingPIDLockFile: u'/Users/kwilson/dev/pants/.pantsd.startup' -- u'/Users/kwilson/dev/pants/.pantsd.startup'>
DEBUG] launching pantsd
DEBUG] purging metadata directory: /Users/kwilson/dev/pants/.pids/pantsd
DEBUG] released lock: <OwnerPrintingPIDLockFile: u'/Users/kwilson/dev/pants/.pantsd.startup' -- u'/Users/kwilson/dev/pants/.pantsd.startup'>
DEBUG] pantsd is running at pid 78563

see log output in the tail window:

D0301 17:45:57.109353 78563] logging initialized
I0301 17:45:57.109931 78563] pantsd starting, log level is DEBUG
I0301 17:45:57.111968 78563] starting service <pants.pantsd.service.scheduler_service.SchedulerService object at 0x106902510>
I0301 17:45:57.112404 78563] starting service <pants.pantsd.service.pailgun_service.PailgunService object at 0x1069021d0>
I0301 17:45:57.112715 78563] starting pailgun server on port 56898
I0301 17:45:57.112966 78563] starting service <pants.pantsd.service.fs_event_service.FSEventService object at 0x106902450>
I0301 17:45:57.116261 78563] watchman is running, pid=70857 socket=/Users/kwilson/dev/pants/.pants.d/watchman/watchman.sock
D0301 17:46:02.122070 78563] watchman command_list is: [[u'subscribe', '/Users/kwilson/dev/pants', u'all_files', {'fields': [u'name'], 'expression': [u'allof', [u'type', u'f'], [u'not', [u'dirname', u'dist', [u'depth', u'eq', 0]]], [u'not', [u'match', u'.*', u'wholename']], [u'not', [u'match', u'*.pyc']]]}]]
I0301 17:46:02.265883 78563] confirmed watchman subscription: {'subscribe': 'all_files', 'version': '3.1.0', 'clock': 'c:1456869416:70857:1:4433'}
I0301 17:46:02.267790 78563] enqueuing 4968 changes for subscription all_files
D0301 17:46:02.292135 78563] processing 4968 files for subscription all_files (first_event=True)
D0301 17:46:03.268616 78563] callback ID 1 for all_files succeeded
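
For reference, the subscription logged above is plain data and can be rebuilt outside pantsd. This is a minimal sketch only: `build_subscribe_command` is a hypothetical helper, not a function from this change, and the expression terms are copied from the logged command_list rather than derived from the Watchman docs.

```python
def build_subscribe_command(build_root, name='all_files'):
    """Rebuild the logged subscribe command as plain data (hypothetical helper)."""
    expression = [
        'allof',
        ['type', 'f'],                                     # regular files only
        ['not', ['dirname', 'dist', ['depth', 'eq', 0]]],  # dist exclusion, as logged
        ['not', ['match', '.*', 'wholename']],             # dotfile glob, as logged
        ['not', ['match', '*.pyc']],                       # compiled bytecode
    ]
    options = {'fields': ['name'], 'expression': expression}
    return ['subscribe', build_root, name, options]


command = build_subscribe_command('/Users/kwilson/dev/pants')
print(command)
```

Sending the command to Watchman is left out on purpose; the sketch only covers the shape of the query.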

save various src files in my editor, which incur new log events:

I0301 17:49:30.558384 78563] enqueuing 1 changes for subscription all_files
D0301 17:49:30.575396 78563] processing 1 files for subscription all_files (first_event=False)
D0301 17:49:30.575889 78563] file src/python/pants/pantsd/service/ changed!
D0301 17:49:31.562772 78563] callback ID 2 for all_files succeeded
I0301 17:49:33.095844 78563] enqueuing 1 changes for subscription all_files
D0301 17:49:33.112721 78563] processing 1 files for subscription all_files (first_event=False)
D0301 17:49:33.113214 78563] file src/python/pants/pantsd/service/ changed!
D0301 17:49:34.098309 78563] callback ID 3 for all_files succeeded

shut it all down with:

[illuminati pants (kwlzn/pantsd/engine_integration)]$ ./pants kill-pantsd -ldebug --pantsd-fs-event-detection
18:03:39 00:00 [main]
               (To run a reporting server: ./pants server)
18:03:39 00:00   [setup]
18:03:39 00:00     [parse]
               Executing tasks in goals: kill-pantsd
18:03:39 00:00   [kill-pantsd]
18:03:39 00:00     [kill-pantsd]DEBUG] terminating pantsd
DEBUG] sending signal 15 to pid 82783
DEBUG] successfully terminated pid 82783
DEBUG] purging metadata directory: /Users/kwilson/dev/pants/.pids/pantsd
DEBUG] terminating watchman
DEBUG] sending signal 15 to pid 82696
DEBUG] successfully terminated pid 82696
DEBUG] purging metadata directory: /Users/kwilson/dev/pants/.pids/watchman

18:03:39 00:00   [complete]
  2. Related:

    Currently the exp.fs just ignores dotfiles.

    1. added a note pointing to #2956.

  3. Can't remember where we settled, but it seems like from a scalability perspective, having a single watch registered for the whole buildroot (subject to --ignore-patterns) would make the most sense. So maybe this method is only useful for tests?

    1. not really - it just already existed and I thought it might be handy in the future. rm'd.

  4. Hm... that bears more explanation. Is this just the pywatchman API in action?

    1. no, the 'truthy results' bit is just the if result: check one line above in the case of a non-exceptional future fetch - the idea being that callbacks could simply return something (e.g. an error msg) instead of raising some arbitrary exception to indicate failure (with the default case being no return or an implicit return None).

      if that seems weird, I'm completely fine with dropping it tho.

    2. Since returning an empty string or empty collection might be a non-error, explicitly looking for None would make more sense to me.

    3. sure, added an explicit is not None check.
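
To make the difference between the two checks in this thread concrete, here is a minimal sketch. The function names are hypothetical and this is not the FSEventService code; it only shows how `if result:` and `if result is not None:` classify the value a callback returns through a future.

```python
from concurrent.futures import ThreadPoolExecutor


def truthy_check(result):
    # Original form: only truthy return values are flagged.
    return bool(result)


def explicit_none_check(result):
    # Revised form: any return value other than None is flagged.
    return result is not None


def run_callback(executor, callback):
    """Run callback on the executor and return its non-exceptional result."""
    return executor.submit(callback).result()


with ThreadPoolExecutor(max_workers=1) as pool:
    results = {
        'implicit None': run_callback(pool, lambda: None),
        'empty string': run_callback(pool, lambda: ''),
        'empty list': run_callback(pool, lambda: []),
        'error message': run_callback(pool, lambda: 'boom'),
    }

for label, value in results.items():
    print(label, truthy_check(value), explicit_none_check(value))
```

The two checks agree on None and on a non-empty message; they disagree only on falsy values such as an empty string or empty collection.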

  5. Probably capitalize Scheduler... don't think that name is going anywhere (although the class will certainly move).

    Can you link this to a ticket that explains the followup bits to get invalidation happening? I guess?

    1. capitalized and added a link to #2970.

  6. Should this happen in def run instead?

    1. the setup of service<->service interaction all happens prior to any services getting started so they have a chance to interact/register/etc once up front and then run without subsequent changes. FSEventService won't be started at this point or incur live event subscription until services are started.

      when we drop in the HttpService, it'll require this notion of upfront registration - so I thought it made sense to apply it uniformly across the board.

    2. Ok. I think an explicit lifecycle would make more sense then. Constructors having side effects smells funny to me.

      1) open
      2) run
      3) close

      ... for example.

    3. sgtm - there's already an explicit run and terminate. added a setup phase so now it's setup->run->terminate.
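
The setup->run->terminate lifecycle agreed on in this thread could look roughly like the following. This is a sketch under assumptions: `SketchService` and `run_services` are hypothetical names, and the real pantsd service classes are not shown in this review.

```python
import threading


class SketchService(object):
    """Hypothetical service with an explicit setup -> run -> terminate lifecycle."""

    def __init__(self, name):
        # The constructor only records state; it has no side effects.
        self.name = name
        self.events = []
        self._started = threading.Event()
        self._kill_switch = threading.Event()

    def setup(self):
        # Service<->service registration would happen here, before any service runs.
        self.events.append('setup')

    def run(self):
        self.events.append('run')
        self._started.set()
        # Block until terminate() is called.
        self._kill_switch.wait()

    def terminate(self):
        self.events.append('terminate')
        self._kill_switch.set()


def run_services(services):
    """Set up every service first, then start them all, then shut them down."""
    for service in services:
        service.setup()
    threads = [threading.Thread(target=service.run) for service in services]
    for thread in threads:
        thread.start()
    for service in services:
        service._started.wait()  # make sure run() has begun before terminating
        service.terminate()
    for thread in threads:
        thread.join()
```

Because every `setup()` call completes before the first thread starts, services get one up-front chance to register with each other and then run without further changes.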

  7. Mark experimental? or is that already obvious from other docs?

    1. I think the line numbers are off in the RB commentary, but assuming you mean the fs-event options - marked them experimental.

  8. Should FSEventService hide its ThreadPoolExecutor dep, and just take a max_workers arg to construct it?

    1. eh - seems to make more sense for this subsystem to manage construction of Executor pools since there will be at least one more. it used to internalize this in the initial RB - but iirc, you were the one who suggested moving the Executor construction outside for more control (e.g. to hand it an executor of threads/processes/etc).

    2. Haha, irony.

    3. rethinking this, in line with the thinking around lifecycle, this probably does make more sense internalized since the service's teardown is what actually terminates this. reverted to internalized setup.
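
A rough sketch of the internalized form the thread settles on, where the service takes a `max_workers` argument and owns the pool it later shuts down. `SketchEventService` and its `submit` method are hypothetical; the real FSEventService signature is not shown in this review.

```python
from concurrent.futures import ThreadPoolExecutor


class SketchEventService(object):
    """Hypothetical service that owns the thread pool it tears down."""

    def __init__(self, max_workers=4):
        # Only the pool size is injected; the pool itself is created in setup().
        self._max_workers = max_workers
        self._executor = None

    def setup(self):
        self._executor = ThreadPoolExecutor(max_workers=self._max_workers)

    def submit(self, fn, *args):
        # Hand a callback to the pool and return its future.
        return self._executor.submit(fn, *args)

    def terminate(self):
        # The service created the pool, so the service shuts it down.
        if self._executor is not None:
            self._executor.shutdown(wait=True)
            self._executor = None


service = SketchEventService(max_workers=2)
service.setup()
future = service.submit(len, 'abc')
print(future.result())
service.terminate()
```

The trade-off raised earlier in the thread still applies: callers can no longer hand in a different kind of executor, but creation and teardown now live in one place.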

  9. There is strange symmetry here... I don't see this getting launched anywhere in this file.

    ...but now I see that it is probably launched as a singleton via the subsystem_dependencies call. Which begs the question: should Subsystem have a lifecycle?

    1. yeah - the FSEventService service uses WatchmanLauncher to launch watchman. our only entrypoint-with-options back to terminating that running instance is at the Subsystem level.

      in this case, the lifecycle belongs to ProcessManager for both PantsDaemon and Watchman - the Subsystem framework just gives us an options scope.

  10. Needs a TODO.

  1. Ship It!
    1. Some notes though: when I try to reproduce your example above, I get various errors.

      When trying to startup:

      $ ./pants run src/python/pants/engine/exp/legacy:pantsd -q -- src/python/pants/pantsd:: -ldebug --pantsd-fs-event-detection
      $ cat .pants.d/pantsd/pantsd.log
      D0302 11:38:24.090219 26755] logging initialized
      I0302 11:38:24.091131 26755] pantsd starting, log level is DEBUG
      I0302 11:38:24.093492 26755] starting service <pants.pantsd.service.scheduler_service.SchedulerService object at 0x1043050d0>
      I0302 11:38:24.094182 26755] starting service <pants.pantsd.service.pailgun_service.PailgunService object at 0x1042f1d50>
      I0302 11:38:24.094588 26755] starting pailgun server on port 61665
      I0302 11:38:24.094798 26755] starting service <pants.pantsd.service.fs_event_service.FSEventService object at 0x1042f1fd0>
      W0302 11:38:24.098351 26755] Exception in thread Thread-5:
      W0302 11:38:24.098560 26755] Traceback (most recent call last):
      W0302 11:38:24.098731 26755]   File "/opt/twitter_mde/package/python2.7/current/lib/python2.7/", line 810, in __bootstrap_inner
      W0302 11:38:24.098922 26755]
      W0302 11:38:24.099121 26755]   File "/opt/twitter_mde/package/python2.7/current/lib/python2.7/", line 763, in run
      W0302 11:38:24.099281 26755]     self.__target(*self.__args, **self.__kwargs)
      W0302 11:38:24.099406 26755]   File "/Users/stuhood/src/pants/src/python/pants/pantsd/service/", line 88, in run
      W0302 11:38:24.099567 26755]     watchman = WatchmanLauncher.global_instance().maybe_launch()
      W0302 11:38:24.099714 26755]   File "/Users/stuhood/src/pants/src/python/pants/pantsd/subsystem/", line 47, in maybe_launch
      W0302 11:38:24.099850 26755]     if not self.watchman.is_alive():
      W0302 11:38:24.099973 26755]   File "/Users/stuhood/src/pants/src/python/pants/pantsd/subsystem/", line 43, in watchman
      W0302 11:38:24.100112 26755]     watchman_path=self._watchman_path)
      W0302 11:38:24.100233 26755]   File "/Users/stuhood/src/pants/src/python/pants/pantsd/", line 31, in __init__
      W0302 11:38:24.100352 26755]     self.watchman_path = self._resolve_watchman_path(watchman_path)
      W0302 11:38:24.100471 26755]   File "/Users/stuhood/src/pants/src/python/pants/pantsd/", line 67, in _resolve_watchman_path
      W0302 11:38:24.100605 26755]     raise self.ExecutionError('could not locate watchman in $PATH!')
      W0302 11:38:24.100725 26755] ExecutionError: could not locate watchman in $PATH!
      I0302 11:38:26.098202 26755] terminating pantsd service: <pants.pantsd.service.scheduler_service.SchedulerService object at 0x1043050d0>
      I0302 11:38:27.096240 26755] terminating pantsd service: <pants.pantsd.service.pailgun_service.PailgunService object at 0x1042f1d50>
      W0302 11:38:27.097388 26755] Exception in thread Thread-4:
      W0302 11:38:27.097577 26755] Traceback (most recent call last):
      W0302 11:38:27.097726 26755]   File "/opt/twitter_mde/package/python2.7/current/lib/python2.7/", line 810, in __bootstrap_inner
      W0302 11:38:27.097870 26755]
      W0302 11:38:27.098009 26755]   File "/opt/twitter_mde/package/python2.7/current/lib/python2.7/", line 763, in run
      W0302 11:38:27.098150 26755]     self.__target(*self.__args, **self.__kwargs)
      W0302 11:38:27.098288 26755]   File "/Users/stuhood/src/pants/src/python/pants/pantsd/service/", line 55, in run
      W0302 11:38:27.098424 26755]     self.pailgun.handle_request()
      W0302 11:38:27.098560 26755]   File "/opt/twitter_mde/package/python2.7/current/lib/python2.7/", line 276, in handle_request
      W0302 11:38:27.098696 26755]     fd_sets = _eintr_retry(, [self], [], [], timeout)
      W0302 11:38:27.098846 26755]   File "/opt/twitter_mde/package/python2.7/current/lib/python2.7/", line 155, in _eintr_retry
      W0302 11:38:27.098988 26755]     return func(*args)
      W0302 11:38:27.099123 26755] error: (9, 'Bad file descriptor')
      I0302 11:38:27.099322 26755] terminating pantsd service: <pants.pantsd.service.fs_event_service.FSEventService object at 0x1042f1fd0>
      I0302 11:38:27.099531 26755] shutting down threadpool
      I0302 11:38:27.099934 26755] terminating pantsd
      F0302 11:38:27.100698 26755] Traceback (most recent call last):
        File "/Users/stuhood/src/pants/src/python/pants/pantsd/", line 401, in daemonize
          self.post_fork_child(**post_fork_child_opts or {})
        File "/Users/stuhood/src/pants/src/python/pants/pantsd/", line 193, in post_fork_child
        File "/Users/stuhood/src/pants/src/python/pants/pantsd/", line 182, in _run
        File "/Users/stuhood/src/pants/src/python/pants/pantsd/", line 156, in _run_services
          raise self.RuntimeFailure('service failure for {}, shutting down!'.format(service))
      RuntimeFailure: service failure for <pants.pantsd.service.fs_event_service.FSEventService object at 0x1042f1fd0>, shutting down!

      When trying to kill (although it does seem to successfully kill):

      $ ./pants kill-pantsd -ldebug --pantsd-fs-event-detection
      File "/Users/stuhood/src/pants/src/python/pants/pantsd/", line 67, in _resolve_watchman_path
        raise self.ExecutionError('could not locate watchman in $PATH!')

      Both seem to be because I don't have watchman installed; the plan is to ship a private copy, yea? Can do with binary_utils if need be.

    2. yeah - you either need to add the srcgit/mde watchman to your $PATH or brew install watchman at the moment. I'll get an RB out that uses binary_utils et al soon.
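
The failure above comes down to a $PATH lookup that finds nothing. A minimal sketch of that kind of lookup follows; `resolve_binary` and this `ExecutionError` are hypothetical stand-ins rather than the actual `_resolve_watchman_path` implementation, and `shutil.which` assumes Python 3.

```python
import os
import shutil


class ExecutionError(Exception):
    """Hypothetical stand-in for the error seen in the traceback above."""


def resolve_binary(name, explicit_path=None):
    """Return a usable path for name, preferring an explicitly configured path."""
    if explicit_path:
        if os.path.isfile(explicit_path) and os.access(explicit_path, os.X_OK):
            return explicit_path
        raise ExecutionError('invalid path for {}: {}'.format(name, explicit_path))
    found = shutil.which(name)  # searches the directories listed in $PATH
    if found is None:
        raise ExecutionError('could not locate {} in $PATH!'.format(name))
    return found
```

With a lookup like this, `resolve_binary('watchman')` raises until watchman is on $PATH, which matches the behavior reported in this thread.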

  2. and self._scheduler ?

    1. good call - added!

Review request changed

Status: Closed (submitted)

Change Summary:

thanks gents! submitted @ e28fe8614a849d0de90af5b6844cf47614a2bb27