[pantsd] Repair watchman startup flakiness.

Review Request #3644 - Created April 3, 2016 and submitted

Kris Wilson
peiyu, stuhood
  • Bump Watchman socket timeout from 1 second to 5 seconds to resolve occasional SocketTimeout on the initial Watchman.watch_project.
  • Remove the prior retry strategy for Watchman.watch_project in favor of higher timeout, making a single 5s SocketTimeout fatal.
  • Add pants.util.retry.retry_on_exception and utilize for re-running the inital Watchman launch command to mitigate a socket race between watchman client and server on startup.
  • Improve ProcessManager.get_subprocess_output to capture stderr and pass it with the raised ExecutionError exception for better logging.
  • Move watchman startup logging from info to debug now that it's launched in the tool runtime path vs post-fork daemon path.
  • Bump pywatchman dep to latest.

All of this combined leads to a 100% success rate in launching Watchman on my machine over the course of ~20+ back-to-back start/stop test runs.

CI is green @ https://travis-ci.org/pantsbuild/pants/builds/121002400

+ tested the race fix under a failure case repro and verified the retry strategy resolves it:

DEBUG] acquiring lock: <pants.process.lock.OwnerPrintingInterProcessFileLock object at 0x1167ac4d0>
DEBUG] launching pantsd
DEBUG] Selected watchman binary bootstrapped to: /Users/kwilson/.cache/pants/bin/watchman/mac/10.11/4.5.0/watchman
DEBUG] launching watchman
DEBUG] watchman cmd is: /Users/kwilson/.cache/pants/bin/watchman/mac/10.11/4.5.0/watchman get-pid --no-save-state --statefile=/Users/kwilson/dev/pants/.pants.d/watchman/watchman.state --sockname=/Users/kwilson/dev/pants/.pants.d/watchman/watchman.sock --logfile=/Users/kwilson/dev/pants/.pants.d/watchman/watchman.log --log-level 2
DEBUG] ensuring creation of directory: /Users/kwilson/dev/pants/.pants.d/watchman
DEBUG] encountered exception on retry #0: ExecutionError(message="Command '[u'/Users/kwilson/.cache/pants/bin/watchman/mac/10.11/4.5.0/watchman', u'get-pid', u'--no-save-state', u'--statefile=/Users/kwilson/dev/pants/.pants.d/watchman/watchman.state', u'--sockname=/Users/kwilson/dev/pants/.pants.d/watchman/watchman.sock', u'--logfile=/Users/kwilson/dev/pants/.pants.d/watchman/watchman.log', u'--log-level', '2']' returned non-zero exit status 1", output='2016-04-02T17:26:45,555: [cli] unable to talk to your watchman on /Users/kwilson/dev/pants/.pants.d/watchman/watchman.sock! (Permission denied)')
DEBUG] watchman is running, pid=1459 socket=/Users/kwilson/dev/pants/.pants.d/watchman/watchman.sock
DEBUG] purging metadata directory: /Users/kwilson/dev/pants/.pids/pantsd
DEBUG] released lock: <pants.process.lock.OwnerPrintingInterProcessFileLock object at 0x1167ac4d0>
DEBUG] pantsd is running at pid 1502
Cache stats: hits=0, misses=1118, total=1118

while monitoring watchman processes in a simultaneous window.

Kris Wilson
Stu Hood
Peiyu Wang
Kris Wilson
Kris Wilson
Review request changed

Status: Closed (submitted)

Change Summary:

thanks Stu & Peiyu! submitted @ 0bff3f9c5b4ceb057c8fc0d0b119e3e431a9af1d