Review Request #4067 — Created July 12, 2016 and submitted
|jsirois, nhoward_tw, stuhood|
Presently, the pantsd+watchman integration test can occasionally fail in CI with a logged error along the lines of:W0602 21:35:21.260252 22326 pailgun_service.py:69] encountered exception during SchedulerService.get_build_graph(): Traceback (most recent call last): File "/home/travis/build/pantsbuild/pants/src/python/pants/pantsd/service/pailgun_service.py", line 65, in runner_factory build_graph = self._scheduler_service.get_build_graph(spec_roots) File "/home/travis/build/pantsbuild/pants/src/python/pants/pantsd/service/scheduler_service.py", line 84, in get_build_graph return self._graph_helper.create_graph(spec_roots) File "/home/travis/build/pantsbuild/pants/src/python/pants/bin/engine_initializer.py", line 64, in create_graph for _ in graph.inject_specs_closure(spec_roots): # Ensure the entire generator is unrolled. File "/home/travis/build/pantsbuild/pants/src/python/pants/engine/legacy/graph.py", line 169, in inject_specs_closure for address in self._inject(specs): File "/home/travis/build/pantsbuild/pants/src/python/pants/engine/legacy/graph.py", line 185, in _inject entries = maybe_list(state.value, expected_type=LegacyTarget) AttributeError: 'NoneType' object has no attribute 'value'
This seems to only happen when invalidation activity is high (in this case due to constant filesystem event activity in the CI environments), indicating a locking issue around graph construction. I was able to reliably reproduce the flakiness locally using a simple test harness described in #3565 that simulated the fs event conditions seen in CI.
LegacyGraphHelper.create_graph(), which revealed deadlock potential at the time of the Pailgun handler's
fork()in subsequent testing.
+ failed to reproduce locally in a lengthy
whiletest loop using the test harness setup described in #3565.
Revision 2 (+47 -7)
Address Stu's feedback.
Revision 3 (+49 -7)