[engine] Don't cycle-detect into completed Nodes

Review Request #3848 — Created May 9, 2016 and submitted — Latest diff uploaded

stuhood
pants
3372
pants-reviews
kwlzn, patricklaw

During a cold/complete build we "mostly" add Nodes top-to-bottom, starting from roots and introducing additional dependencies until we bottom out and something is able to begin executing back upward. In this case, our very simple cycle detection algorithm "mostly" hits its best case (walking the transitive dependencies of a Node is easy when it has no dependencies!).

But in the presence of graph invalidation there is another common case, which is the warm/partial case: filling in parent Nodes that were invalidated (transitively through dependents) by a filesystem change lower in the graph. In that case, we're adding Nodes whose dependencies are already mostly completed, potentially with very deep dependency trees. This is the worst case for our cycle detection.

Luckily, there are some simple invariants that we can introduce and preserve which allow us to avoid looking for cycles in completed Nodes, while keeping our simple cycle detection algorithm! For a cold run of ./pants list ::, this improves the time spent in adding dependencies of the cold/complete case by ~40% (7.29 secs to 4.3 secs), and the warm/partial case by ~99% (5.58 secs to 0.02 secs).

  • Add/test invariants to allow us to abort cycle detection when we encounter a completed Node.
  • Remove the unused edge_count field.
  • Remove the sum(..) walk from the debug statement at the end of scheduling... it's the bottleneck for noop build requests.

https://travis-ci.org/pantsbuild/pants/builds/128977643
http://jenkins.pantsbuild.org/job/pantsbuild/job/pants/branch/PR-3372/3/

Loading...