[engine] Walk references in the ProductGraph
Review Request #3803 — Created May 2, 2016 and submitted — Latest diff uploaded
Information | |
---|---|
stuhood | |
pants | |
3315 | |
Reviewers | |
pants-reviews | |
kwlzn, patricklaw |
With caching disabled (will revisit in a future review), the majority of the runtime for
./pants list ::
with the new engine was in cycle detection. After attempting to implement a smarter cycle detection algorithm[0]
, it became obvious that the actual cost of walking was the dominant factor.This review replaces the internals of ProductGraph with
_Entry
objects that directly reference one another. This avoids executing dict lookups (and thus a lot ofNode
equality checks) during graph walks.
- Add explicit tests of cycle detection.
- Move from multiple defaultdicts to a non-default dict of
_Entry
objects and walk the graph directly by_Entry
reference, which avoids a lot of dict lookups (~20% improvement). - Switch from recursion to iteration for graph walk (~6% improvement).
- Enable some default filters in
legacy:commands
forvenv
directories.
With caching disabled, the runtimes for time ./pants -q run src/python/pants/engine/exp/legacy:fsnodes -- //:: > /dev/null
were:
master: real 0m22.143s user 0m19.913s sys 0m1.914s complex algorithm (with reference walking): real 0m22.874s user 0m20.438s sys 0m2.078s this review (simple algorithm, with reference walking): real 0m17.940s user 0m15.944s sys 0m1.847s
[0]
: I was able to get that algorithm solidly working, but in addition to being more complicated, the extra overheads meant that (at least for the content of the pantsbuild/pants
repo) it was significantly slower than the naive m^2
approach. For posterity, the implementation is tagged here.
https://travis-ci.org/pantsbuild/pants/builds/127680443