[engine] Walk references in the ProductGraph
Review Request #3803 — Created May 2, 2016 and submitted — Latest diff uploaded
With caching disabled (will revisit in a future review), the majority of the runtime for
./pants list ::with the new engine was in cycle detection. After attempting to implement a smarter cycle detection algorithm
, it became obvious that the actual cost of walking was the dominant factor.
This review replaces the internals of ProductGraph with
_Entryobjects that directly reference one another. This avoids executing dict lookups (and thus a lot of
Nodeequality checks) during graph walks.
- Add explicit tests of cycle detection.
- Move from multiple defaultdicts to a non-default dict of
_Entryobjects and walk the graph directly by
_Entryreference, which avoids a lot of dict lookups (~20% improvement).
- Switch from recursion to iteration for graph walk (~6% improvement).
- Enable some default filters in
With caching disabled, the runtimes for
time ./pants -q run src/python/pants/engine/exp/legacy:fsnodes -- //:: > /dev/null were:
master: real 0m22.143s user 0m19.913s sys 0m1.914s complex algorithm (with reference walking): real 0m22.874s user 0m20.438s sys 0m2.078s this review (simple algorithm, with reference walking): real 0m17.940s user 0m15.944s sys 0m1.847s
: I was able to get that algorithm solidly working, but in addition to being more complicated, the extra overheads meant that (at least for the content of the
pantsbuild/pants repo) it was significantly slower than the naive
m^2 approach. For posterity, the implementation is tagged here.