Speed up the best-case scenario of dependency resolution.

Review Request #1685 — Created Jan. 28, 2015 and submitted — Latest diff uploaded

ugodiggi
pex
fully_cached_performance_improv
https://github.com/pantsbuild/pex/pull/37
b660f1f...
pants-reviews
patricklaw, wickman
A few small touchups that improve the speed of dependency resolution when
many packages are already in the local cache.

- when dispatching multiple crawling threads:
  - make each thread a bit more reactive by reducing the polling timeout from
    100ms to 10ms
  - do not wait for each of the worker threads to exit; just wait for them to complete their
    workload.
  Without this change, dependency resolution is guaranteed to take longer than 100ms per
  dependency, which is a large amount of time for just checking a local zipfile's content.
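
  As a rough illustration of the two threading changes above (not the actual pex code), the
  sketch below polls a queue with a short timeout and has the caller wait on the queue's
  workload rather than joining the threads. The names `crawl`, `handle` and `POLL_TIMEOUT` are
  hypothetical, and Python 3's standard `queue` and `threading` modules are assumed.

```python
import queue
import threading

POLL_TIMEOUT = 0.01  # 10ms, the reduced polling interval (assumed to be in seconds)


def crawl(items, handle, num_threads=2, poll_timeout=POLL_TIMEOUT):
    """Hypothetical helper: run handle(item) for each item on worker threads."""
    work = queue.Queue()
    results = []
    results_lock = threading.Lock()
    done = threading.Event()

    def worker():
        # Each worker wakes up at most every poll_timeout seconds to check `done`.
        while not done.is_set():
            try:
                item = work.get(timeout=poll_timeout)
            except queue.Empty:
                continue
            try:
                value = handle(item)
                with results_lock:
                    results.append(value)
            finally:
                # Mark the item as processed so work.join() can return.
                work.task_done()

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for item in items:
        work.put(item)

    # Wait for the workload to be finished, not for the threads to exit.
    work.join()
    # Signal the workers to stop; they exit on their next poll, in the background.
    done.set()
    return results
```

  Joining each thread instead would make the caller wait up to one polling interval per thread
  after the last item is processed, which is where a fixed per-resolution delay can come from.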

- cache the result of a couple calls that are repeated many times:
  - Link.from_filename
  - Package.from_href
  Each of these calls is performed for each file in the cache, for each dependency that is
  resolved. While neither call is especially expensive, repeating them for every
  (cached file, dependency) pair, roughly n^2 times with a largish local cache and set of
  dependencies, does add up.
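
  The caching idea can be sketched with a memoized function, as below. This is only an
  illustration: `link_from_filename` is a hypothetical stand-in for `Link.from_filename`, the
  file names are made up, and `functools.lru_cache` (Python 3) is an assumption; the actual
  change may memoize differently.

```python
import functools
import os
from urllib.parse import urljoin
from urllib.request import pathname2url

parse_count = 0  # counts how often the uncached body actually runs


@functools.lru_cache(maxsize=None)
def link_from_filename(filename):
    """Hypothetical stand-in for Link.from_filename: build a file URL once per filename."""
    global parse_count
    parse_count += 1
    return urljoin("file:", pathname2url(os.path.abspath(filename)))


# Every dependency scans every cached file, so the same filenames recur.
cached_files = ["a-1.0.tar.gz", "b-2.0.tar.gz"]  # made-up cache contents
for _dependency in ("a", "b", "c"):
    links = [link_from_filename(name) for name in cached_files]
```

  With the cache in place the body runs once per distinct file rather than once per
  (file, dependency) pair.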

Somewhat unscientific benchmarking on my system shows that the average time for resolving a single
dependency (namely 'pytz==2013b') goes down from 150ms to 30ms.

Running the modified code on the urbancompass codebase produced similarly desirable timings.

Ran tox (only for py27).

Also, I ran some (messy) performance benchmarks and collected profiles.
You can check my benchmarks at:
https://github.com/ugodiggi/pex/tree/ugo/bench_hack
