Speed up the best-case scenario of dependency resolution.

Review Request #1685 — Created Jan. 29, 2015 and submitted

ugodiggi
pex
fully_cached_performance_improv
https://github.com/pantsbuild/pex/pull/37
b660f1f...
pants-reviews
patricklaw, wickman
A few small touchups that improve the speed of dependency resolution when
many packages are already in the local cache.

- when dispatching multiple crawling threads:
  - make each thread a bit more reactive by reducing the polling timeout from
    100ms to 10ms
  - do not wait for each of the workers threads to complet, just wait for them to complete their
    workload.
  Without this change, dependency resolution is guaranteed to take longer than 100ms per
  dependency, which is a large amount of time for just checking a local zipfile's content.

- cache the result of a couple calls that are repeated many times:
  - Link.from_filename
  - Package.from_href
  Each of this call is performed for each file in the cache, for each dependency that is resolved.
  While both these calls are not especially expensive, when we repeat them n^2 times in a largish
  local cache * set of dependencies they do add up.

Somewhat unscientific benchmarking on my system show that the average time for resolving a single
dependency (namely 'pytz==2013b') goes down from 150ms to 30ms.

Running the modified code on the urbancompass codebase produced similarly desireable timings.

Run tox (only for py27).

Also, I run some (messy) performance benchmark & collected profiles.
You can check my benchmarks at:
https://github.com/ugodiggi/pex/tree/ugo/bench_hack

  • 3
  • 0
  • 0
  • 0
  • 3
Description From Last Updated
Will this be accessed concurrently by multiple threads? If so it should be protected by a mutex. BE benjyw
Personally I find this to be a good place to rely on the GIL. The guarantee that I rely on ... UG ugodiggi
Ditto this. Does it need to be mutex-protected? BE benjyw
PA
  1. Those numbers are impressive! Thanks for picking up the torch on these perf issues!

  2. 
      
BE
  1. 
      
  2. pex/link.py (Diff revision 1)
     
     

    Will this be accessed concurrently by multiple threads? If so it should be protected by a mutex.

    1. To clarify - I believe it's not good practice to rely on the GIL for this.

  3. pex/package.py (Diff revision 1)
     
     

    Ditto this. Does it need to be mutex-protected?

  4. 
      
IT
  1. post Benjy's comments. thanks for working on this!

  2. 
      
UG
UG
  1. 
      
  2. pex/link.py (Diff revision 1)
     
     

    Personally I find this to be a good place to rely on the GIL.

    The guarantee that I rely on is that dict.get(key) and dict[key] = value are each atomic. In my own code I would not think that this is bad form.

  3. 
      
BE
  1. Thanks for fixing. To clarify why this is important:

    The code as written happens to be safe, but that's true only because of subtle implementation details of python bytecode and VMs, which is fragile and opaque. It would be easy for someone to come along in the future and change the code in a way that breaks that atomicity, without even knowing that they had done so.

    A mutex future-proofs the code and also documents its concurrency requirements. And locking an uncontended mutex is very cheap, so there's no harm to performance.

  2. 
      
UG
IT
  1. Ship It!
  2. 
      
JS
  1. Merged by Brian @ https://github.com/pantsbuild/pex/commit/258fc73e684b44655ab93ea53b0a1ea60bf361b3
    Please mark this review as submitted.

  2. 
      
UG
Review request changed

Status: Closed (submitted)

Change Summary:

Thank you for the fast review all!

Loading...