Cache `dep-usage.jvm` results and provide ability to use cached results in analysis summary

Review Request #3612 — Created March 25, 2016 and submitted

tabishev
pants
tabishev/depjvm_caching
3093
pants-reviews
benjyw, patricklaw, stuhood, zundel

Current implementation of dep-usage.jvm working slow for big target's sets mainly because it require compilation results of all targets in analysis, which led to a lot of compilation or to retrieve a lot of data from build cache. Alternatively it's possible to cache dep-usage.jvm results for each target and use them to build analysis summary.

This RB adding build cache support for dep-usage.jvm. However using cached results is disabled by default and supported via --use-cached flag.

There are couple of reasons for this behavior:
- It's impossible to fallback to direct version in case of values absence in the cache. To decrease execution time we want not to require runtime_classpath product for analysis, and therefore it's impossible to ask for this data while task execution.
- Cached results can differ from direct results because we are ignoring 3rd party libraries resolved versions in a cache key. This information most likely not affect the analysis summary, but in some rear cases it can.

https://travis-ci.org/ttim/pants/builds/119111743

  1. Looks good! As discussed, a test or two would be much appreciated.

  2. It would be good to say explicitly that the point of this is to avoid needing to fetch compile artifacts or compile things.

  3. src/python/pants/backend/jvm/tasks/jvm_dependency_usage.py (Diff revision 1)
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     

    Can you break the two branches of this conditional into methods? Or just move the node_creator functions out. It's a pretty fundamental switch.

    1. Done, can't say I'm happy with the result, but looks better, what do you think?

  4. Unnecessary to mkdir the results_dir... it will already exist.

  5. Is it actually necessary to resolve targets, or could you just switch all of this to Address objects instead? I think the only time when we actually use the target object is to get the sources count, but you could cache that prolly. Just a thought... would rather tests than too much refactoring.

    1. Actually I had a version where everything in DependencyGraph was Address (str if to be precise, but Address is better). I prefer current version for simplicity and customizability in terms of you need only one cache for different size_estimator functions. Also this cache looks nice from the perspective of derived value of *classpath products.

  6. 
      
  1. Thanks.

  2. src/python/pants/backend/jvm/tasks/jvm_dependency_usage.py (Diff revisions 1 - 4)
     
     
     
     
     

    Run-on sentence.

  3. pydocs on the strategies used by each of these methods would be good.

  4. 
      
  1. Ship It!
  2. 
      
Review request changed

Status: Closed (submitted)

Change Summary:

Merged as 3598a544100e69a3e8077f2e567c5c18dfd02d22

Loading...