Avoid ivy cache thrash due to ivydata updates.

Review Request #2333 — Created June 8, 2015 and submitted

jsirois
pants
jsirois/travis-ci/avoid_ivydata_cache_thrash
1650
2344
6e9a0ba...
pants-reviews
davidt, dturner-tw, zundel

The ivydata-*.properties and root-level *.{properties,xml} files have
an effect on resolution time that is in the noise, but ivy re-timestamps
them (in internal comments and fields) on each run, and this leads to
cache thrash in travis-ci. Kill these files before the cache check to
avoid unneeded cache re-packing and re-upload, which is a ~100s
operation that occurs at the end of the build on 9 of our 10 ci
shards.
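
As a rough illustration of the cleanup described above, here is a minimal Python sketch. It is not the change itself (that lives in .travis.yml and runs in the before_cache step); the function name prune_ivy_metadata and the cache_root parameter are hypothetical, and the sketch assumes ivydata-*.properties files can appear at any depth while the other *.properties and *.xml files are only pruned at the cache root.

```python
import fnmatch
import os


def prune_ivy_metadata(cache_root):
    """Delete re-timestamped ivy bookkeeping files under cache_root.

    Hypothetical helper: returns the sorted list of paths that were removed.
    """
    doomed = set()
    # ivydata-*.properties files are assumed to sit at any depth in the cache.
    for dirpath, _dirnames, filenames in os.walk(cache_root):
        for name in filenames:
            if fnmatch.fnmatch(name, 'ivydata-*.properties'):
                doomed.add(os.path.join(dirpath, name))
    # Only root-level *.properties and *.xml files are pruned; nested ones stay.
    for name in os.listdir(cache_root):
        path = os.path.join(cache_root, name)
        is_metadata = (fnmatch.fnmatch(name, '*.properties')
                       or fnmatch.fnmatch(name, '*.xml'))
        if os.path.isfile(path) and is_metadata:
            doomed.add(path)
    # The set de-duplicates paths matched by both passes.
    removed = sorted(doomed)
    for path in removed:
        os.remove(path)
    return removed
```

Called with the cache directory (for example, prune_ivy_metadata('/home/travis/.ivy2/pants')), it leaves resolved artifacts and nested ivy files alone and removes only the files that get re-timestamped on each run.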

.travis.yml | 8 ++++++++
1 file changed, 8 insertions(+)

Ran travis lint, which fails with:

$ travis lint
Warnings for .travis.yml:
[x] unexpected key before_cache, dropping

However, this is a newish documented feature:
https://github.com/travis-ci/docs-travis-ci-com/commit/088be0f8de1c45135bcfa6d3b48ee0a561448716

As a result, giving this a whirl since the savings should be large if
the feature does actually exist.

CI went green here:
https://travis-ci.org/pantsbuild/pants/builds/65978345

JS
  1. With the ivydata properties out of the way, the 1st CI run revealed more cache thrash, this time from resolved-*.xml files, like so:

    change detected: /home/travis/.ivy2/pants/resolved-org.antlr-antlr-caller-working.xml
    changes detected, packing new archive
    

    Diff 2 kills those too, since local experiments showed that their effect on resolve times is also in the noise.

    1. And the 2nd CI run shows that the find in diff 2 does not work on travis.
      3rd diff coming with explicit condition grouping...
ZU
  1. 
      
  2. .travis.yml (Diff revision 2)
     
     

    As far as I know, the resolved-*.xml files are needed to populate the ivy_resolve_products product [1]. ivy_resolve() [2] can bypass running ivy altogether. I'm thinking that if we remove the cache completely and then re-run travis, we could be missing a needed file on the next run.

    If this is really a problem, an alternative would be to set the timestamps on these files to a fixed value instead of removing them.

    [1] https://github.com/pantsbuild/pants/blob/3f0ee0371d5117ddcd899e4272f5cf8eaa8492f4/src/python/pants/backend/jvm/tasks/ivy_resolve.py#L130
    [2] https://github.com/pantsbuild/pants/blob/c84c5cc4d7ef3b440c2ff448ce531595c75a3644/src/python/pants/backend/jvm/tasks/ivy_task_mixin.py#L112

    1. Typo above, the product name is ivy_jar_products

    2. The timestamps are internal to the files, and in different forms (one a comment, another a tag attribute), so setting the timestamps is not as trivial as it would otherwise be.
      I think we're good on blow-away pre-ci, since ci starts with a clean-all, forcing 1 re-resolve of any dep needed. By inspection, the re-resolves recreate the ivydata and resolved-*.xml files after they are removed.
ZU
  1. Ship It!
JS
  1. ... and now the latest CI run has revealed that other root-level .xml and .properties files change via timestamp and, yet again, have ~0 effect on resolve speed.

    change detected: /home/travis/.ivy2/pants/resolved-internal-testprojects.src.protobuf.org.pantsbuild.testproject.proto-ordering.a.a-latest.integration.properties
    changes detected, packing new archive
    

    One last round of clean diff coming...

JS
  1. Thanks Eric - submitted @ https://github.com/pantsbuild/pants/commit/49a04374a5225255f5fc8aebf453617a832e64e3
JS
Review request changed

Status: Closed (submitted)

JS
  1. Figured out the dangling com.example cache-thrash problem here: https://rbcommons.com/s/twitter/r/2344/