Write artifacts to the cache when vt.update() is called.

Review Request #3722 — Created April 19, 2016 and submitted

patricklaw
pants
3221
pants-reviews
benjyw, jsirois, mateor, stuhood, zundel

Very large zinc compiles can have thousands of targets, but the artifacts
are only written to the cache after all of them have been compiled.

Since isolated compilation guarantees us that any given target which
succeeded at compiling and which otherwise should have been cached is
safe to cache even if its downstream dependencies fail, we can instead
write these artifacts as targets finish compiling.

This has two large benefits:
1. It spreads out write-load to the artifact cache.
2. It significantly improves cache hit rate, especially with zinc's double-check, in some common CI scenarios.

This change also removes the unused invalidated() parameter use_cache.

CI is green (Jenkins)

ST
  1. 
      
  2. Can you add a comment to the block above indicating why its disabled?

  3. I'm worried that this will go out of sync. Can all of these checks be moved into _should_cache? I think it was broken out before for debugging purposes, but we haven't needed to debug at this level in a while.

  4. src/python/pants/task/task.py (Diff revision 1)
     
     

    Perhaps off topic, but I believe that use_cache is unused now... could probably remove. Only ivy was using it.

  5. 
      
PA
MA
  1. LGTM - I had to do a song and dance to manage which artifacts were cached in a plugin task - the write_artifacts flag would have been helpful then.

  2. 
      
PA
ZU
  1. Patrick you rock! I think combined with the 'add randomness to the build order' we could speed up some of our sharded tests more, and those teams have been looking for performance improvements.

  2. 
      
PA
ST
  1. I think that this code has decent integration test coverage via the various ensure_cached annotations, but would be good if you could double check.

  2. src/python/pants/invalidation/cache_manager.py (Diff revision 4)
     
     
     
     
     
     
     
     

    Should this run conditionally only if the artifact is not already marked valid?

  3. src/python/pants/task/task.py (Diff revision 4)
     
     
     

    Would be good to put this inside of update if possible.

  4. src/python/pants/task/task.py (Diff revision 4)
     
     

    Maybe rename to _should_cache_target_dir? I didn't realize initially that this is specific to the cache_target_dir setting.

    Also, maybe move the self.cache_target_dirs bit to the front of the check.

  5. 
      
ZU
  1. 
      
  2. src/python/pants/task/task.py (Diff revision 4)
     
     

    I don't get it - this was already the list of invalid vt's, why do you want to check it again?

    1. oh wait, now I get it. The caller might have already called vt.update() you don't want to duplicate effort.

    2. Yup, though now given Stu's suggestions we're just going to make it idempotent anyway.

  3. 
      
PA
PA
PA
ZU
  1. Ship It!
  2. 
      
PA
PA
BE
  1. Ship It!
  2. 
      
BE
  1. 
      
  2. This is yet more evidence that the invalidation cache and the artifact cache should be the same thing...

  3. 
      
PA
PA
PA
PA
Review request changed

Status: Closed (submitted)

Change Summary:

Upstream at c4787570233a87ea584771147f178f33dcf080bf

Loading...