Bring back the extra_data in the artifact hash

Review Request #808 — Created July 31, 2014 and submitted

johanoskarsson
pants
pants-reviews
benjyw, ity, jsirois, patricklaw, stuhood
The extra_data contains java or scala versions, needed to avoid build cache artifacts mixing the two. The payload refactor moved the hash generation into the payload. In this branch I hook up the extra data to the hash generation again.
added additional unit tests, ci.sh passes
  • 2
  • 0
  • 6
  • 0
  • 8
Description From Last Updated
I'm a bit confused here. Currently Task.create_cache_manager already uses Task.invalidate_for() as extra data passed into the InvalidationCacheManager. Is there some ... PA patricklaw
Is it appropriate to sort the extra data here? There's nothing about this class which makes it explicit that the ... PA patricklaw
JS
  1. 
      
  2. src/python/pants/base/cache_manager.py (Diff revision 1)
     
     
    It appears you're threading through self._extra_data here instead of accessing the field directly in _key_for for testing reasons.  We don't have @visible_for_testing which would make this more clear, but when we go to refactor this the test will break and that's good enough.
    1. Fixed to use the field instead of passing around for tests.
  3. 
      
PA
  1. 
      
  2. List literals as default values are extremely dangerous.  Please set to None and then default below with `extra_data = extra_data or []`
  3. Is order going to matter on this data?  Might want to sort it.
  4. 
      
PA
  1. In general, I think this should be the responsibility of FingerprintStrategy.  See fingerprint_strategy.py and ivy_task_mixin.py for an example.
    1. Yeah, that sounds like it would be neater. I couldn't quite wrap my head around how to bring that together, would it be ok if I do it in a follow up once the regression has been fixed?
  2. 
      
JO
JO
ZU
  1. Drive by comment: we just had some success on the setup of our distributed cache today and were discussing this very subject. thanks for making it happen!
  2. 
      
JS
  1. 
      
  2. s/ = /=/
  3. 
      
IT
  1. lgtm!
  2. 
      
PA
  1. 
      
  2. I'm a bit confused here.  Currently Task.create_cache_manager already uses Task.invalidate_for() as extra data passed into the InvalidationCacheManager.  Is there some reason that this isn't already doing the job for us?
    1. While it is passed through in cache_manager.py the _extra_data field is never used. It was used before the payload refactor (I'm assuming that's where it got cut loose).
    2. You're right, _extra_data is dead code, meaning that invalidate_for (which is used in several places) is also dead code.  This should be fixed, presumably in a followup.
    3. Good point, looks like this can be cleaned up a bit. Will do in a new branch as suggested.
  3. I prefer the old way of returning an explicit list (in fact a tuple is probably better).  Minor nit, it just seems easier to read.
    1. I tried to standardise on list (without entries that are themselves lists or None) for all the methods involved. Seems like having one way was appropriate, plus it doesn't freak out the hash_all method.
  4. Is it appropriate to sort the extra data here?  There's nothing about this class which makes it explicit that the order of the data doesn't matter.  It seems like this should either be an explicitly documented behavior, or it should be the responsibility of users of this class to first sort their extra data.
    1. Seems like it'll be harder for users to shoot themselves in the foot if the sorting is here. I don't have strong feelings about it though, I can move it out to the user side if you want?
    2. I do feel strongly that this is bad.  Not that it's bad for the strategy to sort it, but for it to sort it silently.  I could easily see being a user and expecting the order of my fingerprint salt to matter.
      
      Furthermore, the more I think about this, the more I think this class shouldn't exist, at least not in this commit.  I'd rather see a FingerprintStrategy subclass that lives in jvm_compile.py and explicitly takes these versions as init parameters then rolls them into the hash.
      
      Note by the way that you should also be doing some sort of type checking to make sure that this isn't applying this strategy to Targets for which it is inappropriate (e.g. JarLibrary).
    3. Moved the fingerprinter, I put it as a new file in the jvm_compile dir since the jvm_compile.py file is getting quite big. Added a comment about sorting to the constructor. 
      
      For now I left it to consume the invalidate_for() output, I'll rename those to something more specific in a follow up when I clean up the extra_data as discussed.
      
      I'm still missing a piece on the JarLibrary part though. What's the harm in using the java/scala version when fingerprinting JarLibrary? Doesn't it just add extra 'precision'?
  5. 
      
JO
PA
  1. lgtm again minus the issues raised.
  2. Since this isn't used anymore, can you throw it away?
  3. Ditto with throwing away, since this is almost exactly duplicated in JvmFingerprintStrategyTest
  4. I still think a type check for exactly which targets are relevant for this fingerprint is appropriate here.  In particular I think it would be fine to roll in the version info on any `isinstance(JvmTarget)`, and just delegate to super for anything else.
    
    Things almost certainly won't break if you don't do this, but I think it's nice to be precise about what problem this fingerprinting solves.  If you're very confident that non-JvmTarget instances should ever make it to this fingerprinter, you could just raise an exception on a bad type instead.
    
    Now that we've seen another subclass of FingerprintStrategy, it seems to be hinting to us that we should just have a default implementation of `compute_fingerprint` that delegates to an implementation on the class which returns a map from TargetType to a method that fingerprints an instance of that TargetType.  But that's idle talking suitable for a different CR.
  5. 
      
JO
PA
  1. lgtm minus the one nit.  Happy to see it shipped after that's resolved
  2. You compute this no matter what, might as well pull it out and not compute it twice.
  3. 
      
JO
Review request changed

Status: Closed (submitted)

Loading...