I think the 2 issues I marked are real, the build_invalidator issue most so. That issue crosses layers right now so it would require a bit of test setup to test, but definitely doable.
There is still a
.pants.d/build_invalidator/subdir for each task that does not vary with this new identity. That should lead to tasks that use invalidation checks missing the need to invalidate, skipping past the invalidation guard, and then failing to populate the product map or else populating it with products for the wrong identity. Either way the failure is uncontrolled.
We should be able to refactor a Task class name and not invalidate its local work when upgrading a pants install. How about just add a dummy constant field like
'seed', PrimitiveField('')- anything that is localized here and fully stable (comments could be added to warn off modification for this reason).
Highly subjective, but this feels like a downgrade in readability. With if/else I can just have my brain in that block, with continue, things are non-local and I have to go look out past the with to find the for and double-check this is sane.
Allow Tasks to include a Payload that determines their identity
Review Request #2009 — Created March 31, 2015 and updated
|areitz, davidt, jsirois, patricklaw|
- Add a Task lifecycle method that returns a Payload representing the unique identity of a Task
Since there are only two FingerprintStrategy subclasses, and they're both used to mix in Task options (ivy confs for IvyTaskMixin and library versions for JvmCompile), I think that it should be possible to replace/deprecate FingerprintStrategy with a mixing-in of Task identity.
If people think that sounds reasonable, I'll work on it on this review first thing tomorrow.
|These workdir names are going to make it difficult to poke around in .pants.d for debugging purposes. I like the ...||BE benjyw|
As John said, this is incomplete in the sense that it doesn't handle invalidation, and the scope of this change will necessarily be much larger if it's to work.
Additionally, it is not possible to deprecate
FingerprintStrategyin favor of this. While in one or two places
FingerprintStrategiesare used simply to mix in static configuration from a task, their original (and more complex) purpose was to selectively fingerprint targets based on arbitrary logic. See
ivy_task_mixin.pyfor an example of this.
That being said, if this lifecycle method is complete, it would definitely make sense to mix this information into the
FingerprintStrategyused for a class--which is to say, classes would always retrieve their
FingerprintStrategyinstances by calling
self.get_fingerprint_strategy(...). This would greatly simplify the first use case above where a
FingerprintStrategyis just used to mix in some static information or configuration about the class, which not stepping on the toes of classes that need more sophisticated, per-target strategies. And this nice thing about this is is that I think it also takes care of the problem John mentioned: if your
FingerprintStrategyrolls in this configuration into its hashes, then it naturally gets plumbed all the way through to the cache manager and build invalidation--indeed, that's the entire purpose of
FingerprintStrategyin terms of plumbing.
So, I'm still a bit skeptical of this change--over-invalidating is detrimental to productivity, and this change is going to be hard to test. But once the major issues are fixed I'll be happy to take a look and maybe run some tests internally before this goes upstream. One quick suggestion I'd make is to allow
Noneas a default response to "what's my task level invalidation", instead of the hash of an empty string payload. This special case (which also exists for targets, so there's symmetry), allows the task to opt out of this behavior, and would probably be a good default, especially at first.
Before we go down this path, I'd like to step back a tiny bit and think about bringing some regularization/normality to the roughly overlapping concepts of "artifacts", "products" and "stuff in .pants.d". I'm hoping a design will fall out than unifies the common aspects of these around fingerprinting and invalidation.
These workdir names are going to make it difficult to poke around in .pants.d for debugging purposes. I like the idea of defaulting to a non-qualified name if the "identity payload" is trivial.
Another fun anecdote to motivate a change like this:
Recently we bumped from scala 2.10.4 to 2.10.5... this correctly caused the artifact cache to miss, and everything seemed to have gone swimmingly. But because there was 2.10.4 analysis still sitting on disk, the next cache hits using the global strategy and pulling down 2.10.5 artifacts would merge it into the global analysis and get... 2.10.4 analysis (first header wins.) Result: zinc ignores the analysis and rewrites 2.10.5 analysis, which then gets merged back into the global analysis to make it invalid again.
Long story short, it took a week to notice that incremental compiles were disabled for anyone who hadn't run
clean-all. Sure, we should fix the analysis merging. But there are no end of potential bugs that could be prevented by ensuring that tasks with different inputs are isolated from themselves.