Allow Tasks to include a Payload that determines their identity

Review Request #2009 — Created March 31, 2015 and updated

stuhood
pants
1351
d492b90...
pants-reviews
areitz, davidt, jsirois, patricklaw
  • Add a Task lifecycle method that returns a Payload representing the unique identity of a Task

Since there are only two FingerprintStrategy subclasses, and they're both used to mix in Task options (ivy confs for IvyTaskMixin and library versions for JvmCompile), I think that it should be possible to replace/deprecate FingerprintStrategy with a mixing-in of Task identity.

If people think that sounds reasonable, I'll work on it on this review first thing tomorrow.

https://github.com/pantsbuild/pants/pull/1347

  • 1
  • 0
  • 2
  • 0
  • 3
Description From Last Updated
These workdir names are going to make it difficult to poke around in .pants.d for debugging purposes. I like the ... BE benjyw
JS
  1. I think the 2 issues I marked are real, the build_invalidator issue most so.  That issue crosses layers right now so it would require a bit of test setup to test, but definitely doable.
  2. There is still a .pants.d/build_invalidator/ subdir for each task that does not vary with this new identity. That should lead to tasks that use invalidation checks missing the need to invalidate, skipping past the invalidation guard, and then failing to populate the product map or else populating it with products for the wrong identity. Either way the failure is uncontrolled.

    1. Am I correct in assuming that by mixing the Task identity into Target hashes, we'll get that for free?

    2. Yes - but then you thrash the "cache" un-necessarily since you still have 1 invalidator dictionary per Task, but N workspaces to hold products per Task.
      So either mix it in to Target hashes and revert the workspace path differentiation, allowing for only 1 invalidated answer per Task, or keep your path differentiation and add parallel invalidation differentiation.
    3. Oooh, I see now. I thought there was a global BuildInvalidator directory, but it is per-Task. Fixing.

  3. We should be able to refactor a Task class name and not invalidate its local work when upgrading a pants install. How about just add a dummy constant field like 'seed', PrimitiveField('') - anything that is localized here and fully stable (comments could be added to warn off modification for this reason).

    1. Agreed... added a seed.

  4. I'm missing how your change fixes this TODO.
    1. It doesn't, but the comment was stale: the options system has landed, and you can specify ./pants repl.scala or ./pants repl.py to select targets.

    2. I don't understand the TODO then since:

      $ pants.dev repl.scala src/python/pants/base:address examples/src/java/::
      ...
      12:15:52 00:00   [repl]
      12:15:52 00:00     [py]
      FAILURE: Mutually incompatible targets specified: PythonLibrary(BuildFileAddress(/home/jsirois/dev/3rdparty/pants/src/python/pants/base/BUILD, address)) vs JvmBinary(BuildFileAddress(/home/jsirois/dev/3rdparty/pants/examples/src/java/com/pants/examples/antlr3/BUILD, antlr3)) (and 57 others)
      
      
                     FAILURE
      
  5. src/python/pants/engine/round_engine.py (Diff revision 1)
     
     
    Highly subjective, but this feels like a downgrade in readability.  With if/else I can just have my brain in that block, with continue, things are non-local and I have to go look out past the with to find the for and double-check this is sane.
  6. 
      
PA
  1. As John said, this is incomplete in the sense that it doesn't handle invalidation, and the scope of this change will necessarily be much larger if it's to work.

    Additionally, it is not possible to deprecate FingerprintStrategy in favor of this. While in one or two places FingerprintStrategies are used simply to mix in static configuration from a task, their original (and more complex) purpose was to selectively fingerprint targets based on arbitrary logic. See IvyResolveFingerprintStrategy in ivy_task_mixin.py for an example of this.

    That being said, if this lifecycle method is complete, it would definitely make sense to mix this information into the FingerprintStrategy used for a class--which is to say, classes would always retrieve their FingerprintStrategy instances by calling self.get_fingerprint_strategy(...). This would greatly simplify the first use case above where a FingerprintStrategy is just used to mix in some static information or configuration about the class, which not stepping on the toes of classes that need more sophisticated, per-target strategies. And this nice thing about this is is that I think it also takes care of the problem John mentioned: if your FingerprintStrategy rolls in this configuration into its hashes, then it naturally gets plumbed all the way through to the cache manager and build invalidation--indeed, that's the entire purpose of FingerprintStrategy in terms of plumbing.

    So, I'm still a bit skeptical of this change--over-invalidating is detrimental to productivity, and this change is going to be hard to test. But once the major issues are fixed I'll be happy to take a look and maybe run some tests internally before this goes upstream. One quick suggestion I'd make is to allow None as a default response to "what's my task level invalidation", instead of the hash of an empty string payload. This special case (which also exists for targets, so there's symmetry), allows the task to opt out of this behavior, and would probably be a good default, especially at first.

    1. Sorry about that third paragraph, this is what I get for typing before coffee. That garbled mess in the middle should read:
      """
      ...is just used to mix in some static information or configuration about the class, while not stepping on the toes of classes that need more sophisticated, per-target strategies. And the nice thing about this is that I think it also takes care of the problem John mentioned...
      """

  2. 
      
ST
Review request changed

Bugs:

+1351
BE
  1. Before we go down this path, I'd like to step back a tiny bit and think about bringing some regularization/normality to the roughly overlapping concepts of "artifacts", "products" and "stuff in .pants.d". I'm hoping a design will fall out than unifies the common aspects of these around fingerprinting and invalidation.

  2. src/python/pants/engine/round_engine.py (Diff revision 1)
     
     

    These workdir names are going to make it difficult to poke around in .pants.d for debugging purposes. I like the idea of defaulting to a non-qualified name if the "identity payload" is trivial.

  3. 
      
ST
  1. Another fun anecdote to motivate a change like this:

    Recently we bumped from scala 2.10.4 to 2.10.5... this correctly caused the artifact cache to miss, and everything seemed to have gone swimmingly. But because there was 2.10.4 analysis still sitting on disk, the next cache hits using the global strategy and pulling down 2.10.5 artifacts would merge it into the global analysis and get... 2.10.4 analysis (first header wins.) Result: zinc ignores the analysis and rewrites 2.10.5 analysis, which then gets merged back into the global analysis to make it invalid again.

    Long story short, it took a week to notice that incremental compiles were disabled for anyone who hadn't run clean-all. Sure, we should fix the analysis merging. But there are no end of potential bugs that could be prevented by ensuring that tasks with different inputs are isolated from themselves.

    1. Another one: on https://rbcommons.com/s/twitter/r/2404/ we begin to build a per-task log. But since there isn't a non-destructive/standardized way to "swap" the configuration of a task, if you enable the flag while you have any existing state in pants.d, it's possible to get incomplete/partial logs. If the task-level configuration was included in a task identity, toggling the flag would give you a fresh location on disk, and preserve the invariants.

      As mentioned there, I'll probably ask Cody to work on this.

  2. 
      
Loading...