Allow FPStrategy to opt out of fingerprinting

Review Request #1602 — Created Jan. 12, 2015 and submitted

benjyw, jsirois, stuhood, zundel

Allow FingerprintStrategy to opt out of fingerprinting.

NOTE! This change bumps the universal GLOBAL_CACHE_KEY_GEN_VERSION. It therefore invalidates every artifact and target!

In some cases (in particular Ivy), a FingerprintStrategy might want to say that a target it is fingerprinting contributes nothing to the fingerprint. Currently this is impossible since even the presence of an "empty" fingerprint (a hasher that's been intialized, not updated, and then had its hexdigest dumped) still contributes to the transitive fingerprint of other targets. This change modifies the interface of FingerprintStrategy so that None is a valid return value.

Callers of FingerprintStrategy.fingerprint_target are modified accordingly to handle the possibility of None. In particular, a transitive hash as computed in Target.transitive_invalidation_hash considers that if its transitive dependencies all opted out (or are empty) and its own non-transitive hash is an opt-out, then the transitive hash is also an opt-out.

The motivation of this change was to reduce the number of calls to ivy. Generally, unless a target is a JarLibrary or it contains configurations/excludes in its payload, it doesn't contribute to invalidating an ivy resolve. Previously, a structural change to the build graph (adding a dependency that brought in no 3rd party libs) would still trigger an ivy resolve, even though it was unnecessary. Now, only changes that could possibly affect ivy resolution cause the transitive fingerprint to change.

CI is baking:

  • 0
  • 0
  • 8
  • 1
  • 9
Description From Last Updated
  2. src/python/pants/base/ (Diff revision 1)

    You've changed cached hash keys IIUC since the old sort was on self.dependencies and the new sort is on the hashes of those dependencies. Benjy did a dance to avoid such things for example just recently here:

    1. I don't see the dance you're referring to, but yes, I did change the cache keys and the way they're computed. However, most users who use the cache and have expensive-to-recompute targets are building scala or java, which both rely on ivy for third party dep resolution, and all JVM targets will almost certainly be invalidated by this change anyway (by the nature of the change).

      So I'm okay with breaking old cache keys here for the sake of clearer key computation. Maybe we should have a big warning (either in the commit or in the next release notes) noting that the cache keys have been invalidated? I can also bump the global cache key salt to codify this.

      Please reopen the issue if you disagree... or come yell at me in person tomorrow morning!

    2. Oh, double checking the change you linked, do you mean invalidating existing analysis? That can happen without actually invalidating the targets (despite breaking existing cache entries) so it's much more dangerous. This, on the other hand, is no more dangerous than, for example, adding a new payload field. But it does, for any non-trivial target, invalidate the entire world.

    3. Your logic makes sense and this RB summary is enough to figure out the CHANGELOG.rst entry should go under the API Changes heading.

  2. global_vts.targets == targets unless I've re-read invalidation code wrong. If thats right then the new 2-tuple return provides no new info and could be reverted from (classpath-list, relevant-targets) to classpath-list.

    This highlights that a test exercising the differences this change is supposed to make would be nice to have.

    1. I would definitely bump the global cache version for this.

    2. After an offline discussion, we agreed that the tuple is necessary (ivy_resolve needs to generate reports or short circuit based on the targets that were opted out of caching). I will also bump the global cache key.

    3. Clarifying the offline a bit more, my assertion that global_vts.targets == targets was false which in itself invalidated my concern.

  1. Ship It!
  1. My concern w/ this is that it changes the meaning of InvalidationCheck.all_vts st instead of being all the targets with their fingerprint keys, it now means all the targets that have non-none fingerprints.

    I think this'll break depmap, and provides, because they assume that the location of the ivy report is based on self.context.target_roots, and not the filtered targets. Maybe they could be changed to depend on a product based on the report?

    1. Bingo - Patrick & I were sitting together debugging this very issue in Depmap.

    2. I fixed the depmap example, and it looks like an equivalent fix will work for provides as well. Thanks for the keen eye! Updating with the fixes presently.

    3. Thanks. It makes me nervous that provides doesn't have integration tests.

  2. Would it make sense to move this above bootstrapping ivy if the goal is to reduce overhead?

  3. Should this check for None, or is empty equivalent

    1. It shouldn't ever be Falsey, but I agree that checking for None is more precisely what I mean in the interface. Fixed.

  4. src/python/pants/base/ (Diff revision 2)

    This docstring is no longer correct because the returned value doesn't cover the input targets anymore.

  5. src/python/pants/base/ (Diff revision 2)

    this should change too. Maybe

    Wrap targets and their computed cache keys in VersionedTargets, excluding targets that have no key.

  6. src/python/pants/base/ (Diff revision 2)

    should this check for None explicitly, or can keys be valid and empty?

  7. src/python/pants/base/ (Diff revision 2)

    update w/ note about what None means here?

  2. An assert on len 1 might be nice to have here until cleanups kill the need for the comment/gymnastics.

  3. The LHS looks like it must have been broken already. The xml report looked up is for just the the target_roots and no-one resolves for just the target_roots - they resolve for context.targets, so there is no report. It may make sense to just delete this task outright - I'm pretty sure its unused / broken.

    1. I've confirmed this with:

      $ provides examples/src/java/com/pants/examples/hello/greet
      15:49:34 00:01   [provides]
      15:49:34 00:01     [provides]ivyinfo: <pants.backend.jvm.ivy_utils.IvyInfo object at 0x7fe54c16e0d0>
                         Wrote provides information to /home/jsirois/dev/3rdparty/pants/.pants.d/provides/provides/hello-greet.default.provides
      jsirois@gill ~/dev/3rdparty/pants (master *) $ cat /home/jsirois/dev/3rdparty/pants/.pants.d/provides/provides/hello-greet.default.provides
      # from jar /home/jsirois/dev/3rdparty/pants/.pants.d/jar/jar/com.pants.examples-hello-greet.jar
      jsirois@gill ~/dev/3rdparty/pants (master *) $ provides examples/src/java/com/pants/examples/hello/main/
      15:49:43 00:00   [provides]
      15:49:43 00:00     [provides]ivyinfo: None
                         Wrote provides information to /home/jsirois/dev/3rdparty/pants/.pants.d/provides/provides/106a2f4e8804087fda3506d0f27fde37faa6dda2.default.provides
      jsirois@gill ~/dev/3rdparty/pants (master *) $ cat /home/jsirois/dev/3rdparty/pants/.pants.d/provides/provides/106a2f4e8804087fda3506d0f27fde37faa6dda2.default.provides

      IE: provides only works when CLI targets are in fact the full closure - which is ~never.

    2. Added a note to the class that we should delete it. I'll follow up on pants-devel and with another CR.

  1. lgtm aside from a couple more minor nitpicks

  2. src/python/pants/base/ (Diff revision 4)

    if dep_hash is not None to match other hash checks.

  3. src/python/pants/base/ (Diff revision 4)

    maybe keep the result of sorted(dep_hash_iter()) in a var rather than re-traversing the transitive hashes twice?

  1. Ship It!
Review request changed

Status: Closed (submitted)

Change Summary:

Thanks for the detailed review, submitted @ a2a2fb58a85df802b4a8203e0b44ee4c3fc80161