Include resolved jar versions in compile fingerprints; ensure coordinates match artifacts.

Review Request #2853 - Created Sept. 21, 2015 and submitted

Information
Nick Howard (Twitter)
pants
2228
ca09008...
Reviewers
pants-reviews
benjyw, jsirois, stuhood, zundel

We've run into issues recently where a different jar version resolution can result in different compiled outputs. This adds the resolved coordinates into the fingerprint for JVM compile artifacts. By doing that we can ensure that cache entries for compile artifacts will be invalidated if the resolved jar versions change.

Additionally, while working on this I discovered that the revision assigned to a particular classpath artifact is the requested one and not the final one. I added a unversioned -> versioned table to IvyInfo to ensure that the resolved versions are used in all cases.

If removing global compile lands before this does, I'll clean up the strategy code in the tests.

Wrote failing tests and made them pass.

Issues

  • 1
  • 1
  • 1
  • 3
Description From Last Updated
str should never appear in pants code. It means different things on different versions of python. Instead, use some combination ... Patrick Lawson Patrick Lawson
Eric Ayers
Stu Hood
Eric Ayers
John Sirois
Stu Hood
Nick Howard (Twitter)
Nick Howard (Twitter)
Review request changed

Status: Closed (submitted)

Change Summary:

https://github.com/pantsbuild/pants/commit/31fba06d653d691649855e6e620137e741ccf933

Patrick Lawson

   

What guarantee of consistent ordering do you have here? As far as I can tell, compile_classpath does not attempt to impose a canonical order on its entries.

We're seeing major issues with artifact cache hit rates internally, but not on the CIs themselves. This leads me to suspect platform/order dependent cache key generation, and this change stuck out to me.

  1. For a given target the ordering of the class path entries should always the same provided the target's closure hasn't changed. This is because the class path is generated by walking the graph and not by looking things up in a dict.

    However the resolved version may and indeed does change depending on the results of how ivy adjusts versions for a given resolve. This is an issue because we batch resolves. If we resolved each target separately, we wouldn't have this issue but at the cost of performance.

    One possibility would be that this is causing cache misses because you fill the cache with one batched resolve, while locally, the resolves may differ. If you aren't using ivy and have fixed all jar versions, that shouldn't be the issue.

  2. We talked about this in slack: in this case, Foursquare's resolver needs to populate ClasspathProducts in a consistent order, which CP will then preserve. The hashing does need to be ordering sensitive.

  3. Yup, I didn't realize that populating the classpath products in a stable order was part of the contract. It was easy enough to fix on our end--thanks for the clarification.

str should never appear in pants code. It means different things on different versions of python. Instead, use some combination of six and string encoding to generate bytes.

  1. Interesting. I was unaware of that. I thought str was just a delegating fn that called __str__ on the passed object if it was defined.

  2. I'm with Nick on this, I'd be interested to know if there is something I'm missing.
    2.7: https://docs.python.org/2/library/functions.html#str
    3.5: https://docs.python.org/3.5/library/stdtypes.html#str
    
    Calling str(obj) has the same behavior in 2or3.  Its only in 3 and calling with the extra args do things get divergent.
    
    The meat of the relevant doc for 3.5:
    
    If neither encoding nor errors is given, str(object) returns object.__str__(), which is the “informal” or nicely printable string representation of object. For string objects, this is the string itself. If object does not have a __str__() method, then str() falls back to returning repr(object).
  3. Yeah, I did go read up on this afterwards and I'm still a bit torn. What happens if that object doesn't define __str__ (I know in this case it does)? What happens if the __str__ returns a text/unicode type? Hasher wants bytes, and if there isn't an explicit encoding then on python2 it's going to attempt to encode it with ascii, which will fail in the presence of some non-ascii. It's one of those situations where I think we should be really careful about our string handling, because python will do the right thing most of the time--until it doesn't, which makes it all the more dangerous.

Loading...