Split Ivy Resolve into Resolve / Fetch steps

Review Request #3555 — Created March 10, 2016 and submitted

nhoward_tw
pants
3030
pants-reviews
benjyw, gmalmquist, jsirois, patricklaw, stuhood

This patch contains the first step in the changes to resolution that I've been working on. It breaks up the internals of the IvyTaskMixin into two steps so that we can cache the coordinates of transitive external dependencies and fetch them intransitively during future resolves.

How it works:
When Pants does a ivy resolve, it now generates a json file containing the coordinates of the resolved artifacts and their relationships to the targets that participated in the resolve. This file is cached. If a subsequent resolve successfully hits the cache and gets the json file, instead of doing a full resolve, it instead does a simpler, flat, intransitive resolve using the coordinate information from the full resolve.

  • Resolved versions can't move based on changed dependency information in remote metadata.
  • Flat, intransitive resolves are faster because Ivy no longer has to do version conflict resolution.

In IvyTaskMixin, now instead of having a single fast path, slow path, we have four paths that fall through to the next if their checks fail.

  • Try loading an existing fetch result.
  • Try loading an existing resolve result.
  • Try fetch and load the result.
  • Do a resolve and load the result.

Major Components:

In ivy_task_mixin:

  • Return IvyResolveResult from IvyTaskMixin.resolve

In ivy_resolve:
- Use the result from IvyTaskMixin.resolve to find the report files to generate reports from.
- ResolutionStep, IvyResolutionStep, IvyFetchStep and IvyResolveStep: Encapsulate the load and resolve paths of the two ways of resolving external artifacts using ivy.
- FrozenResolution: Represents the resolved coordinates of all transitive external artifacts and their relationships to targets
- IvyFetchResolveResult: subclass of IvyResolveResult that overrides how target to jar relationships are determined to use FrozenResolutions.

Minor adjustments:

  • Make JarDependencyManager injectable for ivy.xml generation to make testing easier.
  • Move remaining static ivy lock from IvyTaskMixin to IvyUtils
  • Deprecate IvyUtils methods that do things that are specialized by IvyTaskMixin in favor of methods that have larger granularity.

Follow on work:

  • Represent the external artifact dependency graph in the cachable resolution information. https://github.com/pantsbuild/pants/issues/3052
  • Use Pants' buildcache to store the external artifacts that are resolved in a full resolve, falling back to a fetch resolve if the cache is unavailable. https://github.com/pantsbuild/pants/issues/3053
  • Implement the fetch resolve in Python rather than using Ivy. https://github.com/pantsbuild/pants/issues/3054
  • Isolate resolutions for leaf targets. https://github.com/pantsbuild/pants/issues/3055

Design Doc for 3rdparty changes https://docs.google.com/document/d/1RE4Kuf9f2g-ZU2nHKozRZ3P6oUwAEYaREBXLcn83C9w/edit

I added a number of unit tests around ivy resolve behavior around fetch vs full resolves. I've also done some testing within our repo.

CI passed at https://travis-ci.org/pantsbuild/pants/builds/116752107

  • 0
  • 0
  • 7
  • 3
  • 10
Description From Last Updated
  1. Awesome work, just a few nits.

  2. nit: s/a the/all the/ ?

  3. nit: put the '.' before the format on the second line

  4. nit: indentation

  5. nit: weird formatting?

  6. It would be great to have a way to get at that graph with external tooling

    1. Yeah. Adding it feels like it'll add further complexity which is why I punted for this review. It is already too big.

  7. accidental whitespace?

  8. 
      
  1. Thanks for the detailed description! It's really helpful when reviewing.

    Before I dive too deeply into the code though, I'm noticing that the terminology can get in the way of readability.

    Intuitively, "resolve" means "choose versions" and "fetch" means fetch necessary jar files. It seems that in ivy-land "resolve" means both those things, so you've gone with "Full resolution" vs. "Fetch resolution"? "Full" is a vague word, and both are similar-length words starting with an 'F', which is making my head spin when reading the code.

    Also, the fact that fetching jars from the internet is also called "resolution" is a detail that will go away once we reimplement it in Python. It'll make no sense at all to call that thing "resolution" at that point.

    Could we choose two emphatic and distinctive terms that are untangled with Ivy terms? Say "Versioning/VersionSelection" vs. "Fetch" or something like that?

    1. How would you feel if I renamed them as "resolve" and "fetch" instead of "full resolve" and "fetch resolve"? We'd make the Pants concepts "resolve" and "fetch" and abstract away ivy's ideas about what "resolve" means. Resolve is such a good work for what "full" does that I'd prefer to approach it that way.

      So, the process would be resolution with steps resolve and fetch. I'd rename things along the following lines:

      IvyResolveStrategy -> ResolutionStep
      resolve_and_load -> execute_and_load

      FetchIvyResolveStrategy -> IvyFetchStep
      FullIvyResolveStrategy -> IvyResolveStep

      IvyFetchResolveResult -> IvyFetchResult
      IvyResolveResult -> same

      How's that sound?

    2. That sounds good to me, I agree that "resolve" is the right word for what "full" does (assuming that "resolve" doesn't also include a "fetch"?)

      The names you propose sounds good. I like "IvyFetchStep" because then it can easily be replaced with "FooFetchStep" when re-implemented.

      Thanks!

  2. 
      
  1. 
      
  2. a the inputs

  3. src/python/pants/backend/jvm/ivy_utils.py (Diff revision 2)
     
     
     
     

    The cache dir and global ivy workdir probably don't affect the identity of the resolve, do they? Or, we don't want them to at least.

    1. Hmm. That's a good point. There's a fair bit of information that's derived from them, used in a number of places, all the same ones the request objects end up passed to.

      I could adjust the docstring for this, or introduce a new class.

    2. Well, as the comment says, those other properties are context specific, and can be passed in by the caller to do_resolve?

  4. private?

  5. src/python/pants/backend/jvm/ivy_utils.py (Diff revision 2)
     
     
     

    private?

    1. It's currently used from ivy_task_mixin's loading code. I might be able to make it private with some code movement though.

  6. src/python/pants/backend/jvm/ivy_utils.py (Diff revision 2)
     
     
     

    If possible, this should be private. There have always been a lot of implementation details leaking out of IvyUtils... in the past the ivy report xml file, and here the ivy.xml file.

    1. I think I could do it with some further code movement. Right now both of these are used from ivy_task_mixin.

  7. Using strategies for this seems like an odd factoring.

    It seems like these would do well as two separate/paired tasks, linked by a product containing the resolve information... that way, the first task would handle either running the resolve or loading the cached resolve, and then the second task would handle the fetch (or not, if it got a fully fetched resolve in the product map).

    The product requirement would cause a good error for anyone with existing IvyTaskMixin subclasses without an instance of the paired task installed (I don't think there are any external usages now though?)

    1. Yes, strategy isn't the right word here. I came up with a possible better name in my response to Benjy. I haven't made it a product yet because I wanted to think through how making resolves isolated might change things, but I agree that it would be better.

  8. As mentioned above, it seems like this should be a product.

  9. 
      
  1. 
      
    1. Whoops... this is effectively a fixit-then-shipit. But since I forgot to click the checkbox, let's just assume I'll shipit after this round.

  2. src/python/pants/backend/jvm/ivy_utils.py (Diff revision 3)
     
     
     
     
     
     

    I don't think this deprecation is accurate anymore, since generating an ivy.xml is now a dependency for do_resolve?

  3. Not sure the non-ivy specific base class is useful without a second implementation.... it's hard to imagine that 4sq would be able to subclass this without a bunch of edits, for example.

    1. I agree. I think I was getting ahead of myself here.

  4. Is there anything IvyTaskMixin specific about these steps? They seem like they should be members of ivy_utils or perhaps the existing top-level pants.ivy module.

    1. No. They felt more task related at some point during my cleanup so I moved them back. But they don't have to live there. I'll move them to ivy_utils.

  5. This will not involve writing anything to the cache... is that intentional?

    The multiple returns in here seem very error prone... they're much more involved than simple guard clauses.

    1. Yes. It is intentional. A fetch shouldn't write anything to cache, at least right now. The only thing I'm trying to cache with this change is the full transitive coordinates that result from doing a resolve. If we already have that, as in the fetch case, there's no additional information to cache.

      If we were to do https://github.com/pantsbuild/pants/issues/3053, then we'd want to do some caching for a fetch and not just a resolve.

      I agree that the flow here is rather complex. I tried to cut down on the incidental detail around the check, load, validate return else fall through logic. It's a kind of monadic pattern expressed in a imperative way. I figured it'd be easier to follow as a sequence of if/if/return blocks, but if it's confusing I could come up with a different representation.

      One option would be to just extract a method around the fall-through clauses. Right now, it's rather deep in the _ivy_resolve method. If I did that, then the only thing in the method would be those clauses and it might be easier to reason about.

  6. 
      
  1. Ship It!
  2. 
      
  1. 
      
  2. The name is still a tiny bit confusing because you have IvyResolveStep extend IvyResolutionStep, and to the reader those appear to be synonymous. If we're going to reserve the term "resolution" to mean "choose versions" then let's not use it in this class at all?

    Maybe call this IvyAction or IvyCommand or IvyInvocation or something? And change the various docstrings and names in this class to match? What do you think?

    1. Hmm. I agree that it's confusing. I'd extracted a ResolutionStep abstract class earlier that didn't know about Ivy, but Stu pointed out that it was a bit premature since there isn't any other usage of it besides the Ivy cases.

      I think I'm going to leave it as it is for now and see what falls out next.

    2. ok, but can you add a comment on this class explaining the naming quirk? otherwise it's not clear why this is a shared base class and not the implementer of the "resolve" step.

    3. Will do.

    4. Er: note that I wasn't objecting to having an ivy-specific baseclass... just to a non-ivy-specific baseclass.

  3. If you do change the name, then you can add a comment here explaining that Ivy happens to use the word "resolve" for both what we call "resolve" and for what we call "fetch"...

  4. Hypothetically, could PomResolver replace this in the case where you know you have a globally consistent resolution?

    Right now foursquare replaces the entire resolve task with its own, but I'm wondering if it would be possible to just replace these steps using a strategy pattern.

    Obviously not something to do in this change, but I'm trying to understand how the pieces fit together.

    1. Hypothetically, yes. I'm thinking about breaking these out into tasks, but for this change, I want to get the behavior in, and then refine how it's structured. My longer term goal with these changes is to extract something like SimpleCodeGen for external artifact management. Then pom resolve could leverage that to have fewer custom parts.

  5. 
      
  1. Ooops, I intended that last one to be a "fix it then ship it", but forgot to "Ship It". Seems to be going around...

    Anyway, great change! Code looks fine to me, and I like the clarity this refactoring brings. Thanks for humoring my naming concerns, I had that one further issue with naming... :)

  2. 
      
  1. Oh FFS... Now Ship It for real (post addressing my remaining naming concern).

  2. 
      
  1. Thanks folks, submitted at https://github.com/pantsbuild/pants/commit/27b837981e7e04cea66f5ffee974c271a350946d

  2. 
      
Review request changed

Status: Closed (submitted)

Change Summary:

https://github.com/pantsbuild/pants/commit/27b837981e7e04cea66f5ffee974c271a350946d

Loading...