Multiple dependency_managements with multiple ivy resolves.

Review Request #3367 — Created Jan. 22, 2016 and submitted

gmalmquist
pants
gmalmquist/dependency-management-pass-2
2830
pants-reviews
benjyw, nhoward_tw, patricklaw, stuhood, zundel

This is the second pass of implementing dependency management in
pants.

The first pass was reviewed here:

https://rbcommons.com/s/twitter/r/3336/

The design document is available here:

https://docs.google.com/document/d/1AM_0e1Az_NHtR150Zsuyaa6u7InzBQGLq8MrUT57Od8/edit#

This change allows managed_jar_dependencies targets to be chained
together; the set of pinned artifact is the union of all jar
coordinates specified by the transitive closure of the target
specified in a jar_library's managed_dependencies field.

When there are conflicts between dependencies in the transitive
closure, the default behavior is to raise an error. Optionally,
if using --no-jar-dependency-management-require-disjoint-sets,
the most direct version (the first version encountered in a BFS)
will be used, and a warning will be logged.

The change to make ivy do multiple resolves is fairly simple; see
ivy_task_mixin.py's resolve() method.

Added integration tests for running multiple resolves, and for the new managed_jar_libraries factory.

Added unit tests for transitive artifact set calculation.

CI went green here: https://travis-ci.org/pantsbuild/pants/builds/104711840
CI went green here: https://travis-ci.org/pantsbuild/pants/builds/104947421
CI went green here: https://travis-ci.org/pantsbuild/pants/builds/104985770

  • 0
  • 0
  • 2
  • 0
  • 2
Description From Last Updated
BE
  1. Before I look at the code, I have to say that this sounds like a recipe for trouble:

    "When there are conflicts between dependencies in the transitive
    closure, the versions specified by dependees override the versions
    specified by their dependencies. Thus, the artifacts specified in
    the target directly referenced by the managed_dependencies= field
    have the highest priority (which seems to be the most intuitive)."

    Nowhere else do we have this kind of logic that preferences targets based on their graph distance from you,
    and it seems exceptionally fragile.

    What is the use-case here? Can we not simply ban such conflicts outright?

    1. One could argue that the ordering of targets already causes something like this kind of behavior (but just using java's classpath semantics where the first instance of a resource on the classpath wins.)

      I think the practical concern is that you are going to build a single managed_dependencies() target that is going to cover >90% of use cases. Then there will be a handfull of oddball cases you need to satisfy. In that case, would you want to have multiple copies of managed_dependencies targets, each with potentially hundreds of dependencies listed almost all alike? or would you rather just override the oddball cases with a few exceptions?

    2. Maybe I'm misunderstanding something, but couldn't you have something like:

      T -> MD1, MD2

      T' -> MD1, MD2'

      Where MD1 is a managed_jar_dependencies target for all the non-problematic dependencies, and MD2, MD2' are managed_jar_dependencies targets specifying two different ways of satisfying the oddball cases? (And T, T' are target targets used just for grouping). Then most targets can use T (maybe implicitly, via pants.ini) and the few targets that require odball settings can explicitly use managed_dependencies=T'? [Or, if managed_dependencies= must point to a managed_jar_dependencies target (I forget if it must or not) then T, T' can be of that type. Or one could simplify to MD2 -> MD1, MD2' -> MD1.]

      Does that not get us the effect we want, but in a more principled way? Or am I missing something?

      PS It's true that ordering of targets causes something like this, but it's horrible when it does and it's something we're actively trying to avoid. AFAIK we make no guarantee that targets will affect the classpath in topological order.

    3. Before I get too deep into the code itself--which seems fine at the surface--I do want to agree with Benjy here. I'm fundamentally skeptical of a resolution strategy where the resolution for some subgraph G is different when it happens to be embedded in the larger context of a graph H. This means that there's path-dependency always lurking in every resolution. In those situations I'd much rather see an explicit error that forces some additional (checked in) annotation that resolves the ambiguity.

    4. Can you clarify how this changes things Patrick? What you are complaining about is the current state of Pants.

    5. So, what I said isn't exactly true. What happens to day is that graphs are resolved differently depending on what other specs are specified on the command line. What I like about this is that it allows us to make jar resolution more deterministic for a given target, but I realize that the behavior we chose might not be what you want.

      @gmalmquist: Before we land multiple resolves, I think we need to add some details into the design doc with some examples (and what the alternatives are)

    6. I would be amenable to Benjy's suggesting of doing:

      T -> MD1, MD2

      T' -> MD1, MD2'

      instead. It seems more difficult to maintain, but is admittedly more disciplined.

      Another possiblity to consider: A flag to choose between these behaviors, defaulting to the more strict approach?

    7. Hmm, why would this be more difficult to maintain?

    8. Just because you'd have to maintain more managed_jar_dependencies targets (3 instead of 2 in your example). It wouldn't be terrible though.

    9. Ah, I see. Yeah, but to my mind that might be easier to maintain, because it calls out the parts that could potentially differ and makes it easy to see which one you chose.

    10. Eric: it doesn't change things from the way that Ivy works in pants today, but I thought the point was to tighten that up. pom-resolve, for example, doesn't suffer from this path-dependency problem, but of course it avoids it by always resolving the full global graph as the only resolution.

      I think having some target's resolution affected by its dependees is dangerous and difficult to reason about, just on principle. And it only gets worse when the issue is classpath conflicts, which are notoriously difficult to detect before runtime. If we're going to go in for a new system of resolution, I'd really like it to at least have a mode where this is prevented, and for that mode to be reasonable to work with.

    11. Patrick: Resolution isn't affected by dependees in this scheme; jar_library targets "depend on" (really just reference) managed_jar_dependencies which control their resolution. What the jar library resolves to is solely dependent on the managed_dependencies it uses, its dependees don't factor into it.

      This patch isn't intended to fully tighten up the resolution; it's intended to be a step in the right direction that solves a subset of needs. Nick Howard is concurrently working on additional changes that are related but disjoint from mine that will make everything deterministic and sane.

    12. Eric: I think this change does make things more complicated than the current state, if I understand it correctly. It's true that today you might get different resolutions by building a larger graph in which the smaller one is embedded. But at least (in a well-behaved graph) that resolution only depends on the target closure. With this change, it will depend on the graph distance between the managed_jar_dependencies targets. For example, if I have MD1->MD2->MD3 and I change that to MD1->MD3->MD2, the resolution can change, even though the closure didn't.

  2. 
      
ZU
  1. Could you break out the ManagedJarDependencies factory into a separate review? I think that part is not going to be controversial and is unrelated to the issue of multiple resolves.

    1. Can do.

    2. Split off here: https://rbcommons.com/s/twitter/r/3372/

  2. 
      
GM
GM
ZU
  1. 
      
  2. this name looks backwards. They are specs indexed by coordinates, but the name 'coordinates_by_spec' implies it is the other way around.

  3. 
      
GM
BE
  1. 
      
  2. To fight the rearguard battle one last time... :) Are we sure we need this functionality? Can we not just always error in this case?

    1. It's certainly not 100% necessary, but I don't think it's evil either. It basically uses the same model of inheritance that is standard in object oriented programming, and even prints out warnings when things override.

    2. TBH it makes me uncomfortable. I'm really leery of introducing a significant consequence to topological ordering, when we have never had any (intended...) such consequences on dependency resolution before. My mind is boggling just trying to unravel the potential knock-on effects.

      I don't see the analogy to OOP. These are graph dependencies, not "is-a" relationships.

      I'm a big fan of keeping things as simple as possible, especially when the underlying problem domain is so freakin' complicated, and about to undergo further changes around Nick's design doc.

    3. Would it make you more comfortable if instead of using the dependencies field of managed_jar_dependencies for this, I made an inherits_from field that only accepts specs of managed_jar_dependencies?

    4. Slightly less uncomfortable on one count, but then it introduces another previously unheard-of concept - that of inheritance of target properties. That may not be a bad idea in general (we currently have to do this by introducing custom target types), but it's probably something to tackle separately, and generically. And I think configurations (or whatever they're now called) in the new engine do have this ability?

      So to sum up, I'd still prefer not to have this at all until we're sure it's truly necessary. But I'd like to hear what Eric, Patrick, Stu and others think, I don't want to unilaterally veto stuff (nor do I have that power...) Shall we bring them in on this conversation?

    5. Talked to Eric and Patrick on slack; they are both in favor of removing this feature for now, and reconsidering more thoughtfully later if necessary.

  3. 
      
GM
GM
BE
  1. Thanks for the changes!

  2. 
      
GM
GM
Review request changed

Status: Closed (submitted)

Change Summary:

In commit 39223aba9c0b2494f8a0cdef753fdbf19956e11c, thanks Benjy, Eric, and Patrick

Loading...