Ivy Imports now has a cache

Review Request #1785 — Created Feb. 17, 2015 and submitted

zundel
pants
zundel/mapjar-uses-cache
1097, 1103
1783
42d2537...
pants-reviews
benjyw, davidt, jsirois, patricklaw

Add an invalidation check to Ivy Imports

 git diff master --stat
 examples/src/protobuf/com/pants/examples/imports/BUILD            |   7 ----
 src/python/pants/backend/codegen/tasks/protobuf_gen.py            |   7 ++++
 src/python/pants/backend/jvm/targets/BUILD                        |   1 +
 src/python/pants/backend/jvm/targets/import_jars_mixin.py         |  34 +++++++++++++++++-
 src/python/pants/backend/jvm/targets/unpacked_jars.py             |   3 --
 src/python/pants/backend/jvm/tasks/ivy_imports.py                 |  84 +++++++++++++++++++++++++++++++++++++-------
 src/python/pants/backend/jvm/tasks/ivy_task_mixin.py              |  29 ++++++++++++++-
 src/python/pants/backend/jvm/tasks/unpack_jars.py                 |   2 +-
 tests/python/pants_test/backend/__init__.py                       |   0
 tests/python/pants_test/backend/jvm/__init__.py                   |   0
 tests/python/pants_test/backend/jvm/targets/test_unpacked_jars.py |  25 +++++++++++++
 tests/python/pants_test/backend/jvm/tasks/BUILD                   |  15 ++++++++
 tests/python/pants_test/backend/jvm/tasks/__init__.py             |   0
 tests/python/pants_test/backend/jvm/tasks/jvm_compile/__init__.py |   0
 tests/python/pants_test/backend/jvm/tasks/test_ivy_imports.py     | 123 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/python/pants_test/backend/jvm/tasks/test_unpack_jars.py     |  24 ++++++++-----
 16 files changed, 319 insertions(+), 35 deletions(-)

Added a new unit test for ivy-imports. Also ran this on targets in our own repo.

  • 0
  • 0
  • 7
  • 5
  • 12
Description From Last Updated
ZU
  1. Currently faiing CI but this is close, I think. I wanted some feedback on what I had so far.

    If you like I can split the review further by just sumitting the ivy_imports test first with none of my invaliation check mods to the task.

  2. I encountered this when debugging. It should never happen if things are working right. Should I remove it?

  3. I think i need to add, "or target in invalid_targets" in case the import jar target is different (needs a new directory)

  4. Is this the right thing to do here? I need to populate 'ivy_imports' somehow for targets I didn't call mapjars() on or downstream tasks may fail.

  5. removed this line.

  6. 
      
ZU
JS
  1. I won't be able to give this the review it deserves until Thursday or Friday so I think I'm out on this one.
  2. 
      
ZU
ZU
ZU
  1. Sorry, i didn't actually submit this one, I accidentally hit 'Submitted' on the wrong review.

  2. 
      
NH
  1. Thanks for putting this together. I've got a few comments.

  2. Is this related to the ivy cache changes?

    1. There is another comment I made which is hard to find. This branch of code was being tripped before I had all of the invalidation logic worked out. I haven't seen it since, but it seems like maybe a better error message if it does crop up again. I can remove it if it is distracting.

    2. I'd prefer it if it were separate, to make git blame make more sense, but I don't think it needs to block this going in as is. Dropping.

  3. I'd prefer it if the jar libraries were accumulated in a temp var, and then assigned to the memo after. That way, if for some reason this code is called by multiple threads, the resulting memoized value should be the same.

  4. do the library specs have ordering like classpath, or are they order agnostic? It might make sense to not sort them if order matters.

    1. In maven, order matters, but ivy's strategies for resolving libraries don't. I sorted them because of comments Patrick made in another review:

      https://rbcommons.com/s/twitter/r/1776/

      "... are you worried about code stability"

  5. could you put a new line between execute and the following assertion?

  6. could you put a new line between execute and the following assertion?

  7. this looks like it might be left over from debugging

  8. Is there any case where more than one jar could match here? Would that be a bug?

    1. I suppose it is possible, but very unlikely as the suffix starts at the root of the subdir created in the pants_workdir.

  9. now that you have the helper method, you can rm the comment here

  10. 
      
BE
  1. Looks good to me, but I'd like Patrick to take a look at the fingerprint stuff.

    1. I spoke with Patrick OOB. He's out of the office today but he did look at it and has some comments. No big rush on this. The performance benefit for us is small for us (a few seconds at most) on this compared with the cache invalidation problem I fixed the other day.

  2. 
      
ZU
NH
  1. 
      
  2. I think you could just assign from the comprehension without the if statement here.

  3. 
      
ST
  1. 
      
  2. Can this move to the Target constructor, to try to have the error bubble up in the appropriate context? A few of the other target types check for non-empty sources lists, although we're not nearly consistent about it.

    1. This is a check specific to protobuf_gen. Did you think it was looking at the sources payload field? I'm not sure if I could generalize it to all targets, but maybe we could for codegen. I added a TODO though since codegen is ripe for a refactoring.

  3. Why the None check? I remember vaguely that this might be due to targets skipping cache invalidation... if so it might be good to comment here.

    If it's not due to invalidation, it would be good to just have an assertion on the collection.

    1. Crap, I can't remember how this happened and now I'm afraid to remove it. Not a great answer, I know. Maybe it was a bug in an interim version that I fixed another way, so I removed it. I will revisit if the problem crops up again.

      I had a BUILD file with a bogus dependency in imports and had 'None' in the list for some reason.

      I added some test cases and error handling for misconfiguration as I remember it.

  4. It's unclear based on this comment what to do about that problem. Is there a solution? Should a ticket be attached?

    1. This is an alternative to the default strategy which would be target.payload.fingerprint(). This is the solution to the problem. I'll clarify the comment.

  5. Is this supposed to be checking for nulls like the above?

    1. After reviewing the code in its present state, I think the other "None" check is superfluous.

  6. I didn't really review this task initially, but how is "importing" a jar different from "resolving" it? Does it really needs to be a separate task from IvyResolve (with its own cache strategy)?

    1. Yes, there are a few differences.

      1) Ivy resolve happens after codegen. This needs to come first.
      2) There should be no binary dependencies on these jars. For example, our imported protobuf jars contain generated .java code generated with the wrong protobuf compiler/plugins and we don't want them to get on the classpath.
      3) Ivy resolve puts everything in one path and resolves all dependencies together. This resolves just the jar_libraries declared together and puts symlinks in a target specific directory that is different from the ivy resolve cached path.

    2. Ok. The alternate set of symlinks explains the error I'm currently seeing on https://rbcommons.com/s/twitter/r/1785/ ... the fact that ivy_resolve_symlink_map is shared between all instances of the IvyMixin is a problem.

      I'll say that we have had no end of trouble due to a task similar to IvyImports that compiles remote thrift. The fact that it resolves independently from other tasks is a real downside because you won't resolve to a SINGLE version of a dep and so you can receive conflicting versions via binary and idl... thus you can have binary deps that are incompatible with generated code.

      One solution might be to move ivy_resolve before codegen, since it's possible ahead of time to know what the "default" dependencies for the synthetic target will be. This is how maven does it, iirc, in the sense that you have to pre-declared the injected deps for the generated targets.

    3. (This is https://rbcommons.com/s/twitter/r/1785/ - what error are you referring to?)

      In this task, it only reuses the mapjar() call from IvyTaskMixin. It does not try to share the product. It makes a new product named 'ivy_imports', so I hope there is not a conflict.

      Today our use of this task is just for one .jar that we pull from (we have a repo of just .proto files extracted from all the projects at Square.) For unpacked_jars (which is what we currently use) the jar that we get we pull from we pull into a directory that we do not add to a classpath, because it doesn't contain any java binaries in it. We only use that jar to extract files from so we can get the .protos onto protoc's path for use as protoc imports and/or re-compile them. In our case, there is no conflict resolution because we specify one library and one version and that library has no dependencies.

      My initial thought on tackling the problem of importing protos from other reps was that moving ivy-resolve before codegen would solve the problem, but there was a chorus of disagreement to that idea. If you want to read the history of why this is a separate task from ivy-resolve, here are some pointers:

      https://rbcommons.com/s/twitter/r/592/

      https://groups.google.com/forum/#!searchin/pants-devel/garrett$20zip/pants-devel/WzgfxKTA86o/ixiV_yHFkAEJ

    4. I skimmed both of those, but I didn't see the dismissal of moving ivy_resolve... where was that? I feel like it might be time to re-evaluate. Won't block this review though.

    5. Hi Stu,

      I looked around for more information on why imports is a separate task. Maybe these were F2F discussions. I can't find any written evidence that explains that decision. I know the online reviews and discussions are a lot to wade through. Garrett and spent literally months working on imports support and shepherding it through the review process. It evolved from an 'imports=' attribute on protobufs to being a generally applicable way to pull in an archive and re-compile the code in the current build configuration. This logic is working well for us, aside from the fact that it re-does work that is already done, hence this follow on for cache invalidation.

      IIRC, the reason why ivy resolve step occurs after codegen is because codegen creates targets, modifies the build graph, and may change the information used to resolve 3rdparty dependencies in general.

      You didn't ask this, but the reason why we don't just re-use the exising ivy resolve task in its entirety is that it does way too much for what is required. For example, there is no need to add these jars to the classpath. We did re-use the mapjars() call.

  7. What should cause a file to be silently ignored here? Shouldn't mapjars only result in symlinks, for the purposes of buildcache stability?

    1. Yes, that is the way I expect mapjars to behave, but ivy will stick an ivy.xml file in the dir. I added a comment and raised an exception on anything else.

  8. Should this be the top level method rather than verify_product_mapping? It has a thinner API.

    1. These checks are specific to this test. I could re-use verify_product_mapping() in another test, but not this method.

  9. 
      
ZU
ST
  1. I don't see anything else here, but I'm not 100% aware of how caching works yet, so it would be good if you could get some other reviewers.

    Will reiterate that it seems like it might be easier (perhaps even in the short term) to have prep-tasks that either:
    1) produce a product of additional jars for IvyResolve to fetch, or
    2) add synthetic targets and depedencies which IvyResolve will consume naturally

  2. 
      
PA
  1. 
      
  2. I'm quite concerned with this behavior. In my mind, one of two things should be happening here:

    (1) This FPStrat can't be used until the tasks that hydrate these sources payload fields have been run. Then you can fingerprint against exactly the fields you want.

    (2) This comment isn't necessary, because the specs really do encapsulate all of the necessary (non-transitive) invalidation information. I'm skeptical that this is true, but I don't have a full picture of this product flow in my head so I might be wrong.

    1. What I could do here is to have DeferredSourcesField override fingerprint() with exactly this logic.

    2. With what logic, exactly? The latter? That also seems inadvisable. Though if it the behavior is still immutable then I suppose that's okay. But then how do you fingerprint those sources? You just don't, since they're theoretically defined by the jar spec that backs them?

    3. I ran into 'hydration' issues before which prompted me to add this custom fingeprint strategy. I was getting a 'DeferredSourcesField.NotPopulatedError'.

      However, I just removed my fingerprint strategy override and I'm not running into it now. I'm pretty sure the difference is that since running into it the first time I added the logic to reconstruct the product from the directory contents.

      I've run tests locally. The change is queued up on the builder now.

    4. This is the sort of thing that I'd really like to be confident is doing the right thing in principle though. Fingerprints and payloads have such strictly defined interfaces and guarantees about immutability to avoid this kind of ambiguity.

      So, the question is this: is the "fingerprint" of this (target, task) pair really only dependent on the specs of the ImportJarsMixins? It seems to me that specs are never sufficient--after all, the target behind a spec can change arbitrarily and you won't get invalidation issues. On the other hand, maybe you want to opt out of computing a fingerprint for ImportJarsMixins entirely. In that case, it's some upstream task's responsibility to invalidate and create products for those targets. You can just then assume that they'll be there.

    5. CI went green at https://travis-ci.org/pantsbuild/pants/builds/52053270

    6. @Patrick: I think that our comments may have gotten crossed. The current code I think reflects your sentiment that this tasks in general should probably not be messing around with the fingerprint strategy. It may be sufficient to just fingerprint the specs, but that is a bit scary. Again, I've dropped the custom fingerprint implementation and am now using the default fingerprint strategy.

    7. Yup, between reloading the rest of this pipeline into my head and our "offline" convo, I'm +1 on this change. Ship it.

  3. 
      
ZU
ZU
Review request changed

Status: Closed (submitted)

Change Summary:

Thanks Nick, Stu & Patrick. Submitted at b944b38

Loading...