expose products for jvm bundle create and python binary create tasks

Review Request #3959 — Created May 31, 2016 and discarded

3477, 3501
mateor, patricklaw, peiyu, stuhood

bundle.jvm currently does not expose a product_type for any archives it creates (only the loose bundle directory). And binary.python-binary-create currently does not expose any products at all. We would like these tasks to expose products, in order to be able to consume those products in downstream (internal) tasks.

In particular, JVM apps (at least at Twitter) are generally published as archived (zip'd/tar'd) bundles. And python apps are deployed directly as pexes.
1. Expose deployable_archive product in jvm bundle_create task (arhives, including zip, tar, tar.gz, tar.bz2).
2. Expose same deployable_archive product in python_binary_create task (pexes).
3. Above products live in .pants.d, and they have symlinks created in dist dir.
4. Modify integration and unit test cases to accommodate changes in (3).
5. Add test cases.

ci pending:

  • 4
  • 0
  • 0
  • 0
  • 4
Description From Last Updated
Can we change the product name so that jvm and python do not share a product? We have an open ... MA mateor
As with the change to the go code, this invalidation block should be as tiny as possible. The body of ... ST stuhood
This comment no longer applies so it should be amended or deleted. MA mateor
I am finding the if statements on the archive to be confusing. There are checks for archiver, archivepath, app.archive and ... MA mateor
  1. Please add Patrick/Mateo to this review, as they have different packaging infrastructure, and would be interested.

    1. ok. Will do it when I finalize the change.

  2. src/python/pants/backend/jvm/tasks/bundle_create.py (Diff revision 1)

    This case continues to get more confusing. IMO, we should see what people would think about deprecating this option.

    Relatedly: products should always live under .pants.d, so it might be a good idea to put the product there with a known-unique name (ie, with use_basename_prefix=False). Then, the name used under dist would be independent, and could perhaps just be a symlink to the copy under .pants.d.

    I think that combining those changes would mean that we always bundle self.context.targets() using their target ids under .pants.d and use that as the product, but we additionally symlink the target_roots to dist under their basenames.

    1. deprecating use_basename_prefix?

      I think I'd better resolve this first since Wei's new request is about adding additional logic in this "if" block. If we are going to take out use_basename_prefix then the additional logic will not be necessary.

      We are not currently holding products in .pants.d, right? at least not for jvm_bundles target?

    2. Not for jvm_bundles, no... but for almost every other product that pants maintains, yes. I feel pretty strongly that having this new product point to something under .pants.d is the right approach, so let's consider that part a blocking issue here.

    3. If products live in .pants.d that means a "./pants clean-all" will wipe them out. Do we want that behavior? or maybe we put bundle folder in .pants.d but keep archives in dist?

    4. Arguably, clean-all should remove dist as well. And definitely agree with stu here, pants should output all products to pants.d so that downstream tasks (even ones we haven't envisioned yet) can consume the products and get some guarantees about them.

  3. A quick survey of the codebase indicates that this should likely be plural: deployable_archives

  1. I am very glad to see this done! This ship it is contigent on putting the products into pants.d

  2. Can we change the product name so that jvm and python do not share a product? We have an open ticket to continue decoupling those backends and I would prefer not to backslide further. If you have something downstream that needs to consume both than I think it should just require the two distinct products.

    How about jvm_deployables or something like it?

    1. @Mateo: I think that putting them in the same product like this actually causes them to be less coupled. Suddenly, a task that wants to upload deployable artifacts can do it for any language, without being editted. If there were a product per language, you'd have to make edits to that task for each new language.

    2. Yes, I agree that is an issue - I had to deal with this for buildgen. But I think I disagree that it is less coupled. I have a task internally that consumes JvmBundles. If I require deployable_archives then now I have to build every python dependency as well as jvm.

    3. but I can of course imagine a task that wants all bundles - as I said above I have felt constricted by upstream tasks that I cannot easily make consume new products.

      How about exposing two products - deployable_archives and also a per-language bundle jvm_bundles and python_bundles?

    4. err..pex_bundles

    5. I expect that you could filter by target type if you really didn't want your task to be used with a non-jvm target? But I don't have any objection to exposing multiple products.

    6. So I think the agreement here is we have 3 new products. and the relation among them are "deployable_archives = jvm_archives + python_archives". deployable_archives lives in both jvm and python tasks. jvm_archvies lives only in jvm task, and python_archives lives in only python tasks.
      Does it sound good enough?

    7. The problem I have isn't with the targets that my task consumes, it is with the tasks that the round engine will schedule because I required a joint product. If I want a task to consume 'deployable_archives' because I need JvmBundles, then the round engine will schedule python tasks to run. I shouldn't need to run python tasks to create jvm artifacts. Even though that still happens in places today for legacy reasons, I would prefer to not add more.

      At the very least, thanks for humoring me with this :)

    8. Mateo, sorry about my confusions. I don't know enough about round engines to fully understand the legacy reasons behind this behavior of round engine.
      But for this review, we want to stick deployable_archives products, is it correct?

    9. Sure, np Yujie - that was actually responding to Stu. I think the proposal was deployable_archives product to include both jvm_archives and python_archives, each of which would also be made available separately.

      I think that sounds great. I would like to request to use pex_archives or pex_python_archives, since we hope to one day soon have a world with a multiple python_bundle types. But I won't block on that - it is a good bit of work away after all.

    10. Thank you Mateo. A unified product is how I implemented currently. It also fits our internal needs. But I am open to discuss it more in the future.

    11. If you are ok with it, can I drop the issue?

    12. No quite yet...AFAICT you haven't yet exposed the per-language python products. To be clear about what I am asking for: I would like to see 3 product types put into context:
      jvm_archives - only jvm bundles
      pex_python_archives - only python bundles
      * deployable_archives - both.

      If anything the python bundle not having its own product is even more egregious a demand since it will require that you sit through a jar resolve before your task can consume the python bundle.

    13. well crap - the formatting on the above is illegible. It was supposed to be:
      - jvm_archives - only jvm bundles
      - pex_python_archives - only pex archives
      - deployable_archives - both.

    14. ah, my bad, I didn't read your comments carefully enough.
      You said "the proposal was deployable_archives product to include both jvm_archives and python_archives, each of which would also be made available separately".
      I clearly missed the second part of the sentence.

      I will make the change accordingly.

    15. Currently, there is a jvm_bundles product already which has the bundle folder.
      The original idea for deployable_archives is to have jvm_app archives (zip, tar, etc) and python pex.
      So I think jvm_archives should have only archives of jvm_app.
      What do you think?

    16. Okay, I took a look. I think that makes sense - in my read the jvm_bundles product as it stands is both JvmBinary and JvmApp loose bundle directories under pants.d and the jvm_archives is the tarball or zip of the JvmApp targets. So making the jvm_archives just point to the compressed artifact from the JvmApp sounds reasonable to me.

      I would still like to see the BundleCreate become a base class and have JvmApp and JvmBinary compose from it. But that is outside the scope of this review so what you propose is fine.

  3. =stu with the deprecation. I would honestly prefer to deprecate than continue to add code paths. Another option would be to bite the bullet and make the JvmApp and JvmBinary bundling into separate tasks with a common superclass.

    1. Oops - that last sentence was meant for a different review. Too many open windows. But the rest stands.

  4. Same comment as above, can this be explicitly a python product?

    In fact, can it be explicitly a 'pex' product? We do not use pex but we do have python bundles. Kris Wilson and patrick have discussed a generic python resolver that is pluggable between pex and vex (our internal bundle) but since this is currently coupled to pex it would be best to make that clear in the product name.

  1. The structure looks good to me... just want to see the invalidated blocks tightened up before shipping.

  2. src/python/pants/backend/jvm/tasks/bundle_create.py (Diff revision 2)

    As with the change to the go code, this invalidation block should be as tiny as possible. The body of the block should basically just be per-target decision to either build the bundle, or use the bundle that is already built, and it should use split out methods to do each thing.

    Also, this block is not currently looking at which vts are valid... for any vt.valid == True, the archive should already exist.

  3. src/python/pants/backend/jvm/tasks/bundle_create.py (Diff revision 2)

    Should break out as a separate method.

  4. src/python/pants/backend/python/tasks/python_binary_create.py (Diff revision 2)


    See https://github.com/pantsbuild/pants/blob/master/contrib/scrooge/src/python/pants/contrib/scrooge/tasks/thrift_linter.py#L99-L114 for a very good example of how to do the valid/invalid split.

  1. So you mentioned in the comments that you want to have the jvm_archives only be added for JvmApp. You'll have to be a bit careful to make sure that the product is always populated - as it stands the only time the deployable_archives product is populated is when the jvm_app has an archive defined in the payload.

    AFAICT archiver (when JvmApp has a defined archive-type) must be True in order to populate the bundle_archive product and archive and archivepath must both be True in order for the archivesymlink to be created.

    I am honestly unsure how much of this is by design - I have to confess to being a bit lost as to what is happening there. A request for some extra clarity is below.

  2. This comment no longer applies so it should be amended or deleted.

  3. I am finding the if statements on the archive to be confusing. There are checks for archiver, archivepath, app.archive and on line 146, archive.

    That last check is only passing because we import the pants.fs.archive - it should always be true. If it was meant to be archiver it is still unneeded since archivepath being True is coupled to archiver being True.

    I believe that was missed because there are so many variables around archive - can you coalesce those checks to be if app.archive or something like it? You might not be able to - I haven't tried myself. But this is hard to scan as is.

Review request changed

Status: Discarded