Add support for publishing plugins

Review Request #1021 — Created Sept. 12, 2014 and submitted

ity, jinfeng, jsirois, lahosken, patricklaw, stuhood

Implement publishing plugins in pants, per the design doc:

This RB adds an example plugin, plus support to look at the 'publish_extras' section in pants.ini, walk the products dict, and pick up and publish any extra jars.

Adding Larry to review the doc.

Ran integration test, and manual testing with the example.

CI: (it'll probably pass?)

  2. Suppose my org doesn't want an extra artifact produced in every case, but only if the "main" artifact matches some criteria. E.g., if the main artifact comes from a java_thrift_library. Where does the "if" go?

    (Maybe my question is: if my plugin doesn't produce an extra artifact in some case, is that OK? Like, I just don't add anything to the product map, no harm, no foul?)

    1. I think that I addressed this with a comment, but maybe not. Let me know.

  3. Since I^W our intended audience might not know what self.context.products.require() means, add a comment?

  4. src/python/pants/docs/publish.rst (Diff revision 1)

    Eventually, this info should live on a different page. Most of this page is about "Someone already set up your workspace. Here's how to publish your library." But this section is aimed at your org's Pants gurus, those mysterious figures who set up the workspace.

    Hmm, add a comment?

    .. TODO(lahosken) make this section its own page.
    Remember link to it from this page and from dev_task
    and to fix up the link from the sample code

  5. src/python/pants/docs/publish.rst (Diff revision 1)

    The example produces jars, the same type of thing produced by the main-publish-thing. Is that same-type an intended restriction? If so, good to mention it. If not, probably good to mention that, too :-)

    (Depending on the above answer, maybe a related question or maybe moot: the example installed itself in the jar goal. If I have an 'extra' egg to generate along with every jar, what goal do I install in? jar, because that's the <handwave> place in the goal dependency chain </handwave>?)

  2. Should add a classdoc explaining what this is adding, and to which types of targets.

    In fact, as an example it could use a lot more comments overall.

    1. Added a bunch of comments & a class comment.

  3. naming nitpick: "extra" what? technically this is "extra publish artifacts" or something.

    1. The "extra" doesn't need to be an artifact. It can be any type of file.

    2. That would still be an artifact... ie, a thing to make public.

  4. This is demonstrated but not explained in the doc.

    Also, perhaps this should use name-based templating that format strings support, to make it more explicit?
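A minimal sketch of the name-based templating suggested above (the field names here are illustrative, not the actual Pants template keys):

```python
# Named fields make it explicit which component lands where in the
# artifact name, versus positional '%s' or '{}' slots.
template = '{name}-{rev}-{classifier}.{ext}'  # hypothetical field names

artifact = template.format(name='hello-greet', rev='0.0.1-SNAPSHOT',
                           classifier='sources', ext='jar')
print(artifact)  # hello-greet-0.0.1-SNAPSHOT-sources.jar
```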


  5. Would be clearer as a ternary:

    classifier = extra_config['classifier'] if 'classifier' in extra_config else ''
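For what it's worth, the same default can also be expressed with `dict.get`, which is a common idiom for this pattern (the sample `extra_config` value below is made up for illustration):

```python
extra_config = {'extension': 'jar'}  # hypothetical config with no 'classifier' key

# The suggested ternary:
classifier = extra_config['classifier'] if 'classifier' in extra_config else ''

# Equivalent via dict.get with a default, avoiding the membership test:
classifier = extra_config.get('classifier', '')
print(repr(classifier))  # ''
```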

  6. ditto

  7. Should target_list be a set? It's possible that some targets share derived_from targets.

    1. While I don't dispute that, I don't think that can happen in this case. The target_list that I'm building here is simply this:

      target, target->derived_from, target->derived_from->derived_from, etc.

      Hmm, are you saying that in the above chain, the same target could appear twice? While I'm still not sure if this is possible, I'll change this code.

    2. Yes, it's definitely possible. Imagine two synthetic targets derived from one concrete target, and bam.

    3. This misses a bigger problem though. Say you happened to have a unique target_list by luck. What if each element of that list has a product in the product map? You still run into dup issues. I think the code needs to have an opinion: either 1st in the list wins, or else if more than 1 item in the list has a product map hit, that's an error. Either way, using a set for de-duping becomes moot.

  8. This might be good as a helper method on Target... "derived_from_root"?

    1. It's actually not clear to me that derived_from chains actually exist. It seems like it could happen, which is why I'm going through this gyration. Can you think of an example of this happening in the real world?

    2. concrete->synthetic->synthetic doesn't seem too unlikely. And algorithms like this tend to be clearer when implemented without any assumptions than they are when they assume things (and then have to explain why it is safe to assume them.)
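A sketch of what a `derived_from_root` helper might look like. This is a hypothetical stand-in class, not the real Pants `Target`; it assumes the convention that a concrete target's `derived_from` is itself:

```python
class Target(object):
  """Minimal stand-in for illustration; the real Pants Target has more state."""
  def __init__(self, name, derived_from=None):
    self.name = name
    self._derived_from = derived_from

  @property
  def derived_from(self):
    # Assumed convention: a concrete target's derived_from is itself.
    return self._derived_from or self

  @property
  def derived_from_root(self):
    """Follow the derived_from chain until it reaches a concrete target."""
    current = self
    while current.derived_from is not current:
      current = current.derived_from
    return current

# concrete -> synthetic -> synthetic, as discussed above:
concrete = Target('thrift_lib')
synthetic1 = Target('thrift_lib.gen', derived_from=concrete)
synthetic2 = Target('thrift_lib.gen.shaded', derived_from=synthetic1)
print(synthetic2.derived_from_root.name)  # thrift_lib
```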

  9. src/python/pants/backend/jvm/tasks/templates/ivy_resolve/ivy.mustache (Diff revision 1)

    As an intermediate step, it would be good to go ahead and unify the templating in this step, and have all of the optional publications treated uniformly.

    Eventually source/javadoc/changelog/etc could all be plugins, and we should prove that we can begin to head in that direction.

    1. I'd like to avoid increasing the scope of this change for now. I'm going to try and get some non-Twitter eyes on this review, and if the broader consensus is to keep pulling on this thread, then I will.

    2. That's fine, but I think you should do ONE thing to head in that direction now. Currently this is 100% bolt-on, and I'm not comfortable with that.

  2. Copyright 2014 Pants project contributors (see

    Licensed under the Apache License, Version 2.0 (see LICENSE).

  3. Seems kinda empty. I guess this is here to illustrate that this class could do stuff in its constructor? Probably good to say that in a comment.

  4. If my pants.ini doesn't have override_name, my ~/tmp doesn't get an obvious jar and my ivy-0.0.1-SNAPSHOT.xml says name="None"

    I dunno what should happen in this case. error? use some default name? but probably not name=None

  5. src/python/pants/docs/publish.rst (Diff revision 2)

    This might be a good place to say what happens if a plugin doesn't want to produce an "extra" to go with some jar. (I couldn't figure it out from the sample-code-comments and this doc. When I look at, I think it's OK to not-produce the jar; just don't add it to the product map.)

  2. src/python/pants/backend/jvm/tasks/ (Diff revisions 1 - 2)


  3. src/python/pants/docs/publish.rst (Diff revisions 1 - 2)

    and can be tailored

  2. target.walk with a predicate is almost certainly not what you want. This will miss any JavaLibrary instance that is only reachable through some other dependency. In particular if you pointed this at a target dependency bag, it would return nothing, even if there were JavaLibraries in the bag.

    What you likely want is context.targets(<your predicate>).

    Once again I'm tempted to remove predicate as an argument to walk. I've never seen it used correctly, and I've seen it cause bugs like this many times.

    1. Oh, I missed here that you're just extracting from a single target, not all of them in context. In that case, you want to do a walk without the predicate (or call target.closure()), then filter that list with a list comprehension, then process that list.

    2. This is still wrong. If this is actually what you're intending to do, please fix the comment above it. But even then, if this is an example for others to follow, please don't use the predicate parameter to walk()

    3. I noticed in a later review and I flagged this again, but it was considered out of the scope of that review (see This is still wrong, or if it isn't wrong, then it needs to be strongly justified.

    4. Hey Patrick

      I thought that I fixed this. If you check master:

      You'll see that I'm not using walk() anymore. I'm not sure what you want me to do here.

    5. Interesting. Maybe I'm just not understanding when RB updates the diffs! Sorry and nevermind.
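A toy model of the advice above: walk the full closure first, then filter with a comprehension. The classes here are illustrative stand-ins, not the real Pants target types:

```python
# Passing a predicate to walk() prunes traversal, so a JavaLibrary reachable
# only through a non-matching intermediate (e.g. a dependency bag) is missed.
# Gathering the closure first, then filtering, avoids that.

class Target(object):
  def __init__(self, name, dependencies=()):
    self.name = name
    self.dependencies = list(dependencies)

  def closure(self):
    """Return this target plus everything transitively reachable from it."""
    seen, stack = [], [self]
    while stack:
      t = stack.pop()
      if t not in seen:
        seen.append(t)
        stack.extend(t.dependencies)
    return seen

class JavaLibrary(Target): pass
class Bag(Target): pass  # a pure aggregator with no code of its own

lib = JavaLibrary('lib')
bag = Bag('bag', dependencies=[lib])

# Filter AFTER walking, so the JavaLibrary behind the bag is still found:
java_libs = [t for t in bag.closure() if isinstance(t, JavaLibrary)]
print([t.name for t in java_libs])  # ['lib']
```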

  3. This should be under register_goals(), not at global scope.

  4. extra_configs or {}

  5. Seems like these should be negated and then raise the exception here, rather than pass and then raising in the else

    1. I thought about that, but this seemed to be logically cleaner and easier to understand.
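The two shapes being debated, side by side. The condition and error message are made up for illustration:

```python
# Pass-then-else style: the happy path is an empty branch.
def validate_pass_style(extra_config):
  if 'override_name' in extra_config or 'classifier' in extra_config:
    pass  # valid; fall through
  else:
    raise ValueError('publish_extra config must customize the artifact')

# Guard-clause style: the negated condition raises immediately,
# and no else branch is needed.
def validate_guard_style(extra_config):
  if not ('override_name' in extra_config or 'classifier' in extra_config):
    raise ValueError('publish_extra config must customize the artifact')

validate_guard_style({'classifier': 'extra'})  # no exception
try:
  validate_guard_style({})
except ValueError as e:
  print('raised:', e)
```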

  6. I prefer target to tgt. It only buys 3 characters, and I find non-abbreviated variable names easier to read.

    1. What's interesting is that this actually caught a bug -- target is defined in this scope (it's in the parent method), so having it as a separate variable makes it explicit which thing you're using.

  7. env.update(extra_env or {})

    1. This is a temporary array; it makes appending an argument easier, as opposed to operating on the dict directly.

    2. Can you give it a more descriptive name?

    3. Changed to 'published_file_list'.

  2. Either <publications/> is un-needed in ivy.xml or else the publications template data needs to be augmented by .update(e.conf for e in extra_confs())

    1. Patrick picked on this line too, but I didn't write it. Not sure that I understand the issue here.

    2. OK - read some more, there are a lot of shenanigans going on here, including my description of the problem.

      Let me start with the conclusion: the ivy.xml generated should pair a configurations/conf/@name for every unique publications/artifact/@conf.
      You added new publications/artifact/@conf but no corresponding configurations/conf/@name fwict.

    3. Hmm, I'm still confused by this. Here is a sample file that my code generates:

      [asymmetry birdcage (areitz/pants/add_publish_extras)]$ cat ~/Desktop/com/pants/examples/hello-greet/0.0.1-SNAPSHOT/ivy-0.0.1-SNAPSHOT.xml
      <?xml version="1.0" encoding="UTF-8"?>
      <!--
      Copyright 2014 Pants project contributors (see
      Licensed under the Apache License, Version 2.0 (see LICENSE).
      -->
      <!-- generated by pants! -->
      <ivy-module version="2.0" xmlns:xsi="" xsi:noNamespaceSchemaLocation="" xmlns:m="">
        <info organisation="com.pants.examples" module="hello-greet" revision="0.0.1-SNAPSHOT" status="release" publication="20140916230256"/>
        <configurations>
          <conf name="default"/>
          <conf name="sources"/>
          <conf name="publish_extra-hello-greet-extra_examplejar"/>
          <conf name="javadoc"/>
        </configurations>
        <publications>
          <artifact conf="default" type="jar" ext="jar"/>
          <artifact conf="default" type="pom" ext="pom"/>
          <artifact conf="sources" type="source" m:classifier="sources" ext="jar"/>
          <artifact conf="javadoc" type="javadoc" m:classifier="javadoc" ext="jar"/>
          <artifact name="hello-greet-extra_example" conf="publish_extra-hello-greet-extra_examplejar" type="jar" m:classifier="" ext="jar"/>
        </publications>
      </ivy-module>

      This looks okay to me -- can you point me to what I'm doing wrong?

    4. I can't, but I also couldn't point to what you were doing right to get this alignment between <configurations/> and <publications/> to be happening! I assume you can and I'll accept the proof above.
      Thanks for bearing with me.

    5. Yeah, this whole thing is a bit confusing (and Stu wants to refactor the template part). But I add to the confs array, which generates the <configurations> section, and then separately I stuff in the extra_publications data structure, which is used in the <publications> section. Since these two things need to go together, we could probably refactor this so that one data structure is used to generate both sections of the XML. Future work!

  3. What's this about? I see no corresponding change in the supply of the changelog parameter, and you're now joining.

    1. Patrick asked about the same thing. I was test publishing something in birdcage that hadn't been published before, so the changelog was huge, and there is a utf-8 character in there, that was causing python to explode. This fixed it, but I can try taking this out and see if I can repro.

  4. for extra_product, extra_config in publish_extras.items():
  5. I think you want a list. Right now with set there is no communicable semantic to how extra artifacts are picked when there are more than 1 except "its random" and we can do better.

    I see 2 options:
    1. expect the extra_product map to have an artifact for 0-1 of the target_set targets, then a set is fine but 0-1 should be checked and >=2 should raise
    2. have a 1st wins semantic where 1st applies to a well-defined ordering of target_set - basically s/set/list/

    1. Hmm. If you look back at previous versions of this diff, this was a list, and Stu asked me to change it to a set. The idea is that the way extra artifacts are picked is:

      • for every artifact published, check if it has an extra
      • if it doesn't, check its derived_from chain, to see if anything in that chain has an extra

      This set/list business is only for the derived_from chain.

    2. That's false - the set is seeded with target_set = set([tgt]). So the set includes the target and its full derived_from chain. So you're picking a random element from set+derived_from chain as the 1st to check for a mapped product. Compounding the random pick, you then don't stop and continue to loop over the rest of the set. I stand by the 2 options above.
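Option 2 above could be sketched like this: an ordered list (target first, then its derived_from chain) with first-wins lookup. The names and the tiny target class are illustrative, not the real Pants code:

```python
def find_extra_product(target, extra_product_map):
  """Return the extra product for the first matching target in the chain.

  The chain is an ordered list: the target itself, then each successive
  derived_from ancestor. Using a list (not a set) is what makes the
  "first wins" semantic well-defined.
  """
  chain = [target]
  current = target
  while getattr(current, 'derived_from', None) not in (None, current):
    current = current.derived_from
    chain.append(current)
  for candidate in chain:  # ordered scan; stop at the first hit
    if candidate in extra_product_map:
      return extra_product_map[candidate]
  return None

class T(object):  # minimal stand-in target for the example
  def __init__(self, name, derived_from=None):
    self.name = name
    self.derived_from = derived_from

concrete = T('concrete')
synthetic = T('synthetic', derived_from=concrete)
products = {concrete: 'extra.jar'}
print(find_extra_product(synthetic, products))  # extra.jar
```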

  6. src/python/pants/docs/publish.rst (Diff revision 3)

    Unless a reader knows name means name in the context of [name]-[rev](-classifier).[ext], this list will be less than useful. You should document how the final basename results from the components, to make it more clear how to influence the artifact name.

  7. src/python/pants/docs/publish.rst (Diff revision 3)

    This cannot be blank - sort of. This treads on odd semantics land. Maven artifacts are typically (always?):
    [groupID directories]/

    This means [artifactId]-[version](-[classifier]).[ext] must be unique to avoid over-writes and unexpected results. In the spirit

    1. Please ignore this 1/2 finished thought. I went on to read the code where you attempt to ensure uniqueness and forgot to come back here and delete the comment.

  8. src/python/pants/docs/publish.rst (Diff revision 3)

    Please don't encourage yes |

  2. set(confs or [])

    1. Okay, but I didn't add this line. ;)

  2. Also, all literals are unicode literals, so the u is superfluous. And furthermore, you're opening up a file and reading out bytes here, so the encoding does nothing but decode the changelog as ascii, then reencode it as utf-8. This can do nothing except cause errors on non-ascii input.

    1. We were seeing an issue where working with the changelog was causing pants to bomb out with a python DecodeError. I admit I don't really understand utf-8 in Python (it seems broken to me), but this change fixed the bug. I can try taking this out and see if I can repro the original issue.

  2. Probably because you are opening the file in text mode (by not passing 'wb'). That causes python to attempt to decode the file in the interpreter default encoding, which is generally ascii. Pass 'wb' and then have contents =, and then call contents.decode('utf-8') if you're willing to enforce utf-8. If this changelog is otherwise opaque to you, you should just leave it as bytes.

    1. The changelog comes from git:

      I'll see if I can repro the changelog issue I was seeing before. If so, I'll file a separate issue for it, since it's orthogonal to this change.

    2. Okay, I can repro this issue, and I tried changing the file write to use 'wb', but that didn't fix it. I'm inclined to just leave this in rather than file a separate issue.

  2. Ah, I see. In that case you should probably hoist the target_set loop into a helper method, both for the clarity of this code and to avoid that name leak.

    1. I don't think this will work. 'tgt' is a parameter to stage_artifacts, so it's not just an issue with the code that I added.

  2. user_supplied_keys = frozenset(['override_name', 'classifier', 'extension'])
     if not user_supplied_keys.intersection(frozenset(extra_config.keys())):
       raise ...
  2. In fact, you don't need the frozenset around extra_config.keys()
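Indeed, `set.intersection` accepts any iterable (including a dict, which iterates its keys), so the inner `frozenset` wrapper is redundant. A quick demonstration with a made-up config value:

```python
user_supplied_keys = frozenset(['override_name', 'classifier', 'extension'])
extra_config = {'classifier': 'extra'}  # hypothetical publish_extras entry

# All three spellings produce the same intersection:
assert (user_supplied_keys.intersection(extra_config)
        == user_supplied_keys.intersection(extra_config.keys())
        == user_supplied_keys.intersection(frozenset(extra_config.keys())))
print(sorted(user_supplied_keys.intersection(extra_config)))  # ['classifier']
```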

  2. flexibility

    s/allowed, in/allowed in/

  3. src/python/pants/backend/jvm/tasks/ (Diff revisions 2 - 3)

    This is duplicated above.

  4. indentation

  2. Please don't leave this as is. I'd like to see more detail about the repro, because that doesn't sound right to me. Could you give me a full stack trace, your change to safe_open, and a snippet of the bytes in the troublesome changelog so I can verify, please?

    1. Oh, so I misread what was happening here a bit, and I think I see the issue now. The changelog comes from the Git Scm, which is shelling out to git, which returns bytes (or str in py2). Since we can't be confident of the encoding of git messages (I've seen all kinds of wacky stuff slip into messages, and a quick google search doesn't seem to indicate that this is standardized), the changelog should be left as bytes, always. If the changelog parameter passed to stage_artifact is not of type bytes (it's unicode in py2 or str in py3k), then that's the problem. We should trace the code path between where the changelog is read in and where it gets written out to see how it's been accidentally coerced into code points. This usually happens by interpolation into a unicode literal, which is all undecorated string literals in our case.

      I'm very much opposed to this change going in as is, or in letting it float if there's a waiting bug. If you want you can separate it out into another issue/RB, but that doesn't seem entirely necessary to me since you're already working primarily in this module.

    2. Here is the stack trace:

      Exception caught:
      File "/Users/areitz/", line 205, in <module>
      File "/Users/areitz/", line 200, in main
      File "/Users/areitz/", line 181, in _run
      result =
      File "/Users/areitz/", line 295, in run
      return engine.execute(context, self.goals)
      File "/Users/areitz/", line 48, in execute
      self.attempt(context, goals)
      File "/Users/areitz/", line 184, in attempt
      File "/Users/areitz/", line 42, in attempt
      File "/Users/areitz/", line 722, in execute
      ivyxml = stage_artifacts(target, jar, newentry.version().version(), changelog)
      File "/Users/areitz/", line 648, in stage_artifacts
      return stage_artifact(tgt, jar, version, changelog, confs, extra_confs=extra_confs)
      File "/Users/areitz/", line 556, in stage_artifact

      Exception message: 'ascii' codec can't encode character u'\u2019' in position 120739: ordinal not in range(128)

      I'll see if I can get the changelog later tonight. Here is what I tried with safe_open:

      asymmetry os_pants (areitz/add_publish_plugins)]$ git diff
      diff --git a/src/python/pants/backend/jvm/tasks/ b/src/python/pants/backend/jvm/tasks/
      index e306214..0c0bc09 100644
      --- a/src/python/pants/backend/jvm/tasks/
      +++ b/src/python/pants/backend/jvm/tasks/
      @@ -551,8 +551,9 @@ class JarPublish(JarTask, ScmPublish):
      return self.artifact_path(jar, version, name=name, suffix=suffix, extension=extension,

      -  with safe_open(path(suffix='-CHANGELOG', extension='txt'), 'w') as changelog_file:
      -    changelog_file.write(changelog.encode('utf-8'))
      +  with safe_open(path(suffix='-CHANGELOG', extension='txt'), 'wb') as changelog_file:
      +    # changelog_file.write(changelog.encode('utf-8'))
      +    changelog_file.write(changelog)
         ivyxml = path(name='ivy', extension='xml')

         IvyWriter(get_pushdb).write(tgt, ivyxml, confs=confs, extra_confs=extra_confs)

      (The line that's commented out works; the one that is commented in is what exists in master presently. Going 'wb' here doesn't fix it.)

    3. Okay, the diff didn't really translate properly, but I think I see what you mean--that using encode('utf-8') works, but without the encode it doesn't. That makes sense. As I said above, changelog is here a unicode instead of a bytes. So we need to figure out where in the code path it's changing, since it should be opaque bytes all the way from the SCM call through to this write.

    4. Okay, sorry about that diff. I analyzed the code, and here is what I believe to be the flow:

      changelog handling in

      1. publish gets changelog:

      2. changelog is printed:

      3. changelog is passed as an argument to stage_artifacts():

      4. changelog is passed as an argument to stage_artifact():

      5. stage_artifact() writes the changelog file (this is what blows up):

      fetching the changelog:

      1. calls into scm:

      2. My claim is that scm is really a Git object (inherits from Scm abstract class), so this resolves to:

      3. execute git whatchanged via _check_output():

      4. _check_output() constructs a command line, and calls it via _invoke():

      5. _invoke() uses process.communicate() to get the STDOUT from git:

      6. _check_output() calls _cleanse() on the output, before returning it:

      7. _cleanse() does a decode on utf-8. This might be the source of the problem:

      I tried taking _cleanse() out of the flow, but that caused this stacktrace:

      Exception caught:
        File "/Users/areitz/", line 205, in <module>
        File "/Users/areitz/", line 200, in main
        File "/Users/areitz/", line 181, in _run
          result =
        File "/Users/areitz/", line 295, in run
          return engine.execute(context, self.goals)
        File "/Users/areitz/", line 48, in execute
          self.attempt(context, goals)
        File "/Users/areitz/", line 184, in attempt
        File "/Users/areitz/", line 42, in attempt
        File "/Users/areitz/", line 711, in execute
          coordinate(,, oldentry.version(), oldentry.sha, changelog
      Exception message: 'ascii' codec can't decode byte 0xe2 in position 120739: ordinal not in range(128)

      (I thought it would have puked printing the string to the console).

      Now, I admit that I thoroughly don't understand how python handles UTF-8 (it seems somewhat obtuse, IMO), but it seems like adding a utf-8 encode just-in-time when writing the changelog file makes sense, versus leaving it undecoded, and having to fix every place where the changelog is used.


  2. The fundamental problem is that it might not be utf-8. It could be latin-1 or just malformed input. However, you've identified here that we've apparently been relying on all Git SCM output being utf-8 encoded from the beginning, so I suppose there's no point in belaboring the possible inputs further.

    This simplifies things a good deal. Now your changelog is known to be a unicode (a sequence of code points), meaning wherever you want to write it back into a file you need to reencode it as utf-8. So I'd change this to a single line:

      changelog_file.write(changelog.encode('utf-8'))

    And keep the 'wb'.

    As for how utf-8/unicode works in python, it's not difficult, just confusing:

    py2 unicode (str in py3k) is a sequence of code points. It has no notion of bytes or encoding, it is only a sequence of code points that refer to entries in the table of the Unicode standard. If you want to write these code points out to a file, you must encode them, generally by calling my_unicode.encode(<my encoding>) where <my encoding> is utf-8 99% of the time. You should never call decode on a unicode (even though you can) because decode is meant to turn bytes into code points.

    py3 str (bytes in py3k) is a sequence of 8 bit bytes. This is all you know about the sequence unless you have outside knowledge of how it's encoded. If you know it's utf-8, you can call my_str.decode('utf-8') to get back a unicode sequence of code points. Sometimes this decoding is done implicitly if you try to use a byte sequence in a place that wants a unicode and decides to encode that unicode. In this situation Python tries to use the interpreter default encoding to decode the bytes, and that default is almost always ASCII. If you were passing around utf-8 that had actual non-ASCII code points in it, then this fails, and generally in a code path that you don't expect.

    The general strategy for avoiding this is:
    (1) Always know which type you're passing around and receiving, and normalize as early as possible to unicode if you can.
    (2) Be careful about interpolating input strings into literals, since all of our literals are unicode, and bytes interpolated into unicode strings will be implicitly decoded, usually as ASCII.
    (3) Always explicitly encode your unicode to bytes (99% of the time with utf-8) before writing them out, for the same reason as (2).
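The strategy above, sketched in modern Python 3 terms (bytes/str rather than py2 str/unicode); the sample bytes deliberately include the U+2019 right single quote from the stack trace earlier in this thread:

```python
import os
import tempfile

# (1) Bytes arrive from an external source (e.g. a git subprocess);
#     decode to text (code points) as early as possible.
raw = b'caf\xc3\xa9 \xe2\x80\x99'     # utf-8 bytes containing non-ASCII
changelog = raw.decode('utf-8')       # now a sequence of code points

# (3) Explicitly re-encode to utf-8 before writing in binary ('wb') mode,
#     never relying on an implicit ASCII default.
with tempfile.NamedTemporaryFile(delete=False) as f:
  f.write(changelog.encode('utf-8'))
with open(f.name, 'rb') as g:
  assert g.read().decode('utf-8') == changelog  # lossless round trip
os.remove(f.name)
print('round-trip ok')
```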

  2. Oops, typo:

    py2 str == py3 bytes == sequence of bytes

    Shortly I'm going to be switching all of our compatibility uses to six, where we can say six.text_type and six.binary_type to refer to these things unambiguously across python versions.

  1. LGTM - I think it's down to working out a resolution with Patrick to the encoding issue, or removing that ~unrelated bit from this change.

  2. Aha - here's where you populate confs with the new stuff - I was blind.

  2. src/python/pants/docs/publish.rst (Diff revision 4)

    Assuming it's OK for some publishable things to not-provide an "extra", then changing this "When" to an "If" would disambiguate.

    (but if that assumption is wrong, then this suggestion is terrible)

  2. How about just?:

    self.assert_result(pants_run, self.PANTS_SUCCESS_CODE, expected=success_expected)

    The custom message in each branch is obvious from the binary nature of the test.

    1. I think the reason why I can't do it this way is that in the negative case (doing a test where I expect pants to fail), I need to bail out of the method. What follows this code is some additional checks to see if the published file exists. But in the negative case, it won't.
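The shape being described: assert the expected exit status either way, but return early in the negative case, since the artifact-existence checks that follow only make sense after a successful publish. The names below mimic the discussion and are not the real pants test API:

```python
def check_publish(pants_succeeded, success_expected, published_file_exists):
  # Works for both branches: the run must match the expectation.
  assert pants_succeeded == success_expected, 'unexpected pants exit status'
  if not success_expected:
    return  # nothing was published; skip the artifact checks below
  # Only reached in the positive case.
  assert published_file_exists, 'expected published artifact is missing'

check_publish(True, True, True)     # positive case: runs all checks
check_publish(False, False, False)  # negative case: bails out early
print('ok')
```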

  1. lgtm minus one last 'wb'

    1. actually, minus the other outstanding issues as well

  1. yay, doc+example looks coherent to me

Review request changed

Status: Closed (submitted)