Support sharding of descendant (::) specs.

Review Request #3363 — Created Jan. 21, 2016 and discarded

benjyw
pants
pants-reviews
patricklaw, stuhood, zundel

NOTE: Discarded in favor of https://rbcommons.com/s/twitter/r/3560/.

This new syntax looks like path/to/root::shard/nshards.

For example, foo/bar::4/7 will take only those targets in foo/bar::
whose hash modulo 7 is equal to 4.

This is useful primarily for splitting up tests and running them in
multiple concurrent disjoint batches. E.g., N CI workers could each run:

./pants test tests/python/pants_test::i/N --tag=integration

with a different value of 0<=i<N for each worker.

This could potentially supersede the existing support for sharding tests.

The advantages of this approach:
- It works not just for test running but for all tasks, where that might be useful.
- The implementation is simple, and in one place in pants code, rather than being
re-implemented inside each test runner task for each language, possibly involving
complicated interactions with the underlying test running framework.
- It provides more random distribution, as the existing test sharding simply mods
the test's index in an enumeration influenced by the ordering of, and within,
source files, which is hardly random.

Potential disadvantages of this approach:
- Sharding is at the target level, not the individual test level.
- This requires use of the descendant spec (::).

Overall this should really simplify the test runners, and pants's own ci.sh
script, so I think it would be a win to use it instead.

CI passes: https://travis-ci.org/pantsbuild/pants/builds/103991853

  • 3
  • 0
  • 0
  • 0
  • 3
Description From Last Updated
Shadowing a builtin PA patricklaw
first arg is superfluous (default is 0) PA patricklaw
Can use a generator comp instead of list (not that it matters much, but it looks nicer) PA patricklaw
PA
  1. I am wary of adding new syntax, but I have no particular objection. I wouldn't mind seeing this as a global option before it becomes a piece of syntax, but I'm sufficiently ambivalent that I won't block shipping on that.

  2. src/python/pants/base/hash_utils.py (Diff revision 1)
     
     

    Shadowing a builtin

  3. first arg is superfluous (default is 0)

  4. Can use a generator comp instead of list (not that it matters much, but it looks nicer)

  5. 
      
ST
  1. IMO, this doesn't really make sense as syntax.

    1. (...since it's not for humans)

    2. I agree with Stu in this seems like a strain on the syntax to me. If you want to implement this as a feature, maybe make them separate options?

      The hope of removing code from the test runners isn't realistic four our use cases. We have large targets with many tests that need be broken up. Even when you have targets without many tests, they might be long running tests for which sub-target sharding is useful.

    3. There's enough pushback here that I could just spike this entirely. But otherwise I think the syntax makes much more sense than an option. This seems like a totally natural extension of the existing syntax to me, and not any kind of strain. Whereas, since this logic only makes sense when selecting multiple targets, having it be an option just seems wrong.

      Eric, is your use case Java? I don't mind that as much because the implementation is a simple passthru to the test runner. But PytestRun's implementation is actually fairly gnarly.

    4. Yes, our use case for sharding is java.

      The reason why I think that the syntax is a strain is because it introduces some abiguity into the parsing of a spec:

      A spec:

      path # the default target in a BUILD file on that path
      path:name # A specific target in a BUILD file on that path

      Currently added to the command line syntax:

      path: # all targets in BUILD files in that path
      path:: # recursively all targets in that path

      (Just yesterday Garrett floated adding the above 2 as valid inside of a BUILD file and that wasn't palatable to everyone.)

      Proposed:

      path::X/Y # recursively add all targets in that path

      If this proposed change were an orthogonal extension to command line spec extensions, you might also want to do something with path:X/Y to just shard all targets in that dir. But if you do that, it will be somewhat ambiguous with the path:name syntax. We would have to figure out that the 'name' portion contains a '/' character, so it really isn't a name after all, its this other form. We could propose something easier to parse like:

      path::(X/Y)
      path:(X/Y)

      Where the parens would make this less ambiguous, but then we'd have to say that parens are not allowed in regular spec names. I'm not sure that we have any invalid characters in a spec right now. (And I'm not so hot on that proposal because the shell interprets this character so you'd always have to quote it.

      I think it would be better modeled as a global option --shard=X/Y

  2. 
      
BE
BE
Review request changed

Status: Discarded

Change Summary:

See https://rbcommons.com/s/twitter/r/3560/

Loading...