Parallelize jar-create

Review Request #2310 — Created June 2, 2015 and discarded

kwlzn
pants
github.com/kwlzn/pants kwilson/parallel_jarcreate
pants-reviews
fkorotkov, ity, jsirois, nhoward_tw, stuhood, tejal, zundel
  • Migrate execution_graph.ExecutionGraph to pants.base for general use
  • Parallelize jar-create using ExecutionGraph for perf wins in ./pants bundle (for one of our important target cases involving just under ~500 jars, this change causes a reduction in time for jar-create from ~156s to ~30s)
  • Decompose jar_task.JarTask.open_jar() arg handling for readability (vestige of a larger discarded change)
  • Misc cleanup

[illuminati pants (kwilson/parallel_jarcreate)]$ ./build-support/bin/ci.sh
...
[== 25:51 CI SUCCESS ==]

  • 1
  • 0
  • 5
  • 0
  • 6
Description From Last Updated
There is already a builtin facility for running jobs in a pool. You have no dependencies, so that seems more ... JS jsirois
IT
  1. prelim comments - would be good to call out the exact perf improvements that this change introduces.
    A before/after time for a specific target would be a good start.

    1. good call, updated.

  2. 
      
KW
NH
  1. Looks pretty good. I've got a few nits and couple comments.

    1. thanks for the review!

  2. nit: s/_/-/ in flag name.

  3. ExecutionGraph doesn't expect on_failure callbacks to raise exceptions. They're mostly there to allow for side-effects like invalidating / validating target output. I don't think this is necessary.

    When a Job fails, the ExecutionGraph will raise an error containing the all the failed job's key's string representation after all the non-dependent jobs have finished.

    I've thought that adding a fail-fast option to the ExecutionGraph would be a good idea, and this task might be a nice place to have it available, so that a failure for this task causes it to bail early.

    I think I should also clarify the docstring on Job so that it is more clear that it's not intended for raising exceptions.

    1. I assumed since on_failure is called in the main thread that this would be the best way to handle errors, but letting ExceptionGraph raise with all keys is way better imho. thanks - fixed.

  4. I think it would be better if you used the target.address.spec as the key here. It results in better error message behavior.

    1. great suggestion, definitely see the benefit. fixed.

  5. nit: use {} and string.format() instead of %s, %

  6. For things like this, I like having the condition turned into a guard clause to reduce nesting.

  7. I feel like this could have a better name, maybe _jar_tool_args?

    The other thing that's tricky about it is that it needs the contextmanager because _render_jar_tool_args sets up state in a temp directory. I found that a little confusing.

    1. fixed. yeah, definitely a small hairball of nested contexts that were already in here. might be nice to unravel that at some point.

  8. Good catch!

  9. 
      
KW
NH
  1. One other issue, otherwise LGTM for widening the review pool.

  2. move the worker_pool init to after the check for jar_targets so that we don't need to incur the cost of spinning up a thread pool.

    It might also make sense to shut the pool down after.

    1. great catches - fixed. thanks again!

  3. 
      
KW
KW
KW
ZU
  1. Code looks good, I'd like to see an integration test that creates multiple jars and tests the correct construction of the output jars. Also, our CI machines may default to 1 cpu, so you might need to force add multiple workers.

  2. 
      
JS
  1. 
      
  2. There is already a builtin facility for running jobs in a pool. You have no dependencies, so that seems more appropriate. There are a few variants, but one entrypoint is here: https://github.com/pantsbuild/pants/blob/master/src/python/pants/goal/context.py#L175

    The subproc_map in particular seems appropriate since Products.get and the rest of products is not thread safe (the work does this: self.context.products.get('jars').add(: https://github.com/pantsbuild/pants/blob/master/src/python/pants/goal/products.py#L183

    I will admit I'm slightly confused though. It seems unlikely your tests of the 500 jar case would not have encountered products corruption. I may be missing something obvious but I've marked this as an issue to start with - looks broken.

    1. thanks for the comments John & Eric. will take a deeper look at the pool/thread safety concern and get an integration test in place.

  3. 
      
KW
Review request changed

Status: Discarded

Loading...