Create a hybrid optionally async engine.

Review Request #3897 — Created May 16, 2016 and submitted

molsen
pants
3325, 3457
pants-reviews
benjyw, kwlzn, nhoward_tw, peiyu, stuhood, zundel

Create a hybrid optionally async engine.

  • Use futures for backend rather than raw threads
  • Allow for optionally running steps async
  • Abstract common methods from StatefulPool's

CI green: https://travis-ci.org/pantsbuild/pants/builds/138965464

  • 0
  • 0
  • 13
  • 1
  • 14
Description From Last Updated
KW
  1. quick round of drive-by feedback, will devote some more time to this tomorrow.

  2. src/python/pants/engine/engine.py (Diff revision 1)
     
     

    how about a class level docstring with a quick description?

    also, the Async bit of AsyncEngine might be a little overloaded - could have people thinking asyncio etc.

  3. src/python/pants/engine/engine.py (Diff revision 1)
     
     

    why rename step to step_ here?

    1. conflicts with names in pdb.

    2. pdb supports either s or step - not sure this is a good enough reason to muddy the water here.

    3. pdb allows you to look at variables by typing the variable name...

      In this case step is a keyword for pdb, it cased a bit of headache while I was debugging this. The name conflict will cause some pain for some users if they try to debug the code with pdb.

    4. isn't that just a shortcut for what the p and pp commands essentially do? e.g. p step would've been the quick unblocker.

      still not convinced this is an issue worth mucking the code around for.

  4. src/python/pants/engine/engine.py (Diff revision 1)
     
     

    kill

  5. src/python/pants/engine/engine.py (Diff revision 1)
     
     

    how about a class-level docstring here too?

  6. src/python/pants/engine/engine.py (Diff revision 1)
     
     

    it'd be great if engine.py could avoid any direct import/knowledge of node types - and instead get this passed in in the calling module when the object is initialized.

    1. Not sure I understand what you are suggesting gets passed in.

      I could optionally pass in the types to switch on, the actual function that makes the decision or something else entirely.

      One option could be to create some predefined deciders and let the invocation choose on initialization. I'd rather not have Node type info bleeding into the code base more than needed.

    2. yeah, was thinking something simple along the lines of Engine(..., threaded_node_types=(FilesystemNode,)) - a predicate could work as well. entirely up to you.

  7. src/python/pants/engine/engine.py (Diff revision 1)
     
     

    is there a reason here and above for the use of Exception directly vs a more specific type?

    1. prior art, not opposed to changing.

    2. seems like it'd make sense.

  8. src/python/pants/engine/processing.py (Diff revision 1)
     
     

    StatefulPoolBase?

  9. 
      
MO
MO
MO
ST
  1. 
      
    1. Review title/description ciontain an extra word.

  2. src/python/pants/engine/engine.py (Diff revision 2)
     
     

    This should probably be private? Also, new parameters, but no new docs.

    1. just to clarify by private you mean _submit_until vs submit_until?

  3. src/python/pants/engine/engine.py (Diff revision 2)
     
     
     
     

    This pool is pretty intimately related to caching ... I know I added it as a late comment, but it's critical that these cache checks happen in separate threads as well.

    The easiest way to implement that with regard to threading is probably to always thread something if you're going to check the cache for it.

    But more fundamentally, you might think of it as speculation instead: with Twitter's scala futures, this might look like:

    val winner = Future.select(checkTheCacheFor(step), execute(step)).map {
      case (Return(winningResult), _) => winningResult
    }
    

    ...where you are are racing the cache check against the execution of the task.

  4. src/python/pants/engine/engine.py (Diff revision 2)
     
     
     

    ditto

  5. src/python/pants/engine/engine.py (Diff revision 2)
     
     

    unused?

    1. hold over from some testing, will remove.

  6. src/python/pants/engine/engine.py (Diff revision 2)
     
     

    _node_

  7. src/python/pants/engine/engine.py (Diff revision 2)
     
     
     
     

    This sounds verby, but should probably sound adjectivey. def _is_async_node(self, node):, for example.

    Also, it should be documented in the parent class rather than here probably.

    1. is_async sounds good to me, will change.

  8. 
      
MO
MO
MO
ST
  1. 
      
  2. src/python/pants/engine/BUILD (Diff revision 4)
     
     

    xx: unused I think (and odd otherwise)

  3. src/python/pants/engine/engine.py (Diff revision 4)
     
     

    threaded_node_types not documented

  4. src/python/pants/engine/engine.py (Diff revision 4)
     
     

    Should probably name the collection for the role it is playing, rather than only its type.

  5. src/python/pants/engine/engine.py (Diff revision 4)
     
     
     
     
     
     
     
     
     

    These have changed on the base class in master in an important way: see https://rbcommons.com/s/twitter/r/3971/

  6. src/python/pants/engine/engine.py (Diff revision 4)
     
     

    "thred", here and a few other places.

    Please also end comments in periods.

  7. src/python/pants/engine/engine.py (Diff revision 4)
     
     
     

    Two partial calls in a row can be reduced to one.

  8. src/python/pants/engine/engine.py (Diff revision 4)
     
     

    Not clear what this means.

  9. src/python/pants/engine/engine.py (Diff revision 4)
     
     

    It seems like the entire body of reduce could be reused between the two concurrent subclasses if enough logic was moved into _submit_until and _await_one.

  10. src/python/pants/engine/engine.py (Diff revision 4)
     
     

    This should not be necessary in the context of threading. But if this is intended to resolve content addressed values, it should probably do it inside the threads rather than here (as it might touch disk).

  11. src/python/pants/engine/engine.py (Diff revision 4)
     
     

    unused

  12. src/python/pants/engine/engine.py (Diff revision 4)
     
     
     
     
     

    I think the extraction of the pool base is left behind from when these had a shared superclass?

  13. src/python/pants/engine/processing.py (Diff revision 4)
     
     
     

    I think that based on the current state in Engine, all changes to StatefulPool can be discarded...?

  14. src/python/pants/engine/storage.py (Diff revision 4)
     
     

    This should be instance level, not class level.

    But also, it's not clear that any of these operations need to be locked, as each of them should be safe independently if:

    1) there are no overwrites (there shouldn't be)
    2) the underlying storage is thread safe (lmdb definitely is)

  15. src/python/pants/engine/storage.py (Diff revision 4)
     
     

    The factory function does not need to lock, as it does not mutate any shared state.

  16. src/python/pants/engine/storage.py (Diff revision 4)
     
     

    unused?

  17. 
      
MO
ST
  1. This is getting close I think. Thanks.

    Please be sure to do some larger scale testing in pantsbuild/pants (and Twitter's repo if you have the time).

  2. src/python/pants/engine/engine.py (Diff revision 5)
     
     

    Was this intentional? AFAICT, this will break the multiprocess engine, which needs to pass back a key, rather than the result itself.

    1. Yeah, this looks like this will break multiprocessing. Those tests are disabled right now because of #3510, which is why I missed it. I will add an optional resolve results param because for the threading case I'd rather potential disk IO happen in the thread. Will update shortly.

  3. src/python/pants/engine/engine.py (Diff revision 5)
     
     

    This collection grows forever. It should likely be calculated from the in_flight set instead.

  4. src/python/pants/engine/storage.py (Diff revision 5)
     
     
  5. Should probably explicitly say pool_size=2 here, just in case we encounter a machine with... 0.5 processors =P

  6. 
      
MO
NH
  1. 
      
  2. src/python/pants/engine/engine.py (Diff revision 6)
     
     

    This isn't going to work because result doesn't have a value yet.

    Also, I find the meaning of the resolve_results flag a little unclear. Does it mean that if it is false, the request should not be resolved to a result?

  3. src/python/pants/engine/engine.py (Diff revision 6)
     
     

    nit: add an in between items and pending_submissing

  4. src/python/pants/engine/engine.py (Diff revision 6)
     
     

    we could separate the non-async steps out and run them after submitting the async ones.

    Also, I feel like with the async switch, we could re-check the to_submit calculation as we go to run more steps if len(pending_submission) - n is > than the number of jobs in flight and there's a mix of async and sync steps.

    I'd like to have some unit tests around this before doing that though.

  5. src/python/pants/engine/engine.py (Diff revision 6)
     
     
     
     
     
     
     
     
     

    This submits two items of work. Should the pool capacity calculation take that into account?

    1. Do you mean in case the pool size is 1? If the pool size is 1 we should probably just short circuit and do serial. Otherwise the pool size should be determined by cpu's not tasks as far as I can tell.

    2. More that before, we would submit a single item into the queue and do the cache check inline. Now the cache check is lazy and both are submitted at the same time. So, for each in flight step we are submitting two units of work to the pool instead of one. This might mean that the pool's queue will have more items waiting in it than before.

  6. src/python/pants/engine/engine.py (Diff revision 6)
     
     

    nit: period

  7. src/python/pants/engine/engine.py (Diff revision 6)
     
     

    Does this clean up storage, or does the caller?

    1. good catch, looks like the call to super(...).close() got dropped.

  8. src/python/pants/engine/engine.py (Diff revision 6)
     
     

    I tried unskipping some of the multiprocess tests and they failed. I don't think it's caused by this patch because I get the same failures on master, but it might be nice to try to get them working locally again while you are in here.

    1. My understanding is that these are broken because of the bug mentioned in the skip decorator.

  9. 
      
MO
NH
  1. After thinking about that last comment a little more, I had another thought.

  2. src/python/pants/engine/engine.py (Diff revision 7)
     
     

    What happens here if both the deferred step and the deferred cache are successful? It seems like pop would raise a key error on the second result.

    Is there a test that exercises that case?

  3. 
      
MO
MO
NH
  1. Ship It!
  2. 
      
ST
  1. One remaining issue with the cache, but otherwise looks good.

  2. src/python/pants/engine/engine.py (Diff revision 9)
     
     
     

    _submit_until and _await_one should be defined as abstract methods in this class.

  3. src/python/pants/engine/engine.py (Diff revision 9)
     
     

    _maybe_cache_put and _maybe_cache_get still have a bit of a confusing API.

    The reason _maybe_cache_get returns a key, is that if the key doesn't exist, the cache isn't in use for a task. That's vital information in this case, because it means you don't need to launch a task to hit the cache for that.

    But also, the key it returns is critical, because it's what you need to pass to _maybe_cache_put: currently you're passing the StepRequest rather than the 'keyed' (ie, sorted...) StepRequest.

    So this needs a bit more tweaking so that:

    1. the cache key computation (which is CPU bound anyway) happens in the main thread
    2. the cache_get is only launched if 1 results in a key
    3. the cache_put always uses the key from 1
  4. 
      
MO
ST
  1. Just minor fixes. Thanks!

  2. src/python/pants/engine/engine.py (Diff revisions 9 - 10)
     
     

    This is a boolean property lookup, so it definitely shouldn't need to be memoized.

  3. src/python/pants/engine/engine.py (Diff revisions 9 - 10)
     
     

    _deferred_step is private, but deferred_cache isn't. While renaming it, might be good to indicate that it is a deferred cache "get".

  4. 
      
MO
MO
MO
MO
Review request changed

Status: Closed (submitted)

Change Summary:

commit de65a79cbc3bb87bf44cf1da813c9e24743e5267

Loading...