Proposed refactor of the engine: RoundEngine

Review Request #481 — Created June 10, 2014 and discarded

ity
pants
152
jsirois
This proposed refactor of the Engine is driven by: https://github.com/pantsbuild/pants/issues/152

The "idea" goal is special since it requires a compile of a sub-graph of the overall Target graph followed by a gen on the entire graph.
Currently, there is no non-hacky way of specifying a specific section of the Target graph to be passed to a specific task.

This refactor, aims to extract out the scheduling of Tasks and their dependencies from register.py (where its been traditionally stored) and be able to pass through a Target graph if/when required. No matter what order goals are specified in the register.py, their order is overwritten by the schedule() method. 

In fact, instead of programmatically specifying the order of Tasks, it would be cleaner for Task to have an abstract method getOrder() which could be implemented by all the subclasses to return their specific order. This would then allow each Task to know its own order and encapsulate this knowledge. I might tackle that in a subsequent change in order to reduce the impact of this change. 

Beware, this change is not integrated at all. It aims to show the barebones structure of the proposed refactor. For now, I wanted to send out this isolated change to get everyone's thoughts on it and make sure we are all on the same page. Especially John, Patrick and Benjy - since you folks have been heavily involved in the recent refactor.
./pants build tests/python/pants_test/engine/:test_round_engine

tests/python/pants_test/engine/test_round_engine.py .

=========================== 1 passed in 0.30 seconds ===========================
tests.python.pants_test.engine.test_round_engine                                .....   SUCCESS

Process finished with exit code 0
JS
  1. It was good to get this out to start converging.  I think we had one disconnect at the center of the RoundEngine function/lifecycle that I try to explain better below.
    Let me know if this makes any sense.
    1. I thought about this on the ride in, and I think this came up at the whiteboard, but there are really 2 changes going on:
      1.) build the phase graph recursively via requesting products in prepare / advertising products via a Task classmethod
      2.) modifying 1 to allow requesting products on a subgraph
      
      1 is what I'm focused on below and 2 is what the idea goal actually needs.  It might be good to break up the changes into sep RBs for 1 & 2 - but that's totally up to you.
    2. You are right, I think it would make sense to have these two separate for clarity more than anything. And your assumption earlier was correct about the static scheduling being a placeholder. Thank you for the feedback, I am going to make changes on that basis and send out an update by tomorrow hopefully if I can deal with the oncall fires fast enough. 
  2. src/python/pants/engine/round_engine.py (Diff revision 1)
     
     
    I'm working with the assumption that this static scheduling bit is a placeholder for changing the Task.prepare() signature to be Task.prepare(RoundManager).
    
    With that assumption aside, it would be nice to have Tasks just know about products.  If I'm the JunitRun task, I just need the 'classes_by_target' product (and a classpath - but thats not a product yet).  So ideally with the JunitRun hat on I'd only have to say:
      def prepare(self, round_manager):
        round_mananger.schedule('classes_by_target')
    
    And then JavaCompile for example would just have to say:
      @classmethod
      def product_type(cls):
        return 'classes_by_target'
    
    With that information, the Round engine should be able to look at the round_mananger.schedule('classes_by_target') request JunitRun makes and scan through the list of all registered Goals (Task types in .action) and query their product_type classmethod to find the Task to schedule to run before JunitRun to satisfy its product requirement.  That Task producing classes_by_target will be in a list of Tasks (goals) that forms a phase and so thats the phase to schedule to run before JunitRun (in this example, the 'compile' phase).  This pairing process then recurses to the list of Tasks in the just scheduled phase and and that recursion is what builds the Phase graph.
    
    There was some handwave above in details, but I think they're just that for now.  In particular, I can imagine the product_type classmethod being plural and certain tasks might populate several products in the product map - maybe.  Also right now product_type() is an instance method - but the change should be smooth.  Finally, ideally the schedule call would return a promise that can be called on in execute to supply the requested products instead of seperately consulting the context product map for the same 'classes_by_target' string.  These bits though are minor and could be worked out after the main change fills out.
  3. 
      
IT
JS
  1. 
      
  2. src/python/pants/engine/round_engine.py (Diff revisions 1 - 2)
     
     
    Here after the call to prepare the round_manager has some new required product types scheduled likely not satisfied by any task in the phases passed in above.  Concretely, a very common phases will be just ['test'] and that does not have compile or resolve or gen, all of which need to be discovered and added to phases recursively here.
  3. 
      
IT
JS
  1. 
      
  2. I'm not sure if this is ready for another look, but this RoundManager setup is suspicious.  A convincing test would have install_goal above using a Task that just recorded for execute - like today, but did real scheduling in prepare().
  3. 
      
IT
IT
JS
  1. I gave a detailed look at the test and mechanics of the RoundEngine and this all LGTM.  Thanks for making the test more honest.
    I think its time to expand to other reviewers and the pants_reviews groups.  Same review or new one is up to you.
    1. awesome - will send out a new one with updated description.
  2. src/python/pants/engine/round_engine.py (Diff revision 5)
     
     
    Phase.all() and kill the phase parameter to get_phases_by_product
  3. src/python/pants/engine/round_engine.py (Diff revision 5)
     
     
    It would be nice to force an iterable and list(types) here.  If this is to smooth over existing API impls, I'm happy with a TODO, otherwise a single return type for this api seems like a good thing to enforce.
  4. src/python/pants/engine/round_engine.py (Diff revision 5)
     
     
    I think this can go.  It probes GroupTask and GroupTask takes a single product_type for the whole group in its factory method.  That's reported via its product_type and for groups to make sense, all GroupMembers should in fact produce the same products.
  5. src/python/pants/engine/round_engine.py (Diff revision 5)
     
     
    kill - dead code
  6. src/python/pants/engine/round_engine.py (Diff revision 5)
     
     
    Interesting.  This does not do classic recursion, but I confirm in a repl that list.__iter__ is dynamic so the for range expands as this loop runs.  I'm coming to terms with this - its probably fine and it certainly does the job - more memory efficiently to boot.
    1. I think this is a remnant of me being used to using dynamic list iteration in Java - I was naturally inclined to do it this way :)
  7. 
      
IT
Review request changed

Status: Discarded

Loading...