Proposed refactor of the engine: RoundEngine

Review Request #545 — Created June 16, 2014 and submitted

ity
pants
247
pants-reviews
benjyw, jsirois, patricklaw
Proposed refactor of the engine: RoundEngine

This proposed refactor of the Engine is driven by: https://github.com/pantsbuild/pants/issues/152

The "idea" goal is special since it requires a compile of a sub-graph of the overall Target graph followed by a gen on the entire graph.
Currently, there is no non-hacky way of specifying a specific section of the Target graph to be passed to a specific task.

This refactor, aims to extract out the scheduling of Tasks and their dependencies from register.py (where its been traditionally stored) and be able to pass through a Target graph if/when required. No matter what order goals are specified in the register.py, their order is overwritten by the schedule() method. 

This change is for building the phase graph recursively via requesting products in prepare(..) and advertising products via a Task classmethod product_type(). I will follow up with another change to be able to request products on a sub-graph (in order to actually fix the idea goal).

Beware, this change is not completely integrated. For now, I wanted to send out this isolated change to get everyone's thoughts on it and make sure we are all on the same page. Especially John, Patrick and Benjy - since you folks have been heavily involved in the recent refactor.
./build-support/bin/ci.sh
  • 0
  • 0
  • 20
  • 1
  • 21
Description From Last Updated
JS
  1. Just a few main issues fwict:
    
    Tasks - many instances of these issues - I pointed out just a few in the review below:
    1.) for each Task that today calls self.context.products.require* and/or accesses self.context.products in execute to retrieve product mappings, an override of prepare is required to schedule each one of the product types required/used.
    2.) for each product mapping written in execute, and only for those product types and no others, Tasks should report product_types.
    
    RoundEngine:
    3.) The RoundManager should do context.products.require* on the Tasks' behalf so the Tasks can ask for products in 1 place.
    
    Otherwise this lgtm.  Product promises can come later - I'd enjoy sending you that review later after round 2 - subgraphs goes in.
     
    1. Thanks for the thorough feedback. I have addressed most of these and have the final wiring underway. Please have another look when you can.
  2. AntlrGen does grammar -> .java so the product type is more like 'sources'.  That said - sources are special currently as products and we map them by injecting synthetic targets owning them into the build graph instead of using the products registry.  I'd stick to only advertising product_type for Tasks that register items in the context.products in the execute call flow today.  Making this a uniform way to communicate products is for follow up.
  3. This is too much detail - no where else are these product types accessed in this class.  Up above self.register_jvm_tool(compiler, tools) is called and down below self.tool_classpath(compiler) is called indicating the actual interaction with the 'jvm_build_tools*' products happens in a base class.  This prepare should move there.
  4. This task does not access 'classes_by_source' in the context.products - kill.
  5. This call should not be needed.  So this RB adds RoundManager.schedule which stands to the side of context.products; however the idea is they become one - or 1 interface.  It would be good to combine this call and schedule for 1 way to require a product mapping be made available in execute.  It may make sense to change round_manager.schedule to hand a method-per-products.require* for ease of transition.  So here this line would go and in prepare:
      round_manager.require_data('exclusives_groups')
  6. This implies invalidation/caching is per Task and not per Task per product type.  That's how things are done today anyway, but this stands out as arbitrary.  No suggestions, just talking out loud.
  7. Its worth noting the new part of the contract - that this is a well-known name other Tasks can bind to to get product mapping - thus its a public api.  The doc as it stands is only focused on the jvm group case where javac and scalac produce classes both IIUC.
  8. 'jvm_build_tools' is required_data below so it can't be a product of this Task.
  9. and context.products.require* call implies a prepare is needed where the requirement moves to a schedule call on the round_manager for the product mapping.
  10. Not true - thats provided by check_exclusives.py
  11. 
      
JS
  1. Forgot - it would also be great to do a clean switch over to RoundEngine.  So make round_manager a required non-None arg to prepare now and fixup the RoundEngine or just the test to ensure constructors are called in the same order prepare is.  Today most tasks rely on __init__ being called in prepare order.  IE: I introduced prepare but did not convert Tasks to it, instead LinearEngine calls (__init__, prepare), (__init__, prepare), ...  Keep that semantic for the 1st change and the risk is low here since the key guaranty of orderings in the Task lifecycle will be maintained.  The only risky bit is getting all the product_type lists and prepare round_manager schedules faithfully reflecting the context.products uses. 
  2. 
      
ZU
  1. I some high level questions:
    
     - Why does it run 'gen' over the entire graph. Shouldn't we just need to run 'gen' over the graph that the target or targets depend on?  
    
     - When 'pants goal idea <target>' was working, it seemed to do more work than I thought it had to.  What is the purpose of the 'compile' step?  When you are authoring new functionality  or trying to bring up the IDE on a broken tree, the fail of compile seemed to preclude the ide config being generated correctly.  This is a minor annoyance when all of your java code is under /src/java as in the Pants repo, but not really recoverable when you have a tree with hundreds of java source roots (maven style layout). 
  2. 
      
ZU
  1. 
      
  2. Does every code generator need updating like this? I didn't see jaxb_gen.py (we recently added this) in the list.  
    1. See my comment on this: https://rbcommons.com/s/twitter/r/545/diff/1/?file=1191899#file1191899line31
      This product_type list is in error.
    2. Aha yes, as you can see I have not updated all the product_type() and prepare() for all Tasks - I wanted to send this out on a smaller scale to give an idea of how the refactor is intended to work and make sure we are all on the same page. If we are all happy with with this approach, will send an update to this with all the Tasks updated. Also, as John said AntlrGen shouldn't have that product_type() specified.
  3. 
      
BE
  1. First of all, I am super-excited for this change. This will be a significant improvement over manually registering phase deps. 
    
    I have several comments, with the caveat that I may be misunderstanding the design, and where it's heading. See below for details inline.
    
    Cheers!
    
    1. Thank you - me too! Stay tuned for the design doc.
  2. For the apparently common case where the scheduled products are hard-coded per type, we could have a classmethod that returns them, and the Task base class could use that as its default implementation of prepare(). It would save a little bit of boilerplate.
    
  3. src/python/pants/engine/round_engine.py (Diff revision 1)
     
     
    OrderedSet? Since you test for membership below.
  4. src/python/pants/engine/round_engine.py (Diff revision 1)
     
     
    This comment seems stale - there's no way to specify a graph, and there are no phases here, just products.
  5. src/python/pants/engine/round_engine.py (Diff revision 1)
     
     
    You don't need to construct an instance here. goal.task_type.product_type() will work, as it's a classmethod.
  6. src/python/pants/engine/round_engine.py (Diff revision 1)
     
     
    phases_by_product can be a defaultdict(set) or (defaultdict(OrderedSet) if order matters) and then you won't need this check.
  7. src/python/pants/engine/round_engine.py (Diff revision 1)
     
     
    It's a bit weird to modify a mutable argument, that would need to be well documented. 
    
    Or you could nest this method inside _prepare(), so it can access phases and phases_by_product from its closure. That would be more intuitive I think.
  8. src/python/pants/engine/round_engine.py (Diff revision 1)
     
     
    round_manager could be local to this method, right? 
  9. src/python/pants/engine/round_engine.py (Diff revision 1)
     
     
    Is phase the right resolution for rounds, or should it be goals? In other words, must we execute all goals in a phase when the product-based selection is happening at the goal level anyway?
    
    Say we allow each task to look at the target graph when scheduling products (and isn't that the plan?). Then, for example, junit_test could see that it has no targets to act on, it would not schedule any products, and then none of the jvm machinery would be invoked, which seems like a good thing.
    
    Or, am I missing some obvious reason why this is wrong?
    1. The phase-goal pairing and the ordering of goals is by design and allows us to do things today that we wouldnt be able to if we couldn't rely on it.
      Discussed this with John offline to confirm why we do this today:
      The Phase goal list grouping provides 2 benefits:
      1.) 1 verb across languages when this makes sense - the list of goals in a phase allows aliasing the 1 verb to many actions, some of which are active depending on targets in the context
      2.) Ordering of Tasks that implement a verb.  There are often cases where you want to order operations independent of data.  The classic example is integration test setup and teardown.  Today folks inject custom tasks in the test phase and use the ordered list as pect of the Phase to insert setup 1st (spin up mysql) and append tear down last (shut down mysql).  
      This basically goes into the discussion of aop on core goals - its by design and will require a re-think of the architecture to make even simple things like IT setup happen (because you couldn't rely on the ordering anymore). I will defer this discussion to another time so we could chat about the pros/cons and possible re-arch if any.
      
    2. I agree that you shouldn't change this now, but I do want to continue this discussion. 
      
      But to refer to your example: If a test needs a running mysql instance it should require the product 'mysql_instance', not rely on tasks being injected in some order. That way A) requirements are explicit and B) only tests that require mysql actually cause one to run. That's where ordering should come from, no?
      
      Today if any goal in a phase needs to run, then all goals in the phase run. And I envision a future where this isn't true. In fact, I envision a future where "phase" isn't really a thing (other than a lightweight way to share verbs). 
      
      Anyway, this isn't actionable for this change, just something to discuss, perhaps on the design doc. 
    3. The issue there is encapsulation.  We write the JunitRun task and ship it out - it depends on 'classes_by_target' product data naturally and only.
      
      Now I come along as a user and I want to do this (real custom extenstion in a bootstrap BUILD file at twitter - we'll be moving to register.py - not the point here):
      ...
      class EnsureTests(NailgunTask):
        def __init__(self, context, workdir):
          super(EnsureTests, self).__init__(context, workdir)
          self._jvm_tool_bootstrapper.register_jvm_tool(WEBCORE_KEY, [':twitter-webcoretools'])
      
        def execute(self, targets):
          to_test = set()
          for root in self.context.target_roots:
            for whitelisted in WHITELIST_PATHS:
              if os.path.commonprefix([whitelisted, root.address.buildfile.relpath]) == whitelisted:
                for build_file in BuildFile.scan_buildfiles(ROOT_DIR, whitelisted):
                  for address in Target.get_all_addresses(build_file):
                    target = Target.get(address)
                    if is_ensure_tests_target(target):
                      to_test.add(target)
                break
      
          self.context.log.debug('Testing the following targets for paired tests: %s' % to_test)
          if to_test:
            paths = set()
            for target in to_test:
              paths.add(target.target_base)
      
            self.runjava(classpath=self._jvm_tool_bootstrapper.get_jvm_tool_classpath(WEBCORE_KEY),
                         main='com.twitter.web.tool.EnsureTestsExist',
                         args=list(paths))
      
      goal(name='ensure-tests', action=EnsureTests).install('test', first=True)
      
      I don't produce anything naturally, I just check and warn or fail a build before tests run if there are not enough tests.
      This sort of thing does not fit naturally in a data-dependency driven flow.  We'd have to concoct a fake data output and even then we'd have to go edit JunitRun to fake product data depend on something like 'before_tests'.  That would then apply to every Task - it would need to depend on fake 'before_XXX' to allow for customization of a phase like this by end users.
      
      One way to view the list of Tasks in a Phase today is as the only way to compose functions.
      
      Instead of:
      class EnsureTests(NailgunTask, TaskInterceptor):
        def wrap_execute(delegate):
          self.check_tests()
          delegate.execute()
      Phase('test).wrap(EnsureTests)
      
      You do:
      goal(name='ensure-tests', action=EnsureTests).install('test', first=True)
      
      I'd love to get to the function composition / wrapping / interception method of enhancing a phase when data-dependencies are not in-play, but thats further down the refactor road.
      I think we need the phase graph as a graph with nodes being ordered lists of Tasks today as a bridge to a Phase graph with nodes as single Tasks - possibly transformed by wrappers.
      
  10. src/python/pants/engine/round_engine.py (Diff revision 1)
     
     
    Iterating over a collection that you're mutating is bug-prone. Maybe better to let update_phases return the new phases?
  11. src/python/pants/engine/round_engine.py (Diff revision 1)
     
     
    This will update for all products ever seen, not just for new products, as the signature of update_phases implies.
    
    There's no harm in the end because update_phases won't add the same phase twice, but it's nice to be precise.
  12. 
      
IT
PA
  1. 
      
  2. I'm a little confused by this being the product type for codegen, given that codegen generally just injects synthetic targets into the graph.  So its type is kind of "targets by source", though I don't know if that's a meaningful concept.  Could you elaborate in a comment why this is the type, given that codegen is a bit tricky already?
  3. I really like that we're starting to separate out the preparation phases of goals into meaningful pieces.
  4. src/python/pants/engine/round_engine.py (Diff revision 2)
     
     
    I might be in the minority here, but I'm not a big fan of nested classes that aren't trivial (like exception aliases).  I feel like it makes the methods difficult to distinguish from the outer class's methods, and doesn't particularly buy us much versus just defining them at the top level in the module.
    
    Very minor style nit though.
    1. I usually make a class an inner class if its not being used without the companion class its in - so to me, it completely makes sense to have this as its own citizen.
  5. src/python/pants/engine/round_engine.py (Diff revision 2)
     
     
    This return can't be hit
  6. src/python/pants/engine/round_engine.py (Diff revision 2)
     
     
    .format please
  7. 
      
BE
  1. 
      
  2. src/python/pants/engine/round_engine.py (Diff revision 2)
     
     
    We rely on modifying a mutable argument. That's not awful but it's a bit unintuitive so document it well!
    
    Also, the round_manager knows the context, so there's no need to pass both around everywhere. 
    
    But even better, these classmethods could be instance methods and get the context and the round manager from the instance. 
    
    In fact, could they be methods *on* roundmanager?
    
    One way or another, chains of static methods that call each other with non-trivial argument lists are a code smell, implying that they probably should be instance methods.
    1. That arg should not be mutated anymore.
      
      I have moved the RoundManger out, on Patrick's suggestion and the methods with it. Thanks for the suggestion.
      Although, I didnt want RoundEngine to have any state, hence the passing of the round_manager after being created. lmk if you dont agree.
  3. src/python/pants/engine/round_engine.py (Diff revision 2)
     
     
    This return value isn't used. 
    1. it should have been, since I removed the mutating of the argument.
  4. src/python/pants/engine/round_engine.py (Diff revision 2)
     
     
    Probably good to set outer_lock_holder to None here, so we don't release lock again in the finally block.
  5. 
      
JS
  1. 
      
  2. src/python/pants/engine/round_engine.py (Diff revision 2)
     
     
    check 1st, construct the round_manager after
  3. Now that the actions are all in a specified order, 1 list should be used to collect them.
    
    The result should be:
    construct_action('bootstrap-jvm-tools')
    prepare_action('bootstrap-jvm-tools')
    construct_action('thrift')
    prepare_action('thrift')
    ...
    
    then the execute actions in expected order.
  4. 
      
IT
JS
  1. I only see 1 big issue - JvmBinaryTask _should_ be blowing up.  Fixup that and triple check the prepare request vs execute uses - I think I found all the misses remaining, but... and this should be ready to go.
  2. src/python/pants/engine/round_engine.py (Diff revisions 2 - 3)
     
     
    I missed this before, I think context.acquire_lock() and the other 2 lines below at the same indent level all need a +1 - they have only a +1 indent now.
  3. And this is safe because all product.require* are moved out of constructors now - that's great!
    
    
  4. 'missing_intransitive_deps' is asked for below as well.
    1. this is being populated in execute although there was never a require binding on it - in fact, I dont see it being used anywhere at all, hmmm.
  5. Its slighty odd to have prepare ordered before init, generally constructor then instance methods
  6. self.product_type() was used on the LHS and that defaulted to self.__class__.__name__ - so a non-empty string unique to the Task subclass.  I think you can just inline self.__class__.__name__ for suffix_type and then all tasks get an unambiguous single invalidator dir.  Right now the '' is degenerate and 2 tasks could also become degenerate if they happen to produce the same product_type.
  7. Kill - this task doesn't actually populate a 'benchmark-run' product fwict.
  8. This has product_types of ['jars', 'javadoc_jars', 'source_jars']
  9. javadoc_jars too
  10. It would be good to keep the same grouping as in the constructor - the comment goes with the class_by_*
  11. This will need the utf-8 header line of a unit test should fail that checks for this.
  12. I think you need a 1-1 mapping here from RoundManager call to products call.
    schedule -> products.require
    require_data -> products.require_data
    
    That said - its probably a good time to align the names and s/schedule/require/
    
    Also, you need to reflect the underlying signatures too; for example: JvmBinaryTask passes the optional predicate: https://github.com/pantsbuild/pants/blob/master/src/python/pants/backend/jvm/tasks/jvm_binary_task.py#L36 
  13. _get_phases_by_product - this looks like its private
  14. pydoc - the ordering is key here and it would be good to document what it is.
  15. 
      
IT
JS
  1. 
      
  2. This is missing a call to super and JarTask has some prep to do too.  This problem is semi-widespread fwict.
  3. 
      
IT
JS
  1. This is looking plausible to me.  I'll `rbt patch` your next diff and work it hard and I think it might just be gold.
  2. You don't actually provide this or have people hack depending on it yet, so I'd kill
  3. 
      
IT
ZU
  1. I was just looking over the spreadsheet  https://docs.google.com/spreadsheets/d/1R7-SqcKrQt5407pJL5KKgV_Ng2ZWB12ocJWuoCE6Wy4/edit#gid=0. 
    
    I just wanted you to be aware that we are refactoring pants goal idea and at this point have added a consumer of 'ivy_jar_products' in ide_gen.py 
    
    https://github.com/pantsbuild/pants/pull/283
  2. 
      
IT
JS
  1. 
      
  2. Kill these prints
  3. I think the return code is all you need.  It's which's job to check the exists & is executable part.
  4. 
      
JS
  1. This is still a problem though:
    $ PANTS_DEV=1 ./pants goal clean-all killserver server
    *** Running pants in dev mode from /home/jsirois/dev/3rdparty/pantsbuild-pants/src/python/pants/bin/pants_exe.py ***
    (using pantsrc expansion: pants goal clean-all killserver server --server-open)
    Launching server with pid 30208 at http://localhost:43116
    Killed server with pid 30208 at http://localhost:43116
    Created new window in existing browser session.
    
    It would be good to fix or circle back real fast and fix + add unit for unconstrained CLI phases maintaining CLI order.
  2. 
      
IT
Review request changed

Status: Closed (submitted)

Loading...