[engine] Convert all isinstance product checks to using Exactly type constraints

Review Request #4236 — Created Sept. 13, 2016 and submitted

nhoward_tw
pants
3868
4233
pants-reviews
kwlzn, stuhood, yujiec

The existing type checking required coercion rules in order to ensure that product types that are produced by rules that declare they produce their super class would work.

This patch removes the sub/superclassing in favor of using the Exactly type constraint for type checking.

To do so, it adds a constraint method on SymbolTables that returns a type constraint defining what types can come out of using the symbol table. That constraint is then used in rule declarations that either produce or consume parsed products.

Other elements
- TaskNode now contains the rule it represents. That way it can use the rule components without recreating them and its repr now contains the rule's repr.
- collect_item_of_type is inlined back into SelectNode#select_literal. Removing the coercion rule class made extracting this unnecessary
- settled on input_selectors as the name for the collection of selectors used by a rule
- removed super / sub type checking in rule validation as it is no longer necessary
- renamed and moved intrinsic node factories to the rule module

I've been testing with the engine tests locally. Additionally I've been profiling running ./pants list :: with the engine enabled. It's slightly slower right now, but the removal of the coercion rules reduced the number of nodes in the graph by 5%. With further optimization of scheduling this enables, I think we can get to a better place.

(master)$ for i in 1 2 3; do ./pants --enable-v2-engine list :: -ldebug 2>&1 | grep 'scheduling iterations'; done
DEBUG] ran 152 scheduling iterations, 21135 runnables, and 62902 steps in 6.560897 seconds. there are 39158 total nodes.
DEBUG] ran 2 scheduling iterations, 2 runnables, and 23 steps in 0.001659 seconds. there are 39173 total nodes.
DEBUG] ran 0 scheduling iterations, 0 runnables, and 0 steps in 0.000002 seconds. there are 39173 total nodes.
DEBUG] ran 152 scheduling iterations, 21135 runnables, and 62902 steps in 6.554824 seconds. there are 39158 total nodes.
DEBUG] ran 2 scheduling iterations, 2 runnables, and 23 steps in 0.001486 seconds. there are 39173 total nodes.
DEBUG] ran 0 scheduling iterations, 0 runnables, and 0 steps in 0.000001 seconds. there are 39173 total nodes.
DEBUG] ran 152 scheduling iterations, 21135 runnables, and 62902 steps in 6.748163 seconds. there are 39158 total nodes.
DEBUG] ran 2 scheduling iterations, 2 runnables, and 23 steps in 0.001648 seconds. there are 39173 total nodes.
DEBUG] ran 0 scheduling iterations, 0 runnables, and 0 steps in 0.000002 seconds. there are 39173 total nodes.
$ git co nhoward/typechecks_for_everyone 
Switched to branch 'nhoward/typechecks_for_everyone'
Your branch is up-to-date with 'origin/nhoward/typechecks_for_everyone'.
(nhoward/typechecks_for_everyone)$ for i in 1 2 3; do ./pants --enable-v2-engine list :: -ldebug 2>&1 | grep 'scheduling iterations'; done
DEBUG] ran 151 scheduling iterations, 19791 runnables, and 60214 steps in 7.202758 seconds. there are 37814 total nodes.
DEBUG] ran 2 scheduling iterations, 2 runnables, and 23 steps in 0.001512 seconds. there are 37829 total nodes.
DEBUG] ran 0 scheduling iterations, 0 runnables, and 0 steps in 0.000002 seconds. there are 37829 total nodes.
DEBUG] ran 151 scheduling iterations, 19791 runnables, and 60214 steps in 7.058010 seconds. there are 37814 total nodes.
DEBUG] ran 2 scheduling iterations, 2 runnables, and 23 steps in 0.001441 seconds. there are 37829 total nodes.
DEBUG] ran 0 scheduling iterations, 0 runnables, and 0 steps in 0.000002 seconds. there are 37829 total nodes.
DEBUG] ran 151 scheduling iterations, 19791 runnables, and 60214 steps in 7.324950 seconds. there are 37814 total nodes.
DEBUG] ran 2 scheduling iterations, 2 runnables, and 23 steps in 0.001683 seconds. there are 37829 total nodes.
DEBUG] ran 0 scheduling iterations, 0 runnables, and 0 steps in 0.000004 seconds. there are 37829 total nodes.
  • 1
  • 0
  • 1
  • 0
  • 2
Description From Last Updated
This use of Exactly feels awkward... sorry, know you asked for my comments yesterday. I'm not usually one to use ... ST stuhood
ST
  1. 
      
  2. src/python/pants/engine/legacy/graph.py (Diff revision 1)
     
     

    This is insanely awesome.

  3. src/python/pants/engine/nodes.py (Diff revision 1)
     
     

    Update to reflect usage of the rule instead.

  4. src/python/pants/engine/rules.py (Diff revision 1)
     
     
     

    No longer really a marker.

    Should probably add AbstractClass here to actually have the @abstractproperty decorators enforced.

  5. src/python/pants/engine/rules.py (Diff revision 1)
     
     

    Why would this be the case? Should this be a TODO instead?

    1. The index contains both the type constraint and by each of the types in the type constraint. It does that so tasks requesting a type that's in a produced union can just do a single lookup. The consequence though, is that serialized_tasks contains the same Rule in multiple buckets.

      If I passed something else to the validator instead of the index, I think I could do away with it.

  6. src/python/pants/engine/rules.py (Diff revision 1)
     
     
     
     

    This use of Exactly feels awkward... sorry, know you asked for my comments yesterday.

    I'm not usually one to use the word "pythonic", but I think that tuple-of-types to represent a union might be more pythonic. In particular: isinstance and issubclass both accept a tuple of types to represent unions, type(..) accepts a tuple of supertypes, there is symmetry between typ in (type1, type2) and typ is type1, and etc.


    This issue isn't a blocker: just want a little bit more consideration here before you commit.

    1. Hm. I think this flattening in particular could be simplified since I'm now assuming all type constraints must be Exactly.

      But to your larger point, not using tuples not being pythonic is intentional because I don't want these bags of types to be used in isinstance or issubclass calls. Keeping that more awkward is beneficial to the goal of not leaning on python's type system. And, if we do end up introducing subclass relations again, I want to be sure they are strongly called out as such, which the TypeConstraint classes help with.

    2. And, if we do end up introducing subclass relations again

      I'm pretty certain that once they're out we will not want them back. All that I think we want now (and for the forseeable future) is union... and I think that we would want to give exponentially more consideration to allowing any other type of typeconstraint instance here, so use of Exactly feels like an odd duck.

  7. src/python/pants/engine/selectors.py (Diff revision 1)
     
     

    TODO: finish this tho

    1. Knew there had to be at least one of these in there. Good catch.

  8. src/python/pants/engine/selectors.py (Diff revision 1)
     
     
     

    You've used this pattern of memoizing constructable data a few places here, and I don't really love it... would rather see it computed as an @property of the selector unless there is some evidence that it is expensive to compute.

    1. Hm. Well, I want this to be in the initializer of the class, but there's no initializer. I'll try the @property route and see where that goes.

  9. 
      
NH
ST
  1. I continue to think the dropping Exactly would be a good idea, but won't block this landing. Can revisit later if need be.

  2. src/python/pants/engine/selectors.py (Diff revisions 1 - 2)
     
     

    Recommend droppping @memoized, as it is likely to be more expensive than the object creation here.

    And as a bit of FUD, I think that this will likely have implications on the pickled identity of the object.

  3. 
      
NH
NH
Review request changed

Status: Closed (submitted)

Change Summary:

Submitted as https://github.com/pantsbuild/pants/commit/c96d44b85e41fdb14c6495021ef06b4dd26256a1
Loading...