Add baseline functionality for engine experiments.

Review Request #2914 — Created Oct. 1, 2015 and submitted

2283, 2308
benjyw, ity, kwlzn, patricklaw, stuhood
This introduces a new Addressable/Target parsing system based on named
Serializable objects and Addressed pointers.  A new Graph
implementation is built on these and one example graph is included in
three serialized forms.

A planner can now be built that consumes example graphs and outputs
execution plan graphs.

 src/python/pants/base/                                                   |   2 +-
 src/python/pants/engine/exp/BUILD                                                  |  61 +++++++++++
 src/python/pants/engine/exp/                                            |   0
 src/python/pants/engine/exp/                                         |  61 +++++++++++
 src/python/pants/engine/exp/                                               | 112 ++++++++++++++++++++
 src/python/pants/engine/exp/                                              | 155 ++++++++++++++++++++++++++++
 src/python/pants/engine/exp/                                             | 248 ++++++++++++++++++++++++++++++++++++++++++++
 src/python/pants/engine/exp/                                        |  36 +++++++
 src/python/pants/engine/exp/                                             | 126 +++++++++++++++++++++++
 tests/python/pants_test/engine/exp/BUILD                                           |  21 ++++
 tests/python/pants_test/engine/exp/                                     |   0
 tests/python/pants_test/engine/exp/examples/graph_test/self_contained.BUILD        |  51 +++++++++
 tests/python/pants_test/engine/exp/examples/graph_test/self_contained.BUILD.json   |  59 +++++++++++
 tests/python/pants_test/engine/exp/examples/graph_test/self_contained.BUILD.python |  46 +++++++++
 tests/python/pants_test/engine/exp/                                   |  68 ++++++++++++
 tests/python/pants_test/engine/exp/                                 | 280 ++++++++++++++++++++++++++++++++++++++++++++++++++
 16 files changed, 1325 insertions(+), 1 deletion(-)
CI went green here:
  1. NB: This is just a basis for writing example input graphs and Planners (new concept) that can process the graphs to produce Task plans that are combined into a schedule by a Scheduler (new concept - like an Engine front-end).
    Its more polished than it needs to be - pydoc, extra parsers - but these things may grow so I spent a bit of extra time - just don't be deceived!  The idea is to move fast and light through some experimental code from here.
  1. Submitted TBR @
Review request changed

Status: Closed (submitted)

  1. Looks great, for what it is. I still fear that this is too complicated. For a large graph this will involve creating a lot of wrapper objects.

    How much simpler can we make this if we get rid of the notion of types entirely? I.e., the build graph consists of vertices all of a single type (Target) and edges of a single type (Address), and we handle Config specially in some way. I know I'm not seeing a lot of detail here, but I would really really love to see the build graph implementation be dumb and simple.

    I just think that making build graphs really quick to parse and trivial to reason about is a laudable goal.

    1. Well - there are ~no types.  Config is a dict.  The whole graph runs on that.  You can extend to add an official dependencies field whose strings should be interpreted as edges (as opposed to ther string fields that should not be!.
      I tried to and thought  delivered _exactly_ what you describe so I'm slightly confused.
    2. Well, I'm probably misreading the code, but I guess I'm wondering about the need for Addressable, Addressed and Serializable.

      The straw man alternative in my mind is that the in-memory build file representation is literally just a vanilla JSON object, and we have a global map of addresses to these objects. Clearly I'm missing something, but I'm not sure what.

    3. Which fields define edges?  Do we say there is a magic set of fields ('dependencies' + 'configurations') where we interpret all contained string values as addresses?
      One thing to keep in mind is that the edge wrappers - Addressed instances - are transient.  The math is as follows - Taking the current Target implementation as the goo Target class, all using the json implementation for this example:
      1. On parse of a top-level target json object, a target wrapper is constructed containing 1 dict, N [Addressables: 1 string (spec) + 1 TypeConstaint: 1 type field] - so the N there is you're memory blow-up, not sure the bytes
      2. On resolve, all Addressed in the resolved's closure are turned into Targets with 0 Addressed overhead, the addresses are all resolved.
      Now its true that step 2 currently does not release references to the initial parsed form of the Targets with Addressed edge containers, but that could be arranged.
      Since the main point of this exercise for me in the ~now are the planners and the scheduler, I'd like to turn over perf testing to you or others if you actually want to drill in on the viability of this parse setup as one we actually use outside this experiment going forward.
    4. I guess the punchline here is the 2-step parse.  In step 1 - when the BUILD file is parsed in whatever format, Heavier objects with no resolved edges are constructed - Addressed instances standing in for each string edge.  In step 2, which happens in Graph.resolve, the Addressed are thrown away leaving a pure graph of 1-field wrapper objects, a Config (subclass - Target), with its single _kwargs dict field.
    5. Yeah, I'm just asking questions, don't make any changes. I will look at perf at some point, or maybe cobble together what I think is a viable alternative. That exercise should make your thinking clearer to me, and my thinking clearer to you.

    6. Here's a change that lifts the Addressed metadata from individual values up to a class-level schema built from @property-like decorators.  It allows for inlined or shallow resolvable objects in both `Graph.resolve` and `parsers.encode_json`:
  2. src/python/pants/engine/exp/BUILD (Diff revision 1)

    Do we really need all these micro-targets, one per source file? I'd like us to get away from this as much as possible.

    1. I really really like them right now using this 1 flat experimental dir - its serves as a quick check on sanity.  If you looks at the BUILD file on HEAD for example, a re-assuring thing can be seen:
      The Graph has no dependency on Config (now Configuration) or Target.  It only depends on things with _asdict + a 'name' field in that dict.
  3. src/python/pants/engine/exp/ (Diff revision 1)


    At Foursquare we have a #bikeshed Slack channel for such discussions.

  4. ooks

  1. apologies, for taking so long to get to this -- thanks for sending this through.

    looks good to me for what it is!