[engine] Introduce ProductGraph invalidation.

Review Request #3578 — Created March 17, 2016 and submitted

2970, 3046
peiyu, stuhood
  • Initial pass at ProductGraph invalidation using a predicate-based matching approach as the assumed coupling.
  • Add a dependents= flag to ProductGraph.walk for reusability in backwards (dependents_of()) graph walks for invalidation.
  • Add basic re-entrant locking of ProductGraph instance method calls using @wrapt.synchronized.
  • Eliminate ProductGraph.clear() in favor of ProductGraph.invalidate().
  • Test coverage.


  • 0
  • 0
  • 1
  • 0
  • 1
Description From Last Updated
  2. This synchronization is pretty fine grained... in particular, I'm really curious what the effects of invalidating in the middle of a build would be, and so I'd really kind of prefer that we go much coarser to start.

    What about having a Scheduler-level lock and acquiring it for def invalidate or for def schedule? That would make the ProductGraph concurrency-oblivious, and the scheduler concurrency aware, which feels more manageable to me.

    1. Better yet, should the concurrency be managed by the Engine? Since it is the caller of schedule, it seems like it could also proxy through the call to invalidate and apply locking.

    2. I don't see a great way of holding this lock in the Engine as it stands right now since the Scheduler is initialized outside of it - but added a Scheduler-level lock that allows providing a lock as an input (e.g. passed in from the Engine) for the future. I also added a def invalidate directly on the scheduler that proxies the call to self.product_graph.invalidate() under this lock - and also added locking for the 3 other ways that the scheduler accesses the product graph which I think is probably the best coarse grained approach currently. I'm open to further ideas here tho - but assume this will evolve as we continue refining the engine->scheduler->graph stuff.

      this does not protect against direct access/references to the scheduler.product_graph instance tho - which was essentially part of the thinking behind the finer grained locking in this first changeset.

  3. src/python/pants/engine/exp/scheduler.py (Diff revision 1)

    Doesn't seem like either the enumerate or reversed is necessary here... could return len(invalidated_nodes), right?

  4. src/python/pants/engine/exp/scheduler.py (Diff revision 1)

    Rather than doing this check inside the loop, how about selecting the list to use at the beginning of the method?

    adjacencies = self._dependents if dependents else self._dependents
    .. = _filtered_entries(adjacencies[node])
  5. So, might it make sense for the purposes of testing to have ProductGraph take a "validation" function as a constructor parameter? I think that that would allow ProductGraph to be decoupled-from/moved-out-of scheduler.py (at some point), and would make it possible to write tests with string keys, for example:

    pg = ProductGraph(validator=lambda _: pass)
    pg.update_state('A', Waiting(['B', 'C']))
    self.assertEquals(set('B', 'C'), pg.dependencies_of('A'))
    1. I apologize for not having added unit tests for ProductGraph, but this looks like a good case for the first one =/

    2. great idea - did exactly that and it feels much cleaner.

  1. Awesome, thanks!

  2. so we drop the edges only for roots and keeping for the rest nodes? might worth adding some comment

    1. the same removal happens implicitly for all the other invalidated nodes due to the walk + delete - this just severs the last backwards inbound connection to the invalidated root because of the fact we track dependents.

      e.g. for a graph like A->B->C, if we invalidate 'B' then both 'A' and 'B' get outright deleted (which kills their dependents edges altogether) - but ProductGraph._dependents still has an edge reference from 'C'->'B' (where 'C' is not deleted) that this particular bit removes. it only applies to the outer/root nodes.

      added an extra note in the comment.

  3. update comment since now we support walking both ways

    1. generalized this to dep_node_entries.

  4. document the new param

    1. good call, fixed.

  5. Very nice!

Review request changed

Status: Closed (submitted)

Change Summary:

thanks Stu & Peiyu! submitted @ c406bb70ccbceb0d4e24d90db5f5dbd21589a6d1