[engine] Implement BUILD file parsing inside the engine

Review Request #3377 — Created Jan. 26, 2016 and submitted

2840, 2865
benjyw, ity, jsirois, nhoward_tw, patricklaw, peiyu

Rather than having separate build and product graphs, there is now only one graph. All invalidation of changed values can happen directly in the ProductGraph via invalidation of Nodes that represent filesystem operations.

Additionally, because all file parsing (BUILD or otherwise) happens via the same mechanism, this review also includes a very simple example of dependency inference for scala code (based on looking up imported packages in configured source roots). A series of tasks look for inferred_scala Sources, and parse/infer/request their deps to convert them to scala Sources.

  • Replace the Graph class with tasks that compute Structs by parsing Directories
    • The resolve_unhydrated_struct function produces an UnhydratedStruct object, which declares its additional inline Addressable dependencies. The hydrate_struct function then takes the UnhydratedStruct and its dependencies to hydrate it into a complete value that can be validated.
    • Supports all mergeability/inlineability features of the current implementation (inline Jars, etc), but is much lazier than the current graph in that dependencies of embedded Structs are not walked unless requested by a task.
  • Implement cycle detection in ProductGraph to replace the implementation in Graph
  • Formalize Parsers and SymbolTables as top-level classes, which avoids having to pickle them while multi-processing. Instead, they are passed directly as class references.
  • Remove caching from AddressMapper, since the product graph nodes are normalized to act as their own in-process cache.
  • Add SelectProjection, which allows a Task to request a Product for one field of an input Subject.
    • This allows for de-duplicating within the graph. For example, the hydrate_struct task requests an AddressFamily for a Directory projected from the spec_path field of an Address, which causes AddressFamilies to be cached keyed by their Directory, rather than by an Address.
  • Killed NativeNode in favor of immediately satisfying a SelectNode with the subject value when possible.
  • Add an example of dependency inference for scala code via the extract_scala_imports/reify_scala_sources family of tasks.
    • See the attached dependency-inference.svg rendering for an example.


Loading file attachments...

  • 0
  • 0
  • 7
  • 4
  • 11
Description From Last Updated
  2. What about None? Isn't there 5?

  3. this implies that values can contain '='. Maybe we should prevent it?

  4. This error case needs to be updated as address is no longer an in-scope variable.

  5. I like that this change is pushing the variant extraction into the same process that parses out the variant component of the target name.

  6. partition implies that target names can contain further '@' symbols in their variants. Do we validate that separately?

    1. To me, partition is almost always a safer and faster option. Even if we were going to validate that the variants did not contain @ signs, we'd probably still want to use partition in order to avoid creating an intermediate list.

  7. src/python/pants/engine/exp/graph.py (Diff revision 2)

    I feel like this should note what the address and dependencies are used for too.

  8. src/python/pants/engine/exp/graph.py (Diff revision 2)

    Do you think you could extract the mutual recursion used here and in hydrate into helper methods? It's rather duplicated and a little hard to follow.

    1. I couldn't think of a way, although I tried. Will leave a TODO.

  9. src/python/pants/engine/exp/graph.py (Diff revision 2)

    Is it guaranteed that dependencies here represents the same list of dependencies as unhydrated_struct.dependencies, in the same order, but satisfied?

    If not, I'm not sure that the assumptions that maybe_consume makes will always hold.

    1. It does. Will comment.

  10. src/python/pants/engine/exp/graph.py (Diff revision 2)

    I'm not familiar with the idiom of putting fields on functions in python. Could you point me to some docs on common usages. I did some googling, but I find it a little confusing.

    1. So did I. But this function complained when I didn't capture it's argument somewhere:

       UnboundLocalError: local variable 'idx' referenced before assignment

      Will comment.

  11. src/python/pants/engine/exp/graph.py (Diff revision 2)

    If args is an empty dict, should it still be added to, or is it not intended to be used as a collector? If so, I think this needs a None check.

    1. Not supposed to be a collector. But to fix the accidental collector usecase, I'd have to copy the input, which doesn't seem worth it.

  12. src/python/pants/engine/exp/mapper.py (Diff revision 2)

    Update docstring please.

  13. src/python/pants/engine/exp/mapper.py (Diff revision 2)

    maybe update the repr for objects_by_name= to len(objects_by_name)= to make it more clear?

    1. Had it return the address list instead.

  14. src/python/pants/engine/exp/mapper.py (Diff revision 2)

    Update docstring

  15. src/python/pants/engine/exp/parsers.py (Diff revision 2)

    nit: symbol_table's no longer optional.

  16. Does adding a particular variant here mean that elements below that do things with the variants behavior should change?

    I guess it's not clear to me what should happen if both variants and variant have values.

  17. I think if variant is not None, this could be reduced to ...self.product, [self.variant]). Would that reduce the number of task nodes inspected? Since if variant isn't None, select_literal will only pick candidates that have a matching configuration.

    1. IIRC, we're not differentiating between "Native" and non-native sources anymore, so a variant value will be checked regardless of whether a task produces a product, or it came literally from a subject.

  18. seems like this could be memoized.

  19. Does this also mean that if the dep_product requests variants that are not the subjects, that those are added to the ones requested from dependencies?

    Do we need to request all the variants supported by the dep_product even the ones the subject doesn't require? Or could we just request the ones the subject does require.

    1. In this change (or the previous one?), variants moved off of targets, and onto StructWithDeps (because having default variants for something without deps wouldn't make sense). So the dep_product is the appropriate source of these.

  20. How about a !r?

  21. nit: update docstring as it doesn't simplify anymore.

  22. Should this re-use the validate_node code?

  23. the result of this sum doesn't appear to be used.

    1. This was to trigger CycleError in walk, but it was removed in diff 3 by eagerly checking the graph for cycles as dependencies are added.

  1. first pass, mostly questions

  2. is variants concatenated by ':'? don't see in the doc or examples, that's my guess

  3. mind document this now it's a public method?

  4. what's a non-literal variant example?

  5. is this the place your next review going to expand?

    1. never mind, saw the demo code

  6. s/task list/tasks/, implies ordering, but order doesn't really matter from what i can tell.

  1. understand at the high level nowf

  2. nice!

  3. src/python/pants/engine/exp/graph.py (Diff revision 7)


  4. remove trailing ws or fix doc

Review request changed

Status: Closed (submitted)

Change Summary:

Merged --tbr as 02aad68bd3ee98e864bb22c1fba0b6493bad6b61