Make LegacyAddressMapper v2 engine backed

Review Request #4239 — Created Sept. 15, 2016 and submitted

yujiec
pants
3871, 3875
pants-reviews
jsirois, kwlzn, nhoward_tw, stuhood

When testing v2 engine in OSS pants master, I found that LegacyAddressMapper is missing a scan_build_files method which is used in v1 BuildFileAddressMapper, thus causing some test failures. In addition, LegacyAddressMapper internally relies on LegacyBuildGraph to do things. This dependency is unnecessary.

This change adds the missing method to LegacyAddressMapper, and implements all methods of LAM using v2 engine directly.

A summary of this review:
1. Implement "scan_build_files" on LegacyAddressMapper. As part of the effort, add a new selection_request method in LocalScheduler class.
2. Modify all other methods of LegacyAddressMapper to use v2 engine directly, and remove the dependency on LegacyBuildGraph.
3. Remove "resolve_spec" method in AddressMapper base class, replace with a new method called "check_valid_spec", which will check if input spec is valid or not. This suffices the only use case of resolve_spec in master without returning an unused TargetAddressable.
4. Add a new "resolve_address" method in BuildGraph, which will replace the usage of "address_mapper.resolve" in master. This is done to unify the behavior between v1 and v2 engine, since in v2 engine, only LegacyBuildGraph has instantiated target objects (ProductGraph only has TargetAdaptor).
5. Modify test cases for LegacyAddressMapper
6. Add some missing dependencies in several BUILD files.
7. Fix a bug in go_buildgen()

https://travis-ci.org/pantsbuild/pants/builds/163454699

  • 0
  • 0
  • 3
  • 0
  • 3
Description From Last Updated
  1. Looks great! Just a few things to clear up.

  2. If this is intended to validate CLI specs (which I think it is?) that should be mentioned here.

    But also, this should mention that it validates that it is an existing spec, not just something that is parseable.

    1. As said below, I don't want to limit the usage of this method to only cli args, but I will mention that it validates if a given SingleAddres spec exits or not.

  3. src/python/pants/build_graph/address_mapper.py (Diff revision 2)
     
     
     
     

    Since I think that this is only intended for CLI usage, it should likely use https://github.com/pantsbuild/pants/blob/master/src/python/pants/base/cmd_line_spec_parser.py#L55 to parse the spec. But better yet, I think that the API of the method could be clarified to indicate that it must only be called with a potentially-ambiguous SingleAddress spec object, and then validates that it does not exist.

    All CLI args should already have passed through cmd_line_spec_parser at some point.

    1. I think I will do the parsing (through cmd_line_spec_parser) in goalrunner, as this method can be used generally for any SingleAddress spec and should not be tied to cli args only. I will add a check for SingleAddress instance in this method and also specify it in docstring.

  4. This is dangerous: ValueError can be thrown for a lot of different reasons, so this is very likely to hide other errors.

  5. Nice!

    Give that the engine prefers to execute things as batches, it would probably be good for scan_build_files to take a list of directories instead (and thus for the new *_execution_request method you added to take a list of subjects as well).

    1. scan_build_files is a common method in both v1 and v2 engine, and its use cases in pants are all taking a single dir as parameter (it scans BUILD files recursivelt though), it may not be a good idea to modify all those cases to pass a single element list. What do you think? I agree this new *_execution_request method should take a list of subjects (actually a list of (subject, selector) tuples I guess) and I will make the change.

    2. Yea, that's fine.

  6. Hm... allowing a user to specify a Selector is probably a better API than we currently have for execution_request. Rather than calling this 'custom_', it would be nice to find a real name for it, and start to think about how we can deprecate the existing def execution_request API.

    Replacing the old API should go in a separate review though.

    1. Hm. Yujie, is there any particular reason why you went for SelectDependencies(BuildFiles, BuildDirs) instead of relying on the existing SelectDependencies(product, Addresses) functionality? I haven't traced through myself, but would that work?

      I think the idea of using selectors as a calling api is interesting. But, with my static analysis hat on, it could make it harder to validate reachability of rules.

    2. I think SelectDependencies(product, Addresses) will do Select(Addresses) first, and then select product from each dependency of Addresses. The thing is, BuildFiles are intermediate products of select(Addresses), if I use Addresses as subject, I can't get BuildFiles as products.

    3. I think the idea of using selectors as a calling api is interesting. But, with my static analysis hat on, it could make it harder to validate reachability of rules.

      @nick: One way I'd like to see this evolve is to lazily validate once a Selector has been provided, and to cache validations by (type(subject),Selector). That would remove the requirement to define bounded root types as well.

    4. The trade off there is that we can no longer fully validate the rule set statically because we don't know the root subject types. In that case, there's no way to be sure that any given rule is fulfillable/unfulfillable because it's possible that the missing dependency is provided by a subject that just isn't requested. The most we can say is that for the particular subject type and selector for this request, the rule is not used.

    5. True. But note that it is not actually necessary to check an input ExecutionRequest root type against the configured/known root types... it's possible to use them strictly for the pre-execution validation, and then allow any particular input root (with the understanding that it may noop).

  7. Equivalent:

    for state in result.root_products.values():
      ..
    
  8. 
      
  1. lgtm! handful of comments/questions.

  2. seems like ideally it might be better for scan_build_files() to continue to return BuildFile objects vs relpath strings and let its consumers decide how to utilize them?

    or is this intentional to support v2 compatibility where we don't have concrete BuildFile instances?

    1. Yeah, it is intentional not to use BuildFile in v2 engine.

  3. maybe rename this to is_valid_spec or spec_exists?

    1. according to this comment and Stu's comment, I think is_valid_single_address should be a good name.

  4. src/python/pants/build_graph/address_mapper.py (Diff revision 2)
     
     
     
     
     

    it's not super clear where ValueError would be coming from here? maybe a comment indicating that would be good?

    also, assuming you need that you should be able to consolidate these together using a tuple of exception types:

    try:
      ...
    except (AddressLookupError, ValueError):
      return False
    
    1. ValueError was originally from Address.parse(), but after Stu's comment above, I think I will remove Address.parse() in this method, thus I will remove this ValueError as well.

  5. nit: s/buildFiles/build_files/ + indent level on continuation

  6. src/python/pants/engine/legacy/address_mapper.py (Diff revision 2)
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     

    would be great at some point - either before we both merge or after - to port this to the new Engine.product_request() method out in the changed review, which already takes care of most of this logic.

    1. Yeah, I was thinking the same thing when I read your review:).

      done.

  7. src/python/pants/engine/scheduler.py (Diff revision 2)
     
     
     

    could you find a more descriptive method name for this? maybe selection_execution_request etc?

    also, this probably deserves a docstring.

    1. I picked the name "selection_request". Does that sound OK to you?

    2. works for me!

  8. 
      
  1. 
      
  2. I think this should be an OrderedSet, since the abstract method docstring says it is one.

    Also, if it's unordered, that may introduce some problems if consumers assume ordering is consistent.

    1. I wonder if we can replace OrderedSet with set in both address mappers? Since the output from v2 engine is likely to be in different order from the output of BuildFileAddressMapper and I don't think oss pants relies on the order of the output (still need to verify). I see this method is not marked as "API public", so if I can verify no usage of ordering in oss pants maybe it's safe? Raw set is much faster than OrderedSet after all.

    2. I check the usage of scan_specs in master, it is used in filter task and also mutable build graph's inject_specs_closure method. For filter task, the output is set() type so I don't think order matters. For inject_specs_closure, I didn't see usage in master that is order related. However, this method is API public, thus I am not sure if the order is used outside of master, though the docstring didn't mention about ordering.

      Anyway my point is, v1 and v2 engine will generate different orders when running same methods. For example, v2 LegacyBuildGraph has its own implementation of inject_specs_closure and the order of the output will be different from v1 version of that method. Thus I don't see the need for keeping the order of output in these cases, especially for v2 engine. In addition, set() is faster.

      @Stu, it seems you are the author of scan_specs in build_file_address_mapper, was there a particular reason that you want to keep the order of the output?

    3. It turns out if I change OrderedSet to set in BuildFileAddressMapper.scan_specs, ci will fail ("import error" feels like caching issue). I don't know the root cause yet, but I will track it in a different ticket. I will leave a TODO there.
      As for legacyAddressMapper, as I said, the output orders are different between v1 and v2 engine, thus returning OrderedSet is unnecessary. I have updated the docstring to remove "OrderedSet" part.

  3. 
      
  1. Ship It!
  2. 
      
  1. Ship it! I've got some non-blocking comments.

  2. src/python/pants/bin/goal_runner.py (Diff revision 4)
     
     

    Previously, if a "goal" was not a valid spec, it would ignore that, and blow up in Goal.by_name. Now, the error will come out of the spec parser.

    Also, if goal is a valid spec, but doesn't parse to SingleAddress, the user will be presented with a type error.

    Not sure how likely either of those cases are, but it's something to consider.

    1. hmm, I think spec parser rarely throws an error (only when given goal is abs path), and if goal does not end with ":" or "::", a SingleAddress object will always be returned.
      "goal being abs path" and "goal ends with : or ::" theoretically should never happen because our command line argument parser won't treat them as goals, thus won't save them in options.goals.
      So I believe the above 2 cases are very unlikely to happen.

  3. It would be helpful to include the object that was the wrong type in the error.

  4. ExecutionError's are not nearly as clear as the error generated by _raise_incorrect_address_error. Maybe add a TODO with a link to https://github.com/pantsbuild/pants/issues/3912?

    I don't consider this blocking because it's difficult to get to this except right now. In cases I tried, graph construction fails earlier and results in the error message noted in the issue.

    1. I link the ticket in engine.py, product_request method, since the message comes from there.

  5. 
      
Review request changed

Status: Closed (submitted)

Change Summary:

Merged as https://github.com/pantsbuild/pants/commit/482014d98b2d02e1a7b81d93d172b9d3db2ff6e4
Thanks Stu, Kris and Nick!

Loading...