Refactor Intermediate Target Generation Logic

Review Request #4277 — Created Oct. 3, 2016 and submitted

yujiec
pants
3917
pants-reviews
gmalmquist, peiyu, stuhood

The current intermediate target logic uses integer index to tag target name. This has 2 major problems:
1. This creates output difference between v1 and v2 engine. In v1 engine, parse_context is created for each BUILD file. Thus the index will be initialized to 0 for each BUILD file. But in v2 engine, only 1 parse_context will ever be created, which means index will only be initialized to 0 at the beginning and keep growing afterwards. This causes output intermediate target names being different between v1 and v2 engine.
2. The current logic will create a new target every time it sees "scoped, intransitive, or provided" field, even though they may refer to the same target, which leads to redundant intermediate targets.

This review does:
1. Extract a common base class IntermediateTargetFactory. IntransitiveDependencyFactory, ProvidedDependencyFactory and ScopedDependencyFactory all subclass IntermediateTargetFactory.
2. The base class holds a class variable which is a set that holds the tuples ("intermediate target name", "relpath to BUILD file"). This is to make sure a unique target is created only once in each dir (The reason we choose dir but not the entire repo is that v2 engine has undeterminded execution order, thus the intermediate target may be created in non-determined dirs if we want to enforce this uniqueness in repo level).
3. Instead of using index, hash target's full adress and its scope to generate an identifier.
4. Add some integration tests

https://travis-ci.org/pantsbuild/pants/builds/165646469

  • 0
  • 0
  • 3
  • 0
  • 3
Description From Last Updated
  1. Thanks Yujie!

  2. Part of the reason this doc string is so expansive is that it is rendered in the docs: find in page for scoped here: http://www.pantsbuild.org/build_dictionary.html

    Should ensure that this still renders correctly after this change by doing a test publish.

    1. Looks like the doc string in base class will not be rendered. I will move it back to derived class.

  3. It looks like only the superclass uses this, so it can be "private": def _intermediate_targets.

    Better yet, if you just define _targets on the superclass, you might not need an accessor.

    1. done. As I mentioned below, the side effect of doing so all subclass will share the _targets variable, but it should be fine.

    2. Ah, true. So yea, this is similar to static fields on JVM classes. My bad.

  4. You can just define the class variable in the superclass instead.

    1. Hmm. Previously each subclass has its own dictionary. If I define the variable in superclass since dictionary is mutable, all subclass will share the same dictionary.

      I believe that should be OK. One issue I can think of is:

      intransitive(':a') and provided(':a'), if defined in the same dir, will have the same address for intermediate targets. If each subclass has its own dict, then a "target defined more than once" error will be thrown. If all subclasses share 1 dict, then 1 intermediate target will be ignored.

      Maybe I should still treat that as an error case.

    2. Had a second thought. I think I will change the suffix of "provided" from "intransitive" to "provided".

  5. It would be better not to have the subclasses override a concrete method to call the superclass... instead, would rename this method to something more descriptive, and leave the def __call__ method abstract here.

  6. scope_str is a bit confusingly named... to make it more clear that the parameter is completely generic, maybe call it suffix_str or something.

    Also, intransitive probably doesn't make sense as a default... could probably just make this a required parameter.

  7. 
      
  1. In v2 engine the execution order is undetermined. If 2 targets with different spec_paths have same intermediate dependency, then the intermediate target can be created under either spec paths, depending which spec path is visited first.

    Rather than the _targets map allowing for hits for different directories, it should probably include the requesting rel_path in the dict key. That would mean a particular target would be created once per directory, which would be deterministic and still faster than the current situation where it can be created multiple times per directory.

    1. done. Use set instead since dict is unnecessary.

  2. 
      
  1. Ship It!
  2. 
      
  1. 
      
  2. Could you leave a comment linking where hash_target is computed in the v1/v2 engine? so in case the other part changes/test break, folks would know where to look at. Or if possible, extract out the equivalent hash_target function in the engine then use it directly.

    1. hash logic is in IntermediateTargetFactoryBase._create_intermediate_target. I think it is too trivial to be extracted to be a separate public method, but I will add a comment about where it is defined.

  3. 
      
  1. Does this work if you have something like provided(':foobar') in both a BUILD.foo and a BUILD.bar in the same directory?

    1. Yes, only 1 target will be created in this case.

  2. I realize that you just copied this from my code, but perhaps this should be a subclass of TargetDefinitionException or similar?

  3. 
      
  1. Looks good, I like this change.

  2. Kind of jarring class name considering bundle isn't directly referenced in the rest of the diff

    1. Ha good catch! I was copying the header from another file and forgot to change class name.

  3. I think that this too should call the hashing function from the factory.

    1. will do as below.

  4. I will not raise an issue since you have already heard this feedback but I also favor consuming the hash algorithm from the engine.

    This patch duplicates the hashing logic in three different places.

    The logic is trivial, but you could imagine it changing, and then these tests would break.

    1. ok. I will take Yi's suggestion to extract a common method for hashing logic.

  5. How about just validating the membership of the hashed targets with assertIn?
    As is, these tests would fail if a bugfix + integration test added a new target.

    This same feedback goes for all the tests in this file, the concepts being tested would be clearer once you delete all the unreferenced targets from those sets.

  6. 
      
  1. lgtm, thanks for tackling this issue! two quick thoughts.

  2. could this be made to not persist these at the global/class level?

    if not, could this class at least expose a @classmethod to clear this state, which could then get wired up to pants.pantsd.util.clean_global_runtime_state() for global state resetting between daemon runs?

    1. Talked with Kris offline, add a reset for now. Left a TODO.

  3. I think with this change we should convert https://github.com/pantsbuild/pants/blob/master/tests/python/pants_test/engine/legacy/test_list_integration.py#L30 to comparing against the full :: vs just 3rdparty:: now.

  4. 
      
Review request changed

Status: Closed (submitted)

Change Summary:

Merged as https://github.com/pantsbuild/pants/commit/924b78c5a75ef2e0fd7a3640f8c2ec82c9c9f657.
Thanks Stu, Yi, Garrett, Mateo and Kris!

Loading...