[engine] Daemon cacheable `changed`.

Review Request #4207 — Created Sept. 6, 2016 and submitted

3773, 3816
jsirois, nhoward_tw, patricklaw, peiyu, stuhood, yujiec
  • Implement AscendantAddresses for upwards path target traversal supporting changed.
  • Implement a new Changed subsystem for global parameters and functionality across all tasks.
  • Add inline, dual-pathed 'alternate target roots' mechanism for applying --changed functionality to any goal to rewrite its target roots in the v2 path (e.g. ./pants --enable-v2-engine --changed-parent=master list, ./pants --enable-v2-engine --changed-parent=master test). Intentionally avoid removal of alternate_target_roots at the Task level as Foursquare et al currently rely on this.
  • Kill ChangedFileTaskMixin in favor of the above.
  • Implement EngineCmdLineSpecParser, EngineSourceMapper, EngineChangeCalculator and friends for pre-BuildGraph target inspection and traversal.
  • Integration test coverage for v1.changed == v2.changed == v2.list.
  • Implement a new Engine.product_request() method for simplified, inline engine product requests.
  • Make BuildFile._is_buildfile_name public.
  • Eliminate checking of DeprecationWarning warnings in the pantsd integration test.
  • Add a new PantsRunIntegrationTest.add_test() class method for simplified test testcase generation.

This currently leaves the ./pants --enable-v2-engine list breakage as a TODO'd followup due to review size, which masks further test coverage for things like target aliases etc. Planning to attack that as a follow-on.


  • 0
  • 0
  • 13
  • 3
  • 16
Description From Last Updated
  2. can move this to a mixin so BaseTest can also benefit?

  1. I'm a little nervous about the shadowed options scopes because I'm pretty sure we don't have tests for that right now. One of the issues I saw when playing around with the patch is that we don't merge the shadowed parsers. This causes the task parser gets overridden by the subsystem parser in the hierarchy. It looks like the only effect is that the task description no longer shows up in help, but there might be others.

    Additionally, it adds some ambiguity around the value hierarchy for options for the ChangedTargetTask subclasses.

    eg --compile-changed-changes-since vs --changed-changes-since

    Could you add some tests around how these interact?

  2. src/python/pants/core_tasks/what_changed.py (Diff revision 1)

    I don't think this passthrough does anything. I could be wrong though.

    I'm pretty sure that options here will either a) always have no value for the listed options, or b) will have the same value as the subsystem.

    In the a) case, it'd be because WhatChanged was registered under a different scope than changed. For b), if the scopes are shadowed then they should be the same instance of the options container and so have the same values.

    1. during runtime, you're 100% correct - the real options always come from the subsystem (the options.get(key) always results in None and uses the default value). at test time tho (as the comment alludes), these passthroughs allow the existing test mechanisms to pass mock options more naturally for various test parameters while letting others fall through to the defaults (which now live on the subsystem). this makes the existing tests compatible with the options scope move without added modification.

      I agree it's a bit weird, but I figured that not to long from now we'll probably deprecate the changed goal(s) anyway. aside from a clearer comment here, any thoughts as to how else to improve this?

    2. I'd be happy with a clearer comment. I didn't realize it was to make tests work.

    3. done.

  3. src/python/pants/engine/engine.py (Diff revision 1)

    Could you add some tests for this?
    eg, what if multiple root nodes are throws, or what the type error looks like for a noop, etc.

    1. sure, will do.

    2. refactored and added some basic test coverage for this.

  4. How about using a guard clause here?

  5. Looks like a bug.

    1. yup, fixed! fwiw, this method is only for interface compatibility and isn't actually used.

  6. This is a bit complex, are there unit tests that cover it?

    1. all of the branches here map to the target configuration selections in the generative bit of test_changed_integration.py (see the comments) - do you think I should add unit tests on top of that?

    2. If the integration tests cover it, it's ok. But unit tests would be lighter weight and have shorter runtime.

    3. these should be covered by the integration tests - if needed, I think we can circle back to improve the unit vs integration test coverage mix later.

  7. Should this use the sources_set? Or is sources expected to be unique and sources_set is just for faster set contains checking?

    1. yes to the latter (+the fact that _unique_dirs_for_sources itself uniques at the dirname level), and I suppose it couldn't hurt for the former - changed.

  8. How about if any(source_file in sources_set for source_file in target_files_iter)?

    1. much clearer - fixed.

  9. Could you add some unit tests for this?

    1. I think I'm going to go with Stu's suggestion of moving this into CmdLineSpecParser - but will add some tests along with that.

    2. Thanks.

    3. killed this in favor of a simple function - see below.

  10. src/python/pants/scm/change_calculator.py (Diff revision 1)

    Could you find a way to simplify the logic here? One thought would be to have the instance variable values wrapped in a change request, then it could be reduced to something like

    if not changed_request:
      change_request = self._req_from_subsystem_opts
    changes_since = change_request.changes_since
    diffspec = change_request.diffspec
    1. I'm not sure I see a way to do this as-is and keep ChangeCalculator general?

      there are two paths here that this awkwardly attempts to unify - 1) the BuildGraphChangeCalculator subclass used in the v1 path, which passes in the instance variables at initialization time and subsequent calls to changed_files() are parameterless; 2) the (longer-lived) EngineChangeCalculator subclass used in the v2 path, which initializes everything to none and passes a changed_request to changed_files() on each call.

      maybe it would be better to just divide this into two separate changed_files methods on their respective subclasses and kill the base class?

    2. I think that makes more sense.

    3. done.

  11. What do you think about validating the nested dict's key set here?

    The code generation here is dependent on the nested keys being exactly 'none, direct, transitive', but if a fixture were to have another one added, it would silently be ignored.

    If one were missing, there would be an error at load time with a key error, so maybe that's ok.

  12. Could you add a comment here noting that the default values for the params are used to close over the loop variables.

    It's a pattern I haven't seen before. I find it a little confusing.

    I could also see issues here in the future if we want to add more closed over variables, but forget to add them to the default value list.

    1. sure - added. oddly enough, it was a NameError for filename that arose without closing over these that led me down this path. I suspect something to do with the way these are bound to the class and then ultimately used as instance methods, but had meant to dig into this more.

    2. Hm. Interesting. It's a neat trick, since default values are evaluated and assigned at declaration time.

    3. yep!

  13. Since graph_helper is an object, I think it would be clearer to use attributes to get the scheduler / engine here instead of assuming it's a tuple with a certain order.

  14. How about switching this to a guard clause?

    1. I generally look at guard clauses as a way to avoid deeply nested blocks - but not sure I see the benefit here for the one liner in the if statement? am I missing something?

    2. In this case, I just think it'd be clearer. With a guard clause, you can say if matches, continue, rather than inverting the match and nesting the dependent assertion.

    3. fixed.

  1. Thanks Kris.

    Intentionally avoid removal of alternate_target_roots at the Target level

    Should probably say "at the Task level"?

  2. src/python/pants/base/build_file.py (Diff revision 2)

    Rather than exposing this, it would be good for the LegacyAddressMapper to use the pattern defined on src/python/pants/engine/mapper.py

    1. well, it was being used this way in LegacyAddressMapper before - I just made it "public" because that's how we were already using it.

      I don't see a particularly quick/clean way to achieve what you suggested and still afford for a globally customizable AddressMapper.build_pattern, so will punt for now and simply revert the method renaming.

  3. I don't think "v2 spec roots" are specific to the v2 engine... I'm pretty sure that when I added them I included them in the address mapping path for the v1 engine here: https://github.com/pantsbuild/pants/blob/master/src/python/pants/build_graph/address_mapper.py#L50-L52

    So this conversion process is a bit awkward. Would be great to push the conversion from strings to specs.Spec objects earlier in the process, rather than doing a conversion as late as the graph creation.

    1. when I try to do this sans the awkward conversion, I hit:

      Exception message: 'LegacyAddressMapper' object has no attribute 'specs_to_addresses'

      which implies more unimplemented coverage in LegacyAddressMapper vs BuildFileAddressMapper (the latter of which provides a resolve-entangled specs_to_addresses that doesn't readily forklift out).

      I've refactored a bit inline with the TargetRoots suggestion which certainly made this a bit cleaner, but not fully gone. how would you feel about shipping this as a stopgap w/ TODO and then circling back to cleanup once we have a more complete story around LAM<->BFAM compatibility (which I believe Yujie is working on now)?

    2. Just want to mention that specs_to_addresses is unnecessary for LegacyAddressMapper. There is only 1 usage of that and it is inside MutableBuildGraph, which should never be used together with LAM in the first place.

    3. with the above being said, how did you hit that exception?

  4. src/python/pants/bin/engine_initializer.py (Diff revision 2)

    Maybe it would be premature, but I think it might be good to extract this construction logic into a factory function for a "RootCalculator" or "TargetRoots" class, with a subclasses for Changed vs Literal, etc. Most of the logic added in this file would go in that abstract class instead, and perhaps that is what would be passed as the argument to create_build_graph.

    I think that might also replace/merge ChangedRequest?

    1. took a whack at this - let me know what you think.

  5. This is a list of captial S Specs?

    1. yes - clarified.

  6. src/python/pants/core_tasks/what_changed.py (Diff revision 2)

    Should this mention that this task is deprecated in favor of ./pants --changed-parent=X list ? And should it be explicitly deprecated?

    1. the ./pants --changed-parent=X list mode only applies to the v2 path currently, so a deprecation at this point seems premature.

      I can look into converging this in the follow-on work tho - at a minimum this would be blocked by the target alias handling bit of that.

  7. And I guess that is backwards-compatible?

    1. yup - that's the idea.

  8. src/python/pants/core_tasks/what_changed.py (Diff revision 2)

    Creating a ChangedRequest and then passing it back to the changed instance feels convoluted. Perhaps Changed.Factory.create(..) could take the override definitions?... but more generally, I'm not exactly sure what "Allow options-based passthrough" means here.

    1. yeah, this bit is definitely convoluted as-is - but essentially "options-based passthrough" here just means that the local task's options are respected before the Changed subsystems options are. at runtime, this is a no-op, but allows existing tests to run unmodified and still pass their test options in on the local task's scope.

      Changed.Factory.create(...) is much better tho - went that route.

  9. src/python/pants/engine/engine.py (Diff revision 2)

    Some validation that this contains the same number of outputs as inputs would be good.

    1. hmm, not sure I understand exactly - are you suggesting validating that len(subjects) == len(entries)? or that len(state.value) == len(entries)? or something else?

      afaict, len(subjects) == len(entries) wouldn't be guaranteed to line up because of subjects like DescendantAddresses('') etc. and maybe_list has inbuilt expected_type validation that should safeguard against unhandled list items.

  10. src/python/pants/engine/graph.py (Diff revision 2)

    Since this is a helper for one of the other methods, rather than a top-level function, probably _recursive_dirname. Alternatively, could define it inside of ascendant_addresses_to_globs.

  11. src/python/pants/engine/legacy/change_calculator.py (Diff revision 2)

    Any reason not to use @memoized_property here?

    Also, is there any benefit to creating this lazily, or would just creating it eagerly be fine?

    1. no reason that I can see - was just cloning the existing impl in BuildGraphChangeCalculator.

      switched both to eagerly set attributes.

  12. Can this just go in the superclass instead? If we can avoid subclassing just to add otherwise generic methods, that would be preferable.

  13. If this is actually going to be compared to other objects using structural equality, should probably prefer datatype to namedtuple (to avoid accidental matches against other tuples with the same shape but different types).

    1. I don't see a need to compare it, but changed it to datatype anyway.

  14. Not clear what "global changed functionality" is. Should expand to describe what that is.

  15. Should probably expand this to indicate that one of the changed options caused the SCM to be required... maybe "a --changed option was specified, but no SCM implementation is available to satisfy the request."

    1. good call, fixed.

  16. Is this supposed to be abstract? I'd prefer that anything that will be subclassed be explicitly marked abstract, as overriding concrete methods is a bad idea in general.

    Otherwise, if the other classes don't actually need to subclass this (because they only consume its public fields, and it doesn't consume their fields), should compose the objects instead.

    1. made this abstract.

  17. Pheew... not API: public

  18. src/python/pants/task/changed_target_task.py (Diff revision 2)

    This usage seems massively simpler than that of the WhatChanged task... what am I missing?

    1. the *-changed goals use the Changed.Factory options registration function to register options on their own task's scope like so:

      def register_options(cls, register):
        super(ChangedTargetTask, cls).register_options(register)

      so in this case we pass in just the options of their scope to seed the change request without needing to merge with the options of the global Changed subsystem's scope like the changed task has to do.

      we can't use the same pattern for the changed task itself is because the Changed subsystem shadows its options at the scope name level (changed == changed), so doing so would fail due to double options registration.

  2. This can be a @classmethod, maybe - see below.

  3. This is obviously highly dubious, Changed has a ChangeRequest it hands back out but doesn't use itself. Instead it requires config be passed in to change_calculator. It looks like usages just pass the options defined above back in so, mod maybe needing to change self.global_instance() into self.scoped_instance(...) I think the config argument to change_calculator can be eliminated along with the changed_request property.

    1. yeah, agreed this is a bit wonky as-is. iterated a bit here - let me know what you think now.

      also, I don't think I'm grokking the benefit or idea behind scoped_instance() in this case - afaict, this configuration should be typical of subsystems global options access as the options for changed are only ever set once per GoalRunner invocation and thus I'd assume the global instance would suffice? what would a scoped_instance() do differently?

    2. It was not a benefit suggestion, just a behavior-preserving lazy note.  IOW: I did not thoroughly analyze the scope of options coming in.  If the scopes are all global, great.  If not, then some attention would need to be paid to scoping.
  1. Kris,

    from what I saw, after this change, in a "changed" task, BuildGraphChangeCalculator will be used no matter which engine Pants is running on. is it True?

    1. at the Task/Subsystem level, yes - and that's what we want, since the BuildGraph here is the engine-backed LegacyBuildGraph.

      in the v2 future, there will be no tasks specific to changed - so in that case we'd only use the EngineChangeCalculator to compute ChangedTargetRoots and then pass those to any underlying task.

    2. ok. I was thinking we could implement "changed" task using EngineChangeCalculator. But if "changed" will be gone in v2 era, then I am fine with it!

  2. is it "dependent_address"?

  3. nit: maybe use "while True"?

    1. while True is generally slower in python2.7 than while 1 because True has to be dereferenced each loop - see http://stackoverflow.com/questions/3815359/while-1-vs-for-whiletrue-why-is-there-a-difference

      is while 1 really that difficult to grok from a readability perspective? I actually find it more readable than while True.

    2. hmm, interesting. Didn't know that difference before. Thanks for the link!
      As for the readability, I personally prefer a keyword to a numeric constant, because "while" expects a boolean type value, and True is native boolean type.

      But I am fine with "while 1", expecially given the fact it is faster.

  4. I don't know much about Subsystem, but I am curious why it is Factory that subclasses Subsystem, but not "Changed"?

    1. Subsystems are special in that they must be initialized via special machinery before being used, which makes it difficult to test them. this (cargo culted) pattern neatly compartmentalizes the Subsystem bits in the Factory and allows for easier instantiation/testing/etc of the wrapping subsystem implementation itself.

    2. I see. Thanks for the explanation!

  2. I think this means that ascendent specs will be added to the types of specs that can be passed on the command line.

    Are we planning on supporting ascendant addresses as part of the commandline? I'll have to give that some thought.

    If we do, should v1 support it, or only v2?

    Also, I'd like to see some tests added to the unit tests for this if we're going this way.

    1. hmm, good questions - imho, not worth the fuss tho. I mainly added this for logical parity with the other specialized Specs and to demonstrate its functionality, but I don't have a concrete use case for it in mind and we can always readd this later if deemed necessary.


  3. I think the way this gets used in parsing means that we'd be allowing any unnormalized spec from within BUILD files.

    1. only for resources, but that's because currently resources are represented as unnormalized specs in the engine's LegacyTarget. there's definitely room to iterate further here.

      after a second look tho, it occurs to me that this probably doesn't make much sense in CmdLineSpecParser - moved this back out into source_mapper.py as a standalone utility function.

  4. src/python/pants/bin/target_roots.py (Diff revision 5)

    Should this be called SpecRoots? It doesn't contain targets as far as I can tell.

    1. the name was taken at Stu's suggestion, but I think it makes sense and think of it as "the target(ed) specs of the pants run" vs "(capital T) Target roots". I feel like both "SpecRoots" and "TargetRoots" can have overloaded meanings depending on the context.

      there's also naming precedent for a similar construct in the old engine called TargetRootsReplacement. but ultimately, I don't feel strongly about it either way and would be open to revising if others do.

  5. src/python/pants/bin/target_roots.py (Diff revision 5)

    Assuming this is an OptionsValueContainer, the str may not be very useful as __str__ isn't defined.

    Additionally, this only shows the changed subsystem options. I think it would be more clear if the message pointed that out.

  6. src/python/pants/bin/target_roots.py (Diff revision 5)

    nit: currently, from_options will always return a changed_request, so it will be truthy.

    1. removed the check for truthy changed_request

  7. src/python/pants/engine/engine.py (Diff revision 5)

    I think Throw isn't an exception, so you can't raise it.

    Is there a test that covers this case?

    1. ah, yup you're right - refactored this a bit.

  8. src/python/pants/engine/engine.py (Diff revision 5)

    I think the only other valid state you could get here would be Noop. Should Noop be an error, or would it just mean that there are no computed products?

    1. I think receiving Noop here should be an error - made this more explicit.

  9. src/python/pants/engine/legacy/BUILD (Diff revision 5)

    nit: sort

    1. fixed. I wish we had something that would autosort these, so we didn't have to do that by hand. :(

  10. This feels like it should extracted since it's used 3x in here.

    Maybe *FilesetWithSpec could have a method that yields relrooted paths and ensures they are relrooted?

    1. mind if I punt on this one for now? I'm worried about scope creeping too much and can always iterate later.

  11. Could you add a test for a file not owned by any targets? This might cover the noop case I was talking about above, though I'd prefer a unit test.

    1. good eye. I'm planning to do this in the follow-on review, as this is one of the last remaining broken bits in the v2 engine (and half of the reason I had planned for the fast-follow review).

  2. src/python/pants/scm/subsystems/changed.py (Diff revisions 3 - 4)
    Sorry for the slow feedback.  You labelled the 3-4 interdiff as for Stu's feedback, but it looks like you hit my feedback here as well.  LGTM.
    1. no worries at all - have been spotty myself due to support/firefighting duties last week, so was firing sparse bits as possible and should've connected the dots a bit more.


  1. Ship It!
  1. Ship it! I've got a few remaining minor nits and a assertion preference.

  2. src/python/pants/bin/BUILD (Diff revision 6)

    nit: sort. +1 to the automated sort. Maybe we could add something to the plugin?

    1. fixed. I think fsq's buildgen applied to the pants' repo might be the answer here?

  3. src/python/pants/bin/BUILD (Diff revision 6)

    nit: sort

  4. src/python/pants/scm/subsystems/changed.py (Diff revision 6)

    nit: ws indent.

    Should these be split out one per line?

    1. fixed and moved the list opening down a line for improved readability.

  5. I'd prefer this assertion to cover part of the string-ified Throw's since those are the critical info that's being communicated.

  1. Ship It!
  1. Ship It!
  1. Thanks Kris!

  2. AFAICT, it is odd for this to take a rel_path, because command line specs are always relative to the buildroot, and CmdLineSpecParser is the only implementation of a "spec parser" class that I know of.

    But I may be reading more into the name 'spec_parser' than I should be? If this is not a CmdLineSpecParser, and is instead pants.build_graph.address#parse_spec then that's fine.

    It would be great to support running in other directories, but currently this seems like it is likely to be (ab)used to bypass the the spec normalization we attempt to apply in BUILD files: https://rbcommons.com/s/twitter/r/4221/

    1. essentially, I was looking for something existing that took a string spec from a BUILD file (where specs like :xyz are allowed) and returned a proper Spec. CLSP was the only existing thing that appeared to do that - but this usage indeed bends the purpose of that class a bit.

      I think I can probably just get away with pants.build_graph.address#parse_spec + SingleAddress tho. cleaned this up.

  3. src/python/pants/scm/subsystems/changed.py (Diff revisions 2 - 8)

    It still doesn't look like anything is deprecated here, so this should probably refer to a ticket at least.

  4. src/python/pants/bin/target_roots.py (Diff revision 8)

    A supremely dumb bikeshed, but: we might consider naming this AddressRoots while we have the chance. It's possible that as we lean further into v2 we'll actually be able to encourage and diff addressable objects which are not targets.

    1. ...and I immediately regret it. If anything just calling everything a Target would probably be better than trying to make the name Addressable happen.

    2. hah - yeah. this should be pretty easy to rename later if needed tho.

  5. src/python/pants/bin/target_roots.py (Diff revision 8)

    Would be good to refer to a ticket/rb here.

Review request changed

Status: Closed (submitted)

Change Summary:

thanks Peiyu, Nick, John, Yujie & Stu! this is in master @ https://github.com/pantsbuild/pants/commit/68e68ba5b5d772c6dfbd0586363da3946eec8c59