Add changes-in-diffspec option to what-changed

Review Request #1535 — Created Dec. 20, 2014 and submitted

benjyw, dturner-tw, jsirois, patricklaw

Add an option to what-changed mixin that instead of looking at changes in the workspace,
relative to some parent ref/treeish, allows finding targets changed by some diffspec.

A diffspec is something that is meaningful to the scm, eg a commit SHA or a range, or
other ref/ref-range or similar. Pants makes no attempt to parse or understand diffspecs,
but rather just passes them through to an appropriate scm command.

For git, several examples of possible diffspec formats can be found on:

Exmaple usage:
pants goal what-changed --diffspec=HEAD
pants goal what-changed --diffspec=HEAD@{yesterday}
pants goal what-changed --diffspec=abcb913
pants goal what-changed --diffspec=HEAD~8..HEAD
pants goal what-changed --diffspec=v1.0.2..HEAD

Added a bunch of tests, most of which aren't actually testing any pants code but rather just
documenting some of the git behaviors we believe work, and in the process ensuring the git
command used to get this information from git indeed performs as expected.

./pants goal test tests/python/pants_test/tasks:what_changed tests/python/pants_test/scm::

  • 0
  • 0
  • 2
  • 1
  • 3
Description From Last Updated
  2. I think None is the default default. Is that right, @benjy?

    1. Yes, no need to specify default=None. It's the default default, unless the action is 'append', in which case the default default is an empty list.

  3. If these options are mutually exclusive, we should probably raise an explicit error in the presence of both, to avoid confusion.

    1. changed-since/parent has a non-falsy default which I also expect will frequently be overridden (also non-falsy) in configs (eg we'll be setting it to origin/master at foursquare) to match our current internal impl so checking for both diffspec and changes-since being specific seems tricky?.

    2. That sentence seems to have gotten away from me. Trying again:

      I expect the changed-since param to be set in config depending on users' tastes. Example: our current impl of compile-changed at foursquare uses 'upstream/master' so we'd be putting that in pants.ini if we switched to an upstream impl that used 'HEAD' as the current what-changed does.

      I've now encoutered this a couple times -- two mutually exclusive params where I'd like to raise some sort of usage error if both are passed explicitly, but cannot distinguish between one having a default/config value from being explicitly requested. That is obviously by-design in new options scheme, but it seems like I'd need a second layer of options and None defaults to get stricter usage errors like this.

  4. src/python/pants/scm/ (Diff revision 1)

    Are these concepts meaningful in other SCMs? I feel like the Scm superclass is becoming a token abstraction that only exists in actuality for its only subclass, GitScm.

    Which is to say I'd be fine with not jamming this into the superclass, and expecting downstream users to be careful if their workspace isn't a git Scm.

    1. I believe that the concept of a ref or commit range is meaningful to other SCMs, and whatever one passes to diffspec should be meaningful to whatever underlying SCM you're using.

      Pants is not trying to abstract over this here -- we should be explicit that this is an opaque string that the underlying scm will use to return a list of changes.

    2. So yes, this is a concept applicable to other SCMs. The SVN impl would likely parse svn diff -r 245:HEAD --summarize and a "changes in X" method seems applicable to any SCM wrapper.

  1. 2 unfortunates here, I assume the driving force behind them is back-compat:
    1.) 2 diff methods on scm instead of 1 - you could imagine changed_files taking an optional to_; ie: a ..., from_, to_='HEAD'.
    2.) The issue Patrick raised in WhatChanged - similar to point 1 above, it would be nice to have a from flag (--parent) and an optional to flag

    1. I'm fine with making it one method and adding more params to control behavior, but to make the counter arg:

      I think the current method serves the "how is my workspace different than X?" use case, whereas this introduces a new "what did Y change?" use case.
      I think these two use cases have different params:
      - the first case may or may not want include untracked files. "untracked" is meaningless in the second and including them would be incorrect.
      - the paramter X in the first case really only makes sense as a single ref, but the Y param in the second case makes sense as a range (eg "What changed in ab0f123?" and "What changed in release-2014-11-04..origin/master?" are both cases we have in our current tools)

  2. src/python/pants/scm/ (Diff revision 2)

    Why diff-tree and not diff? IE: is changed_files above doing it wrong?

    1. As far as I can tell, they are the same when given two tree-ish things (eg, but differ in what they do when only given one ref:
      - git diff foo returns the current working tree's differences to foo i.e. changes since foo.
      - git diff-tree foo returns the differences between foo and foo's parent, i.e. the changes in foo.

      When foo is just one ref (upstream/master, abc123, v1.0, etc), getting the changes contained in foo using diff instead of diff-tree would involve transforming foo to the range foo^ Thus implementing "changes in diffspec" using diff would require pants parse the diffspec to see if it is a single ref or range. Git already does that off the shelf, so I figured I'd avoid re-implementing it.

  1. Instead of reimplementing a bunch of git stuff in pants, could we instead simply have pants parse a diff that's passed to it on stdin? Then pants wouldn't need to know anything at all about scms. That seems simpler and more in line with the Unix philosophy.

    1. I don't think there is substantial reimplmentation of anything in git in this change. The work of figuing out what a diffspec acutally changed is left entirely up to the scm -- the git implmementation is literally 4 lines, essentially shelling out to git diff-tree --no-commit-id --name-only -r $diffspec. My intention is that a user can pass whatever they want for diffspec, and it will be handed to the underlying scm (which will hopefully know what it means), and it should be opaque to pants what is actually in a diffspec.

      I think "parse a diff" sounds deceptively simple -- pants is best at knowing the diff format it wants to parse, and can ask the underlying SCM to produce that in a way it knows will work. Counting on the user to remember how to do that seems more likely to trip people up.

      Also, as a user, I'd much rather say ./pants compile-changed --changes-since=v0.1 or ./pants validate-graph-changes --diffspec=v1..v2^ than memorize git diff-tree --name-only -r --no-commit-id HEAD^ | ./pants. Making sure that command was right, and then writing some tests to confirm that was, is something I can do once rather than expect users to do every day.

    2. I was thinking just git diff HEAD^|./pants, and have pants parse the diff; this is maybe slightly more cpu and i/o, in that the whole diff needs to be generated, but not a ton. This also handles the unchanged stuff without any additional pants code

    3. I'm very much opposed to the idea of forcing users to compose random other command line tools to get pants to behave in a way that is intuitively very natural for it to do: compile the changed targets. It's perfectly fine for us to have convenience functionality that covers the 99% use case, espcially since at this point it's existing behavior that just needs to be fixed up.

  1. Ship It!

Review request changed

Status: Closed (submitted)

Change Summary:


Landed this for now. I'm confident it shouldn't break anything and I'd like to use it in a test for something else.

There are a couple questions below that we can revisit, but the interface changes are minimal enough that I think it should easy enough to revise if/when we come up with something better.