A new implementation of SourceRoots.

Review Request #2970 — Created Oct. 14, 2015 and submitted

jsirois, nhoward_tw, stuhood, zundel

Takes source root information from options, instead of from
bootstrap BUILD files. Does pattern-matching, with sensible
defaults, so that in the typical case the right thing
"just happens" with no config necessary. For example, this
can support maven layouts out of the box.

Distinguishes between source and test roots, because that will
probably matter in the future, and we don't want to make people
change their source root decls when that happens.

Gets rid of validation that source roots contain only the "right"
types of targets. This was rather ad-hoc, and properly belongs
with any other per-repo validation rules.

Instead, introduces the concept of a 'language' (currently just
a string like 'java' or 'resources'), that source roots may be
associated with. What we do with that in the future is open.

Makes a single task (ListRoots) use the new mechanism, just to
make sure the plumbing works. This meant adding the SourceRoots
access to the Context, which required jiggering a couple of other
tests that created their context in non-standard ways.

A future change will switch all internal uses over to the new system,
and deprecate the old one.

CI passes: https://travis-ci.org/pantsbuild/pants/builds/85413405

Manually verified that ./pants roots returns the right thing in the pantsbuild repo.

CI still passes after addressing code review comments: https://travis-ci.org/pantsbuild/pants/builds/85625440

  • 0
  • 0
  • 4
  • 0
  • 4
Description From Last Updated
  1. Looks good! Thanks a lot for tackling this Benjy.

    Would you mind adding nhoward as well?

  2. src/python/pants/source/source_root.py (Diff revision 3)

    Would be great to have disabling this logic as an option at launch. Would be a killer feature, because we currently have no clue where we're violating the rule.

    1. I would rather deal with that in a followup though? This change is tricky enough already... I want to get it in with minimal perturbation, and then go to town when migrating the code to use this new mechanism.

  3. src/python/pants/source/source_root.py (Diff revision 3)

    isabs and relpath are fairly expensive. It's possible that skipping straight to the trie lookup will be faster, if it's safe.

    1. Yeah, I definitely want to get rid of this, but again - in a followup. The migration process will shake out what (if anything) is relying on this.

    2. Why is isabs() is slow?

  4. src/python/pants/source/source_root.py (Diff revision 3)

    Move to a property of the SCM impl?

    Alternatively, should this use the existing spec-exclude or exclude-target-regexp options?

    1. There's something similar in repro.py, so I should probably abstract away the .git part into the SCM impl, as you suggest. But spec-exclude/exclude-target-regexp are for a different purpose, so I'm leery of combining them. For example, repro.py excludes a smaller set of dirs than here (it needs distdir and workdir), so trying to abstract this out looked like more trouble than it was worth.

  5. src/python/pants/source/source_root.py (Diff revision 3)

    Configuring explicit languages here (rather than just matching anything with a source-root-parent followed by one ignored directory) adds a lot of complexity. That's not to say I don't think we should do it... but I think it's important to justify it.

    Are the advantages of an explicit language list sufficient to justify the complexity? Because without them you would go from 7 options to 2 (afaict).

    1. The unittest shows some real-world examples where the parent+lang isn't sufficient. E.g., src/jvm is where we at Foursquare keep both java and scala code in a single tree. And you can imagine src/py (instead of src/python). And apart from that, I think we do still need the option to define fixed, non-pattern roots as an escape hatch from unintended pattern-matching badness.

  6. src/python/pants/source/source_root.py (Diff revision 3)

    Perhaps a premature optimization (but perhaps not much more difficult?) would be to lazily iterate over slices here.

    Actually doing it via a generator would almost certainly lose the benefit, but:

    def keys(path):
      begin = 0
      end = path.find(os.path.sep)
      while begin is not -1:
        yield path[begin:end]
        begin = end
        end = path.find(os.path.sep, end)
    1. Maybe? I don't think this is a performance-critical section? And if so I'd like to see that in a profile before adding this somewhat less-readable optimization. I do plan to profile the heck out of this.

  1. lgtm though I'd like to hear what the solution for synthetic targets is.

  2. src/python/pants/source/source_root.py (Diff revision 3)

    Source roots exist in .pants.d for synthetic targets.

    1. Yes, but I don't think anyone expects all_roots() to return those. all_roots() is currently only used in the ListRoots task (which clearly shouldn't return synthetic target source roots, as those are an implementation detail). It's also used trivially in GoBuildgen, and I plan to remove that usage in my next change, which migrates us over to the new sourceroots.

      [Also, we won't have synthetic targets in the new engine. They are a gnarly implementation detail that I never liked and that we should eradicate...]

  3. src/python/pants/source/source_root.py (Diff revision 3)

    Interpolation rather than addition for string concatenation.

    1. Fixed here and below.

  4. src/python/pants/source/source_root.py (Diff revision 3)

    range(0, x) == range(x)

  1. 1) Is this ready to just remove the source_root() declarations sprinkled around the pants source tree

    2) I'm ok with moving stuff around to new directories as long as there is some kind of guideline for what is supposed to go in a new directory. could you add a src/python/pants/src/README.md to tell us what kinds of things you think belong there?

    1. 1) Yes, I have a followup change to do precisely that.

      2) Sure, but we should probably accept that as a principle, and do it more widely.

    2. Hmm, I guess we already do have README.md in more places than I thought. Great!

  2. src/python/pants/source/source_root.py (Diff revision 3)

    could you please add 'proto' for us?

    1. Done. In a future change I'll add something to normalize py->python and protobuf->proto on the read side, so that each language is represented by a canonical name.

  3. nit: it why not name the method option source_roots source_root-patterns and test_roots to test_root_patterns

    1. Ooops yes, that was cruft from a former version of the option names. Fixed.

  1. Awesome. Thanks Benjy!

  2. src/python/pants/source/README.md (Diff revision 5)


  3. src/python/pants/source/source_root.py (Diff revision 5)

    Short pydoc on the fact that langs is an out parameter?

  4. src/python/pants/source/source_root.py (Diff revision 5)

    It doesn't seem like the trie needs to know about canonicalizations. Would it be alright to do that within SourceRoots instead?

    1. That would require re-wrapping the result in a SourceRoot in two different places in SourceRoots. It would also makes the canonicalization functionality a little bit harder to test. On balance I'd prefer to keep it where it is?

  2. src/python/pants/source/source_root.py (Diff revision 4)

    What keeps this from having precedence over /src ?

    1. In the version you commented on, these are used to generate patterns by appending language names: src/java and src/main/java don't collide (neither is a prefix of the other).

      In the version post-Stu's comments, in which patterns use wildcards, the trie implementation always goes for the longest match.

  1. Thanks all! Submitted as c7cb8d547af707a6f35fc0c4cc6a5f70986f57ac.

Review request changed

Status: Closed (submitted)

Change Summary: