Refactor and simplify go fetcher code.

Review Request #3902 — Created May 17, 2016 and submitted

benjyw
pants
2018, 3427, 3439
pants-reviews
jsirois

This is not a small change... It gets rid of a lot of complex logic
that, with hindsight, doesn't appear to be necessary. It replaces
that complexity with a more simple approach.

This mirrors the direction that Go itself is moving in: Go's heuristics
for fetching remote deps (including the meta tag protocol) are becoming
increasingly standard. In fact, there are comments in Go's codebase(1)
to the effect that they encourage code-hosting sites to support the
meta tag protocol so that Go can remove its own hard-coded special cases.

Fetcher Types

Under this change there are two Fetcher types: The existing ArchiveFetcher,
and a new CloningFetcher.

ArchiveFetcher handles special cases where we know how to map an import
path to a tarball path. This is useful for sites like github.com, which
do not currently support the meta tag protocol, and also for diverting
fetches to an internal artifactory, for repos that wish to do so.

CloningFetcher implements (a useful subset of) the standard Go heuristics:
It checks meta tags, and then clones the remote repo and sets its state
to the specified rev (currently this only works for git).

All remote fetches that don't map to an ArchiveFetcher will use the
CloningFetcher. This makes it trivial to use standard git-based remote
repos without any extra config. In particular, gopkg.in, golang.org/x
and google.golang.org now support the meta tag protocol, so there is
no need to special-case them.

The Fetcher class API has also changed a bit. Now a Fetcher encapsulates
the import_path its fetching, so a new Fetcher instance is created for
each fetch operation. The Fetcher classes use Subsystems, but are no longer
themselves subsystems.

Subsystems

For separation of concerns, this commit introduces two utility subsystems
that do what their names imply: GoImportMetaTagReader and ArchiveRetriever.
The fetchers use these. Note that they are not currently unit-tested,
because by the time you mock out the network stuff, there's not that
much left. However they are indirectly tested via several other unit and
integration tests.

FetcherFactory

This change gets rid of the fetcher advertisement+registration mechanism,
which with hindsight seems like overkill, given that we expect that the
current two Fetcher classes are all we're likely to need for the forseeable
future. The Fetchers class is gone. In its stead is much simpler
FetcherFactory subsystem. It now encapsulates the matcher logic, and uses
that to choose a fetcher.

This resolves https://github.com/pantsbuild/pants/issues/3439 and
https://github.com/pantsbuild/pants/issues/3427, and obsoletes
https://github.com/pantsbuild/pants/issues/2018, as we're now
going in a different, simpler, direction.

I can't add dbentley and yujie here because they aren't reviewers on RB, but
I will ask for their review feedback via email.

(1) https://github.com/golang/go/blob/7bc40ffb05d8813bf9b41a331b45d37216f9e747/src/cmd/go/vcs.go#L874

CI passes after all merges: http://jenkins.pantsbuild.org/job/pantsbuild/job/pants/branch/PR-3458/

Added various remote deps to one of the example targets and verified manually that it buildgens and compiles. Also added it to the integration test.

  • 0
  • 0
  • 4
  • 0
  • 4
Description From Last Updated
  1. 
      
  2. How about an option to turn this off? We'd like to never clone, because that won't work on a machine that has no WAN access.

    1. Ah yes, will do. Note that in a correctly-configured repo you'd never fall back to CloningFetcher because you'd presumably have all deps matching some matcher.

  3. 
      
  1. 
      
    1. Benjy,

      I don't know much about go so I cannot really give many useful comments, but I have read through your review and I think it makes sense to me.

      I wonder if you are going to export your rb description to some documents (pants website or README maybe?). The description is really clear and for a newbie in go plugin like me, this is exactly what I am expecting to read before I dive into the code.

      One more thing,
      I mentioned to John before that when using requests.session on Mac to connect to some go package website like (bazil.org/fuse), I got errors like "sslv3 alert handshake failure (_ssl.c:590)", which seems to be related to openssl shipped with Mac. I am not sure if it is reproducible on other people's Mac environment. If so maybe we want to mark this somewhere.

      Thank you.

    2. I will update the README, which also gets generated to the Pants docsite, in a separate change.

  2. why not use self._meta_tag_reader?

    1. Good catch. That was a bug.

  3. 
      
  1. 
      
  2. It would be nice if fetch failures were more comprehensible. Right now it will just say "failed fetching <URL>". It would be preferable to include enough context to say , e.g., what library it's fetching. WDYT?

    1. That information isn't available here, but there's a neat trick for catching an exception, updating its message, and reraising it with the original traceback, so I did that here. All its adds is the import_path, but I suppose that's something.

  3. 
      
  1. Ship It!
  2. 
      
  1. Catching up on reviews after some time away.  This is my last catch-up to do, but its meaty.  I'm going to push this to end of day, but a quick skim looks great to me!
  2. 
      
  1. LGTM - setting this up as an async shipit since all LGTM on the production path save for the `None` issue and stray file noted.
  2. Seems like sources=globs('*.py', exclude=['go_distribution.py']), is slightly more sane until these 2 can be collapsed into 1 target.

    1. Actually, I wanted to collapse these in this change, as I'd like to reduce the proliferation of microtargets in pantsbuild, but I didn't in case there was some special reason I wasn't aware of for them to be separate. Since it sounds like there isn't, I may as well just do this now.

  3. Its titchy, but my preference here would be to add a classmethod to act as a factory for a new FetchError given an existing FetchError and a prefix.

    1. I'll have to cheerfully disagree - FetchError.add_message_prefix(ex, prefix) seems strictly clunkier than ex.add_message_prefix(prefix), yet is ~identical to it under the hood (the latter is syntactic sugar for the former, in that if the former is a regular method than you can actually call it that way).

  4. Callers aren't expecting or handling None and the docs are silent. It seems like this should raise or else callers should be updated and either way docs should set the contract.

  5. This one is worth a linked issue going in I think.
    1. Done: https://github.com/pantsbuild/pants/issues/3502

  6. The https and then only fall back to http (possibly also only if flagged), as go does, would be good here.  I'd be happy with a TODO + linked issue.
    1. Done: https://github.com/pantsbuild/pants/issues/3503. IIRC this was how it was before, so I don't want to actually change it in this commit, which is hefty enough already.

  7. It would be great not to lose the meta-tag scraping tests even if it caused slightly more gymnastics in GoImportMetaTagReader. Just testing _find_meta_tag directly would be fine with me to limit gymnastics to ~0.

    1. Ooops! I was intending to copy those into a separate test for GoImportMetaTagReader exactly as you describe, but must have gotten confused. Done.

      Also added some test cases, because I don't think test_find_meta_tag_multiline was actually doing what it claimed.

  8. github.com/golang/protobuf/proto/BUILD (Diff revision 2)
     
     

    Revert - this looks like a buggy commit path... hopefully a test or buildgen is not creating this!

    1. I don't think I created it, at least not deliberately. I assumed it was buildgen. Or it might have happened while I was manually testing. Hmm... I'll delete and see if it pops up again.

  9. src/python/pants/scm/git.py (Diff revision 2)
     
     

    The binary arg would make sense if this method had a test that used it like detect_worktree above.

    1. I had it in there for uniformity, there's not really anything to test here... Think I should kill it? Or write a contrived test?

    2. Kill it.  Unused is always a burden.
    3. On further inspection, I'd prefer to leave it, as it's passed on as an arg to the class constructor, for reasons that are not (at least not ostensibly) to do with testing. So it makes sense that if the ctor takes that arg, this factory-ish method should too.

  10. 
      
  1. 
      
  2. dedent this block by 2
  3. 
      
Review request changed

Status: Closed (submitted)

Change Summary:

aa9b358e09f1f0157260777c29b3598e0cd9bb70

  1. Submitted! Thanks for the reviews.

  2. 
      
Loading...