Allow looking up buildcache hosts in pants

Review Request #2815 — Created Sept. 14, 2015 and submitted

2171, 2219
areitz, ity, kwlzn, nhoward_tw, stuhood, zundel

Today buildcache urls are statically configured in pants. In Twitter case we
use a centralized load balancer, that becomes a bottleneck with the recent
monobuild launch. Deploying a local proxy is a possibility but not quite ready
yet because we have heterogeneous deployment environments besides aurora,
jenkins for example.

This CL adds an option --resolver to CacheSetup, which point to an endpoint
that supplies buildcache urls. The CL provides a RESTful implementation of the
resolver. --resolver is optional when it is not supplied, today's behavior
remains unchanged.

  • 0
  • 0
  • 5
  • 0
  • 5
Description From Last Updated
  1. One thing I like to add is to be smart with what to send to pinger, instead of pinging 100 we could just do 2. See P2C in

    1. the alternative is to have the internal endpoint returns less urls, that seems ok too

  2. src/python/pants/cache/ (Diff revision 1)



    to make it slightly more intuitive?

  3. src/python/pants/cache/ (Diff revision 1)

    afaict, you shouldn't actually need the bool()s here (which make this read a little funny).

    these options will either be an empty/non-empty list or a string/None - both of which provide implicit bools when examined - and the boolean expression they're examined in will return a proper bool type.

    return and (self._options.read_from or self._options.resolve_read_from)
    1. looks i can simplify but bool is still needed (write = True, write_to = None, resolve_write_to_url = 'some string') will return a string

      >>> True and (None or 'some string') 
      'some string'
      >>> True and bool(None or 'some string') 
  4. src/python/pants/cache/ (Diff revision 1)

    is there any benefit to merging these two things vs making them mutually exclusive?

    I can't see a good reason to have both a static cache URI + resolved URIs - seems like you'd either want one or the other.

    1. We need both, otherwise you can't have a local cache and a remote cache, unless you want the remote cache to specify where the local cache is.

      The other issue I see with this is that when a list or tuple is passed to _do_create_artifact_cache, it has to have either 1 or 2 elements or its behavior is undefined. Undefined here means that it doesn't create a cache and returns None.

    2. good call! current multiple remote addresses use '|' as separator

      Changed to:
      - join resolved addresses also with '|' so it's compatible with addresses loaded from config
      - fail if len(spec) > 2 with TooManyCacheSpecError, so if we set remote address in config as well providing the resolve url will fail

      (this will change if to address stu's comment - to reuse read-from)

  5. src/python/pants/cache/ (Diff revision 1)

    maybe_list() will always return a list. also, since you only ever pass self._options.read_from here that will always be a list (even if empty) and not None (so the 'or []' doesnt seem necessary).

    merged_uris = maybe_list(uris)
    1. i see, make sense, read_from is list_option, simplified. thanks

  6. src/python/pants/cache/ (Diff revision 1)

    this will silently swallow non-2xx codes - is that intentional?

    additionally, the except here is too broad - other unrelated errors in this block could get masked.

    consider resp.raise_for_status() + excepting on requests.exceptions.HTTPError, then wrapping that in an easily catchable Resolver.ResolutionError.

    you'd also want to catch ValueError around the resp.json() to handle invalid JSON.

    1. It might also make sense to ensure that the json has the right format here. You could use assert_list. Otherwise, it may blow up at some later point when the json's contents are used. It could also be coerced into list from an iterable type resulting in unexpected behavior.

    2. correct me if i am wrong, my understanding in case of buildcache failures (transient or not) pants shouldn't fail, its performance may degrade, not desirable but better than fail pants entirely, and this is the behavior today.

      I added the assert_list check and some comments to clarify

  7. tests/python/pants_test/cache/ (Diff revision 1)


    you probably also want spec_set=True.

    1. typo fixed, added spec_set=True, seems like good to have "This is useful if you want to ensure your code only sets valid attributes too"

  2. src/python/pants/cache/ (Diff revision 1)

    IMO, this is too tightly coupled to a particular service discovery strategy. As sirois said, an mdns resolver would a reasonable choice as well, and a second resolver implementation should require new flags to use.

    Instead, I'd recommend something like --resolver with choices ['none', 'pinger', 'rest-hostlist'], where 'none' disables pinging, and 'rest-hostlist' uses the hostnames from the the --read-to and --write-from as its inputs.

    1. should not* require

    2. this makes a lot of sense, will do.

  1. I'll wait for the next diff before doing another pass.

  2. src/python/pants/cache/ (Diff revision 2)

    For us it might be a better integration to have some kind of script you could invoke that returned a json buffer instead of calling REST directly (our service discovery has a REST API, but probabaly won't return the same format as this one expects)

    1. (oops didn't mean to publish to pants-reviewers yet, was going to continue iterate within twitter first)

      Anyway, Eric, what's your rest interface like? since ours is still WIP, maybe we can use one format

    2. So, we have a REST interface for some features and we have another interface that uses a .proto over RPC. And all of that is bundled up with special sauce inside of a Java library. I'm not sure which one we would prefer to use which is why I suggested a script.

    3. Got some feedback from the rest of our team. They supposed that we would probably prefer some kind of proxy arrangement over coupling with service discovery mechanism for this application. We have some proxies that work like services, and in other cases we deploy local proxies on every machine.

  1. Overall this looks very flexible now!

  2. src/python/pants/cache/ (Diff revision 3)

    Usually we add pydoc here instead of just pass

  3. src/python/pants/cache/ (Diff revision 3)

    also add that this key is expected to contain a list of... (hostnames? urls?)

    1. set of urls (URL is the format used by remote artifact cache)

      "- a URL of a RESTful cache root."

      improved comments

  4. src/python/pants/cache/ (Diff revision 3)

    use super() here instead of this style of invoking super methods.

  5. src/python/pants/cache/ (Diff revision 3)

    Thinking ahead to when you are adding tests for these two raise cases, you might want to consider declaring different exception types. Usually I make each point in the code a different exception type to make testing simpler. e.g.

    with assertRaises(ResponseParserParseError):
    with assertRaises(ReponseParserUnsupportedFormat):
    1. yea, except i realize ReponseParserUnsupportedFormat is probably not needed since it's guarded by option parser.

      added a general ValueError more like an assert

  6. src/python/pants/cache/ (Diff revision 3)

    add pydoc

  7. src/python/pants/cache/ (Diff revision 3)

    add pydoc

    1. will docstring inherit automatically happen?

  1. My only blocking issue is adding something like a "NoopResolver" class.

  2. src/python/pants/cache/ (Diff revision 4)

    pluralize to TooManyCacheSpecsError

  3. src/python/pants/cache/ (Diff revision 4)

    "can also contain a URL"

    "the artifact cache URLs"

    Would it be clearer to just say that this list is the input to the resolver, and that the "none" resolver uses the list as is?

    1. done, yea, they are the implementation details

  4. src/python/pants/cache/ (Diff revision 4)

    It's unclear what is happening here. Ideally it should be illegal to construct a CacheFactory for which the resolver parameter will be ignored. Or alternatively, the resolver setting should be passed in rather than the resolver itself. Ditto pinger?

    1. Er, ignore this comment and see the next one.

  5. src/python/pants/cache/ (Diff revision 4)

    Please use the Null object pattern instead, by adding something like a "NoopResolver" that implements resolution as a noop.

    Basically, the self._resolver field should never be None.

    1. to do this

      • maybe_resolve is no longer maybe, will always run resolver
      • since it's possible for REST resolver to return empty, to distinguish we can validate it's always returns non-empty and raise exceptions otherwise.


  6. src/python/pants/cache/ (Diff revision 4)

    IMO, making response-parsing pluggable is pre-mature.

    It's easy enough to add/use a different resolver instead.

    1. Sure, none of these options are really needed.

      Will just use it as a utility class

  1. Thanks for all the great tests.

  2. src/python/pants/cache/ (Diff revision 5)


    1. this is to match the option 'none' vs 'rest', implement side, when resolver is 'none' it is None, now we switched to noop resolver.

      enclosed with quotes so it's clear

  3. src/python/pants/cache/ (Diff revision 5)


  4. src/python/pants/cache/ (Diff revision 5)

    I know it wasn't here before, but you added a new parameter and it would be nice to have pydoc to show the types. It isn't just for documentation, IntelliJ and other IDEs can pick up on it and give you nice features like completion. Here's a good example:

  5. src/python/pants/cache/ (Diff revision 5)

    Even in this case I would prefer a specific Exception subclass over a general ValueError.

    Somewhat related, one of my favorite war stories is from an manager of mine who was programming a PDP11 back in the good old days of computing. He found such an error case and put in a log message, "This can never happen." A year later got a support call from the customer. The disk was full! The reason was there was a log file filled with:

    This can never happen
    This can never happen
    This can never happen
    1. This error is not end user should care about, if happens is a pants internal bug.
      because the earlier code should have done validations, would assert work?

      assert False, 'Unknown content format: {0}'.format(self.format)
    1. One more thing: please expand the review description and summary to completely describe the feature and how to use it.

  2. src/python/pants/cache/ (Diff revision 6)

    I don't uunderstand the need for this requirement. If we're able to differentiate between local and remote specs, why restrict what order they arrive in?

    1. hmm, i can't think of a reason to have remote first and local second.

    2. I see what you mean, you mean just for the pants.ini, order doesn't matter since we always do local first then remote anyway. A better representation than list maybe to have 'local' and 'remote' as separate fields, but that's an unnecessary breaking change.

      Removing this constraint

  3. src/python/pants/cache/ (Diff revision 6)

    IMO, just make these inverses of one another.

    def is_local(string_spec):
      return not is_remote(string_spec)
    1. is_local != !remote

      ftp:// for example will fail current check

Review request changed

Status: Closed (submitted)

Change Summary:

Merged as d78d1e74f63c1b0d6ba77de4e31972213aa1008a