Change ZincAnalysis{Parser} to be a thin wrapper around external zincutils.

Review Request #2125 — Created April 24, 2015 and submitted

davidt, patricklaw, stuhood, zundel

I split the zinc parse/split/merge logic out into pantsbuild/zincutils so we
can easily iterate on optimizing it. In particular, if we want to replace part/all
of it with native code, that will be harder to do in pantsbuild/pants, as pants
doesn't yet support building and packaging native code.

This commit turns ZincAnalysis and ZincAnalysisParser into adapters that wrap
that standalone code and adapt it to pants's Analysis/AnalysisParser class hierarchy.

In the future we can consider moving the zinc split/merge code back into the mainline
pantsbuild/pants repo, once we're happy with performance and have a decent way to
build/deploy it.

CI passes:

  • 0
  • 0
  • 1
  • 0
  • 1
Description From Last Updated
  2. Can we add a def sources that proxies to underlying, instead of exposing the underlying impl here?

  1. Please add docs on how to iterate on consuming the other package as source. I'm very worried that this is going to make it difficult to iterate on the changes we'd like to make to sbt.

    Re: the comment that it is easier to use native code in another repo: why is that? Having cmake/scons/whatever build files in the repo (even if you needed to use a hacky pants task to use them,) seems like it would definitely be preferable to a publishing step to iterate. And it sets us on a path to cleaning the hacky support up into a nice native integration between the c++ backend and the python backend.

    1. But actually, I think the changes I needed to make to analysis parsing were all within the content of the products and relations maps, so maybe that's parsing that could happen after the native code has returned merged contents? The gist of the change is that rather than pointing to files, the analysis entries for classfiles now all use URLs to account for the fact that they might live in jars:

    2. The goal of this change is that it would actually make it easier to support any future changes in zinc/sbt's output because almost any iteration on the specifics/nitty-gritty of reading and writing of the zinc analysis format can happen in isolation, against tests that ensure correctness, without core pants having to be concerned.
      Zincutils exposes a couple public methods to pants and they have test coverage, so as long as those tests pass, the interface contract with pants is upheld. It would be rare that you would need to publish zincutils artifacts and consume them in pants during iteration, and would likely indicate a failure of test coverage and/or the interfaces we'd picked.
      That said, I agree that we should doc how to run pants against a local, non-published version of an upstream dependency (eg using find-links or whatever) if we haven't already for those cases.
      Finally: We're still not sure what (if any) native code tools will be beneficial, but I'm wary of introducing whatever complexity or brittleness that might bring to core pants... particularly complicated bootstrapping between backends. Decoupling it, and hiding it in an upstream and opaque dependency -- provided this is possible without onerous publishing iteration -- seems like a win.
      Benjy is working on profiling and optimizing zincutils right now, so he'll be piloting how this separate impacts iteration and we'll find out very soon how works out.
    3. Re building native code in pantsbuild/pants: If it turns out that we do want to re-implement some of this in native code then we can figure out how to provide support for that in pants, and then move this code back into the main repo. But that's not a problem I want to chew on yet. The performance of split/merge is a real bottleneck in our builds now, and I want to fix that first and figure out the niceties later...

      That said, from my benchmarking and perf improvements so far, I'm less sure we'll even need native code. It may simply be a matter of optimizing the python code, in which case moving it back into the main repo will be easy.

      Re iterating: That's actually why we did this. We don't need pants in order to iterate on performance (or even correctness): what this code does is entirely defined by the zinc analysis format, no pants required. The unit tests and benchmarks are all we need. So it's really quick and easy to benchmark, profile and rewrite in native code, using standard python tools that aren't integrated into Pants (yet).

      I think the changes you're talking about are more-or-less orthogonal to this code? Or do you need to add methods on the underlying analysis object?

    4. I think the changes you're talking about are more-or-less orthogonal to this code? Or do you need to add methods on the underlying analysis object?

      Yea, it looks like they might be, depending on where exactly you're optimizing. Wanting to push the url parsing down into the native code is what might make this slightly more difficult.

  1. I've got no reason to use zinc so I don't feel I'm in a position to comment on this change.

    By native code, do you mean code compiled in 'C'?

    1. Yeah, we've been evaluating moving some of the hottest parts of split-merge code into a compiled c extension or cython (with fallback to current pure-python if speedup module can't be imported or something), as it can actually now be one of the most time consuming elements of a build when everything else gets remote cache hits.

  1. Please add support to the pants build for using zincutils from source (and a few lines on how to use it), and then I'll shipit.

    1. Oh, also: this might be a good opportunity to rename the module to sbtutils... it's actually sbt that writes analysis, rather than zinc.

    2. True, but it's probably better to change the terminology in pants first, as it uses 'zinc' everywhere.

  1. Ship It!
Review request changed

Status: Closed (submitted)

Change Summary:

Submitted as 8c4f6bc48da780fe9be5748cccc4c10e39562dce.