[Linter Performance] Modify Thrift Linter to run on modified thrift files only by default

Review Request #2080 — Created April 15, 2015 and discarded

ity
pants
1427
pants-reviews
benjyw, jsirois, stuhood, zundel

Currently, thrift-linter runs on all thrift files within a specified target, every time its run. Modifying it to only run on thrift files that are modified for better performance.
Added a new option to be able to revert back to old behavior, such as to check the sanity of thrift files within a repo i.e. be able to run on all thrift files in the target specified.

For a single thrift file change locally (~50 thrift files in $target), this has brought down runtimes from an average of ~18s to ~9s:

$ time PANTS_DEV=1 ./pants thrift-linter --lint-all-thrift $target::
real 0m18.119s
user 0m7.267s
sys 0m1.616s

to:

$ time PANTS_DEV=1 ./pants thrift-linter $target::
real 0m9.028s
user 0m8.080s
sys 0m2.196s

TravisCI baking:
https://travis-ci.org/pantsbuild/pants/builds/58668622

  • 4
  • 0
  • 0
  • 0
  • 4
Description From Last Updated
So, to be honest, I think that changed-file detection might be overkill. Target invalidation should mean that things aren't being ... ST stuhood
Seems like an unnecessarily clunky option name? Since this is already in the 'thrift-linter' scope, just --all (or as stu ... BE benjyw
This causes the thrift in this directory to be double-owned. ST stuhood
Why this import? It doesn't appear to be used. BE benjyw
IT
ST
  1. 
      
  2. So, to be honest, I think that changed-file detection might be overkill.

    Target invalidation should mean that things aren't being linted repeatedly, so the only time this would kick in would be for cold builds in a branch. And if you've been developing in the branch, it's unlikely to be cold.

  3. If we leave this in, I'd invert it to: "--changed-only=True"... as is, it's not clear what the effect of turning off the flag would be.

  4. This causes the thrift in this directory to be double-owned.

  5. 
      
ZU
  1. 
      
  2. This is probably a question for Patrick.

    It seems like in this case, you've manually computed the changed target list. Since the logic is different, I'm wondering if when you run them through the invalidation check that some may not be considered invalid and you might inadvertently not build them?

    In this case, I think the only thing the validation check is doing for you is potentially filtering out targets (that you may have already computed to be invalid) and updating the cache and marking the targets as valid at the end of the context manager. Maybe you'd be better off doing all of it yourself since the logic here is essentially overriding the fingerprint strategy used for invalidation anyway?

  3. 
      
BE
  1. 
      
  2. Seems like an unnecessarily clunky option name? Since this is already in the 'thrift-linter' scope, just --all (or as stu suggests, --changed-only) would be fine.

    Also, do we want this to be an advanced option? Do you not intend people to set it via cmd-line flag?

  3. 
      
BE
  1. 
      
  2. Why this import? It doesn't appear to be used.

  3. 
      
JS
  1. I may be missing something, but based on the description:

    Currently, thrift-linter runs on all thrift files within a specified target, every time its run. Modifying it to only run on thrift files that are modified for better performance.

    This change offers no benefit. The linter still runs against all the files in a changed target (self._lint(vt.target)). The only benefit it offers fwict is that - for a pants workspace with no prior lint runs, (ie all thrift targets invalid), lints will still be skipped since all those invalid targets will be potentially skipped due to scm changed file detection. It seems to me what you want is to just use the cache instead (currently unused). This has the added benefit of of skipping the linting globally across users clones. Its true this would be an awkward cache result - basically just a bit to mark the target as valid - but that seems like a more beneficial hack than this local-to-the-clone optimization.

    1. We want a similar thing for the checkstyle/scalastyle tasks. Boolean result: "yep, looks good."

      Seems like a lot to 'shell-out' for though, which reminds me: it would be SUPER COOL to be able to speculate in the cache. Fire the cache request and start doing the work in parallel; then see which comes back first.

  2. 
      
ZU
  1. Can this review be discarded?

  2. 
      
IT
Review request changed

Status: Discarded

Loading...