Optimize goal changed

Review Request #1470 — Created Dec. 11, 2014 and submitted

dturner-tw
pants
036db86...
pants-reviews
ity, jinfeng, jsirois, peiyu, zundel

Make goal changed approximately O(paths + targets), instead of something much worse.

Correctness: https://travis-ci.org/pantsbuild/pants/builds/43764453

Performance: tested on a megachange branch on Twitter's monorepo; previous time was on the order of hours; time with patch is 3 minutes.

  • 0
  • 0
  • 1
  • 0
  • 1
Description From Last Updated
DT
DT
JS
  1. 
      
  2. kill this redundant line

  3. 
      
DT
JI
  1. no test changes/addition in test_what_changed.py?

    1. Nope, this is a pure speed optimization, so no need to change the tests. I guess we could add a benchmark, but I think that would be bad (to get enough changes for this to matter would require a fairly large benchmark).

  2. 
      
ST
  1. Ship It!

  2. 
      
IT
  1. Ship It!

  2. 
      
DT
Review request changed

Status: Closed (submitted)

PA
  1. 
      
  2. The AddressMapper has all of this caching built in. You should probably discard this local optimization and just use the BuildFileAddressMapper (which is also on context and already bound to the BuildGraph on context) to do this candidate mapping.

    1. I'll submit another RB for that, since I just merged this one.

    2. Thanks!

    3. I'm a bit confused, actually. I'm trying to find (and inject) the set of addresses for each build file, parsing each build file at most once. How does anything in BuildFileAddressMapper help here?

    4. _addresses_in_spec_path is precisely what you want, with caching baked in. The only difference is that it works on logical BUILD files (one per directory) rather than individual sibling BUILD files in a directory. But that is semantically exactly what you want here, since searching for candidate targets is an operation on logical BUILD files and not individual ones.

      For reference, here is our internal implementation of this. We're in the process of open sourcing a lot of these internal utilities we've written, but hadn't gotten to this one yet: https://gist.github.com/anonymous/2d3fb59c1d548ce7e7c5. Feel free to use this, particularly _compute_source_addresses.

    5. https://rbcommons.com/s/twitter/r/1488/

  3. This is a massive optimization, I'm surprised this wasn't caught when I added from_cache.

  4. 
      
TE
  1. Ship It!

  2. 
      
Loading...