Perf improvement: rebase analyis file once instead of multiple times

Review Request #4352 — Created Nov. 3, 2016 and submitted

peiyu
pants
4026
pants-reviews
benjyw, ity, mateor, nhoward_tw, stuhood, wisechengyi, zundel

We have this outstanding issue https://github.com/pantsbuild/zincutils/issues/8 to scan analysis file once, currently it does multiple scans, one per replaced pairs.

In #3962 new zinc analysis files are bigger, almost doubled in time in local testing. Therefore optimization becomes necessary.

From profiling, each of the following accounts for about 1/3 of the rebase total time:

  • file write
  • read() to iterate through lines
  • everything else (string replacement)

This PR cuts the time from the first two items by a factor of two since there are two scans. That nets 1/3 saving

https://travis-ci.org/peiyuwang/pants/builds/173091425
https://travis-ci.org/peiyuwang/pants/builds/174378023

  • 0
  • 0
  • 1
  • 0
  • 1
Description From Last Updated
  1. I don't understand your sentence in the description about how this nets a 1/3 savings. What does "the former" refer to there?

    1. sorry.

      first two items are from file read/write, so proportional to the file size, which is 2/3 of the time, cut them in half results 1/3 saving.

    2. Cool, can you update the description to clarify? Thanks.

    3. yup, updated. Thanks!

  2. 
      
  1. 
      
  2. This will sort the "tuple"s/entries of the dict by len... I think you probably want to sort the keys of the dict by len.

    And if this matters, should probably cover it with a test.

    1. Great catch! fixed && added a test of two keys

  3. 
      
  1. Ship It!
  2. 
      
Review request changed

Status: Closed (submitted)

Change Summary:

Commited as 04afcc8476afd5bbff6326485bfb1f66b39b3d72

Loading...