Changes to zinc analysis split/merge test data generation:
Review Request #2095 - Created April 17, 2015 and submitted
|davidt, patricklaw, stuhood|
- Fix various issues with the anonymizer.
- Add a mode to the test that causes it to generate canonical test data.
- Add copious comments on how to use all of the above.
- Replace the old test data tarball with loose test files that have non-ASCII content. Python tarfile doesn't play nicely with non-ASCII names (even with the pax format), so loose test files are easier. Git should compress them fine anyway.
- Made the test a little more comprehensive - it now verifies the merged analysis against an expected file.
- Made sure that split and merge don't leave dicts with empty values, as those create spurious diffs when comparing analyses that are semantically identical.

This change establishes that our split/merge logic is sound in the presence of non-ascii file/class names in zinc analysis files.

TODO: Test that rebasing is similarly robust to non-ascii content.
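To illustrate the point about empty dict values, here is a minimal sketch of why they cause spurious diffs and how pruning them avoids that. The helper name drop_empty_values and the sample file names are hypothetical and are not taken from the pants code; the actual change may handle this differently.

```python
# -*- coding: utf-8 -*-
# Hypothetical sketch: not the actual pants split/merge implementation.

def drop_empty_values(mapping):
    """Return a copy of mapping without keys whose values are empty collections."""
    return {key: value for key, value in mapping.items() if len(value) > 0}


# Two analyses that mean the same thing: the second source produces no classes.
with_empty = {'Foo.scala': ['Foo.class'], 'Zoë.scala': []}
without_empty = {'Foo.scala': ['Foo.class']}

# A naive comparison reports a difference that carries no information.
naive_equal = with_empty == without_empty

# After pruning, the two compare equal.
pruned_equal = drop_empty_values(with_empty) == drop_empty_values(without_empty)
```

Pruning when the dicts are produced (rather than at comparison time) means the files written to disk also match, which is what lets a test diff the merged analysis against an expected file.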
CI passes: https://travis-ci.org/pantsbuild/pants/builds/58981541
Is there a strong motivation to turn these into loose files? On the one hand, I dislike checking in binaries, but on the other hand this is generating a lot of random words that will pollute git grep results. Could this alternatively just be a binary that lives in bintray rather than checked in at all?
It's not clear to me that this is the case in all terminals, since some terminals use different encodings. Might be worth including a little Python script to do the same thing very precisely.
This should use
u'...' is unnecessary because we import __future__ ... unicode_literals.