Customize tarfile module next() method

Review Request #4123 - Created July 25, 2016 and submitted

Information
Yujie Chen
pants
3730
Reviewers
pants-reviews
benjyw, jsirois, kwlzn, mateor, nhoward_tw, patricklaw, peiyu, stuhood

A tarfile.TarFile object is iterable and has a next() method. next() parses the next member's header and saves the parsed information. During parsing, many checks are done to make sure the header is valid, and if something is wrong with the header, an exception is raised. next() catches many of these exceptions but does not always re-raise what it catches.
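A minimal sketch of the iteration protocol described above, assuming a Python 3 interpreter and an uncompressed in-memory archive (`build_archive` is a hypothetical helper, not part of Pants or tarfile). Each next() call parses and validates one header and returns a TarInfo, and None marks the end of the archive; iterating with `for member in tf` drives the same method.

```python
import io
import tarfile


def build_archive(files):
    """Hypothetical helper: build an uncompressed ustar archive from {name: bytes}."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tf:
        for name, payload in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tf.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


data = build_archive({"a.txt": b"hello", "b.txt": b"world"})
with tarfile.open(fileobj=io.BytesIO(data), mode="r:") as tf:
    names = []
    member = tf.next()          # parses one header per call
    while member is not None:   # None means "no more members"
        names.append(member.name)
        member = tf.next()

print(names)  # ['a.txt', 'b.txt']
```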

We have a corrupted tgz file inside Twitter. During parsing, an error in one of the headers is caught, but next() hides it silently, so the cache read succeeds with no error. When Pants later uses this corrupted tgz for analysis, it throws an exception because it can't find a file that is supposed to be in the tarball.

From the source code (https://hg.python.org/cpython/file/2.7/Lib/tarfile.py#l2335), we can see that an InvalidHeaderError is ONLY surfaced (as a ReadError) if it occurs at the beginning of the tar file. In fact, many exceptions are hidden by the tarfile module, which simply treats them as marking the end of the tarball. This is probably not a violation of the tar standard, but it is not acceptable for Pants. Pants relies on the integrity of the tarball: any defect in it should cause the cache read to fail, because a partially extracted tarball is of no use to Pants.
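The sketch below reproduces this failure mode on a Python 3 interpreter, whose stock next() applies the same "only at offset 0" rule. It builds a two-member archive in memory and flips one byte of a header so its checksum no longer matches. The helpers `build_archive` and `corrupt` are hypothetical, and the byte flip is just one convenient way to produce an invalid header; it relies on `TarInfo.offset`, the position where a member's header starts.

```python
import io
import tarfile


def build_archive(files):
    """Hypothetical helper: build an uncompressed ustar archive from {name: bytes}."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tf:
        for name, payload in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tf.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def corrupt(buf, header_offset):
    """Hypothetical helper: flip the first name byte so the header checksum fails."""
    bad = bytearray(buf)
    bad[header_offset] ^= 0xFF
    return bytes(bad)


data = build_archive({"a.txt": b"hello", "b.txt": b"world"})

# Find where the second member's header starts.
with tarfile.open(fileobj=io.BytesIO(data), mode="r:") as tf:
    second = tf.getmembers()[1].offset

# Bad second header: no exception, the member silently disappears.
with tarfile.open(fileobj=io.BytesIO(corrupt(data, second)), mode="r:") as tf:
    silent_names = tf.getnames()
print(silent_names)  # ['a.txt']

# Bad first header (offset 0): the error is surfaced as ReadError.
try:
    tarfile.open(fileobj=io.BytesIO(corrupt(data, 0)), mode="r:")
    first_header_error = None
except tarfile.ReadError as exc:
    first_header_error = exc
print(repr(first_header_error))
```

This is exactly the situation described for the corrupted tgz: the read "succeeds", and the missing member is only noticed later by whatever expected to find it.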

This changeset does the following:
1. Customize the next() method in tarfile.TarFile for Python 2.x. It now raises an exception loudly whenever an invalid header is seen, if errorlevel is greater than 0. On a Python 3.x interpreter, the original tarfile.TarFile is used.
2. Add test cases.
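The changeset itself overrides next() for Python 2.x; the sketch below is NOT that code. It shows the same policy (fail loudly on an invalid header when errorlevel > 0) through a different hook, assuming Python 3: a TarInfo subclass passed via TarFile's documented `tarinfo` argument, whose `fromtarfile()` turns InvalidHeaderError into ReadError before next() can swallow it. `StrictTarInfo` and `open_strict` are hypothetical names, and `tarfile.InvalidHeaderError` is an undocumented module internal, so treat this as illustrative only.

```python
import io
import tarfile


class StrictTarInfo(tarfile.TarInfo):
    """Hypothetical TarInfo subclass: a bad header is an error, not end-of-archive."""

    @classmethod
    def fromtarfile(cls, tarfile_obj):
        try:
            return super().fromtarfile(tarfile_obj)
        except tarfile.InvalidHeaderError as exc:  # undocumented internal class
            # Be strict only when errorlevel > 0, and leave ignore_zeros
            # mode (which deliberately skips bad blocks) unchanged.
            if tarfile_obj.errorlevel > 0 and not tarfile_obj.ignore_zeros:
                raise tarfile.ReadError(f"invalid tar header: {exc}") from exc
            raise  # let the stock next() handle it as before


def open_strict(fileobj, errorlevel=1):
    """Hypothetical helper: open an uncompressed tar stream with StrictTarInfo."""
    return tarfile.open(fileobj=fileobj, mode="r:", tarinfo=StrictTarInfo,
                        errorlevel=errorlevel)


# Two 5-byte members: each takes a 512-byte header plus 512 bytes of padded data,
# so the second header starts at offset 1024.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tf:
    for name, payload in (("a.txt", b"hello"), ("b.txt", b"world")):
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))
data = bytearray(buf.getvalue())
data[1024] ^= 0xFF  # break the second header's checksum

try:
    with open_strict(io.BytesIO(bytes(data))) as tf:
        tf.getnames()
    strict_error = None
except tarfile.ReadError as exc:
    strict_error = exc
print(strict_error)  # invalid tar header: bad checksum

# errorlevel=0 keeps the stock, lenient behaviour.
with open_strict(io.BytesIO(bytes(data)), errorlevel=0) as tf:
    lenient_names = tf.getnames()
print(lenient_names)  # ['a.txt']
```

A test in this style (corrupt a non-first header, assert the read fails) is the kind of case the changeset's tests need to cover, since a corrupted first header already fails with the stock module.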

ci green:
https://travis-ci.org/pantsbuild/pants/builds/155570257

Review request changed

Status: Closed (submitted)

Change Summary:

a9957e91d8f262c8d1384c380cf8fb77ec075eb1. Thanks, Kris, Nick, Mateo, Stu and Benjy!
