Add new function open_zip64 which defaults allowZip64=True for Zip files.

Review Request #1708 — Created Feb. 4, 2015 and submitted

areitz, dturner-tw, jsirois, patricklaw, zundel

Twitter Production machines python was upgraded to 2.7.9.
After that we started seeing LargeZip Files exceptions.

After debugging, we saw it was due to this bug fixed in python.
In python 2.7.9 following bug was fixed
ZipFile correctly throws an exception if allowZip64 is False and the file count exceeds 65535.

Ran bundle goal locally.

  • 0
  • 0
  • 2
  • 1
  • 3
Description From Last Updated
  1. Ship It!
    1. discard this shipit. :) Pressed the wrong tab.

  2. I'd be happy to see the new open_zip64 API used here too.

  3. src/python/pants/util/ (Diff revision 1)

    2 blank lines between top-level functions

  1. The import changes probably mean there are corresponding BUILD dep changes needed.

  2. Same comment here as above, does it make sense to get rid of 'open_jar'?

  3. Do we really want to change the name of the symbol to open_jar still, or leave it as open_zip64? Personally, I find this to be a bit confusing.

    1. Will get rid of open_jar

  2. src/python/pants/util/ (Diff revision 2)

    why not just add a new keyword arg, allowZip64=True? Then you could eliminate these three lines.

    1. Did run a lot of experiments to add the keyword arg with default.
      However, The following is an syntax error.

      def open_zip64(path_or_file, args, allowZip64=True, *kwargs)

    2. def open_zip64(path_or_file, allowZip64=True, *args, **kwargs)
    3. I also belived the above should work smartly and pass all the positional args to "*args" and remaining keyword args should be in "**kwargs"
      However, This does not work the way i thought it would. I had tried it much before. 
      The travis CI failed
                           E   	  File "/home/travis/build/pantsbuild/pants/pants.pex/pants/backend/jvm/tasks/", line 63, in execute
                           E   	  File "/home/travis/build/pantsbuild/pants/pants.pex/pants/fs/", line 89, in create
                           E   	  File "/opt/python/2.7.9/lib/python2.7/", line 17, in __enter__
                           E   	    return
                           E   	  File "/home/travis/build/pantsbuild/pants/pants.pex/pants/util/", line 160, in open_zip64
                           E   	  File "/opt/python/2.7.9/lib/python2.7/", line 17, in __enter__
                           E   	    return
                           E   	  File "/home/travis/build/pantsbuild/pants/pants.pex/pants/util/", line 145, in open_zip
                           E   	  File "/opt/python/2.7.9/lib/python2.7/", line 756, in __init__
                           E   	    self.fp = open(file, modeDict[mode])
                           E   	IOError: [Errno 2] No such file or directory: u'/home/travis/build/pantsbuild/pants/dist/'
      If you dig more, should write to dist/ However it throws "No such file or directory" error.
      If you see the defaults for ZipFile is read mode.
      After printing a bunch of debug statments, the "w" value is assigned to allowZip64 and hence mode takes the default. 
      I could either change all the calls to open_zip64 to specify mode="w" or do what i have done. 
      I chose the later.
      My argument to choose the later was 
      1. Minimal changes.
      2. Same signature for open_zip and open_zip64.
    4. Ah right, I forgot about the positional args. In that case (if you don't just pass True, which I think is better, but an orthogonal discussion), you can simplify this to:

      allowZip64 = kwargs.pop('allowZip64', True)

      since dict.pop() takes a default as a second arg. You can even inline the above instead of assigning it to a variable, though that's probably too clever to be readable.

    5. kwargs.pop with default sounds good to me.

  2. src/python/pants/util/ (Diff revision 2)

    Just take this as a kwarg and default it to True to simplify this code.

    But I'm actually kind of confused here: shouldn't this be True no matter what? It seems nonsensical to call open_zip64 with the option of not allowing zip64.

    Finally, since all this does is wrap another context manager, it doesn't need to be a context manager itself. It can just return the result of calling the context manager it wraps.

    1. The reason why did not want to change the api for open_zip with default allowZip^4=True is beacuase 64-bit ZIPs are known not to work with the JAR files when run on Java earlier than 1.7. We are on Java 7, however wanted your comments on if SQ or FS is still on java6

    2. But there aren't any users of open_zip64 who are actually passing in False, are there? If someone in particular wants a non-zip64 opener, they can be explicit and use open_zip with allowZip64=False.

      To answer your question: we are on JDK7, nearly 8 at this point. So we have no objections to always using zip64 jars.

    3. Ignore that, I thought you meant change open_zip API to set allowZip64=True.
      Wanted to default the argument. 
      The below is a syntax error.
      def open_zip64(path_or_file, *args, allowZip64=True, **kwargs)
      I am not sure, but i think in order to add allowZip=True, i
      I will specify  all the positional args needed by ZipFile.
    4. Please see my comment on David Turner's thread re: the correct syntax. I still think this shouldn't even be a parameter to open_zip64 though.

  1. Ship It!
Review request changed

Status: Closed (submitted)

Change Summary: