[engine] bug fix: to pickle/unpickle within the proper context

Review Request #3761 — Created April 26, 2016 and submitted

peiyu
pants
3149, 3274
pants-reviews
jsirois, kwlzn, patricklaw, stuhood

We have two places that pickle and unpickle functions are not used within
the proper context.

  • StringIO buffer for write, saving buffer to storage happens outside its context.
  • lmdb buffer for read, the invalid buffer was used outside transction context.

The earlier attemp is probably red herring: https://rbcommons.com/s/twitter/r/3751/.
Thanks Stu for catching!

These two behaviors are clearly documented, see [1] and [2].
[1] https://docs.python.org/2/library/stringio.html
[2] https://lmdb.readthedocs.org/en/release/

https://travis-ci.org/peiyuwang/pants/builds/125890340 passed.
https://travis-ci.org/peiyuwang/pants/builds/125941151 passed.

  • 0
  • 0
  • 2
  • 0
  • 2
Description From Last Updated
ST
  1. 
      
  2. src/python/pants/engine/exp/storage.py (Diff revision 1)
     
     

    I think there is one more piece here. For in-memory storage, we just store the value directly in a dict. In that case, you'd need to clone it first. In the lmdb case, it copies the value into private buffers, so it should be fine.

    1. Good point. Made a similar transform method for put.

      If the input is str, it makes no difference since strings are interned.
      If the input is buffer, not sure whether the buffer will be copied or reused. https://docs.python.org/3.1/library/functions.html#bytes only says it's a new bytes object.

      >>> x = bytearray()
      >>> x.extend('hello')
      >>> y = bytes(x)
      >>> id(x)
      4300926288
      >>> id(y)
      4300926336
      

      The ids are different, but still not sure about the underlying buffer.

    2. i could allocate a bytearray of the same size, and copy things over, that would be bullet proof.

    3. If the input is str, it makes no difference since strings are interned.

      They're interned, and immutable... which means they're guaranteed safe.

      If the input is buffer, not sure whether the buffer will be copied or reused.

      >>> x = bytearray()
      >>> x.extend('hello')
      >>> y = bytes(x)
      >>> y
      'hello'
      >>> x[0] = 'y'
      >>> x
      bytearray(b'yello')
      >>> y
      'hello'
      >>>
      
    4. ah, nice!

    5. A correction: strings are immutable, but they are not interned by default, afaik. They are only interned if they are literals in a python script. ie, running the following as a script would succeed:

      1
      2
      3
      4
      #!/usr/bin/env python
      x = 'hello'
      y = 'hello'
      assert id(x) == id(y)
      
  3. src/python/pants/engine/exp/storage.py (Diff revision 1)
     
     

    I think the recommended formatting for these is lambda x: x.

  4. 
      
PE
ST
  1. 
      
  2. src/python/pants/engine/exp/storage.py (Diff revision 2)
     
     

    Can you define this function once in the module as _identity?

  3. src/python/pants/engine/exp/storage.py (Diff revision 2)
     
     

    Can you define this function once as _copy_bytes or something?

    (inline lambdas are distracting.)

  4. 
      
ST
  1. (after moving inline lambdas out)

  2. 
      
PE
ST
  1. More readable, thanks.

    It seems odd to override default parameters in subclasses like this, but it seems like an "ok" way to solve this. I feel like we might eventually separate get/put into copying/non-copying versions, but let's land this as a fix.

  2. 
      
PE
Review request changed

Status: Closed (submitted)

Change Summary:

Merged as f22f8ccc282bf497dc49dbf3c6b4fe487cf1466d

Loading...