[engine] fix unpickling error in multiprocessing engine tests

Review Request #3751 — Created April 24, 2016 and discarded

3149, 3261
jsirois, kwlzn, patricklaw, stuhood

Not clear what the root cause of the UnpicklingError is, but switching
from cStringIO to StringIO seems to fix the flakiness, have run many tests
to confirm, at the cost of significant performance impact.

A few things we know about the failure:

  • Those unpickable data appear to be old space has been GC-ed [1] for sth else.
  • Failures only were seen in travis, not on personal mac.
  • Switching to use in-memory db instead of lmdb makes no difference.
  • Switching cPickle to pickle also makes no difference.
  • Switching cStringIO to StringIO haven't seen any failures.

Did some search in python bugs with no avail. Our cStringIO usage looks alright. [2]
Tried to read the cStringIO source [3], but didn't spend too much time.

The reason it hits travis my guess could be due to 1) multiprocessing has
higher memory overhead, 2) we use #cpu to allocate multiprocessing pool size,
3) travis has more cpus compared to local mac.

This does impact performance significantly, about +50% for the cold cache,
and +100% for the warmed up cache in the list test I did.

[1] https://github.com/pantsbuild/pants/issues/3149 with examples of the
hexlified and unhexlified unpickable data.
[2] https://github.com/pantsbuild/pants/blob/master/src/python/pants/engine/exp/storage.py#L169-L173
blob is what's reallocated.
[3] cStringIO https://github.com/python/cpython/blob/2.7/Modules/cStringIO.c


  • Test pickle/stringIO combinations:
    (with test has a loop for running test 5 times)

cStringIO, cPickle: job 457 to 467 all failed except for 1
cStringIO, pickle: job 469 to 479 all failed
StringIO, cPickle: job 480 to 490 all passed
StringIO, pickle: job 491 to 501 all passed

  • This patch vs master:

with this patch: job 502 to 512, 22/22 (restarted once)
without this patch: job 513 to 523, 3/22 failed

  2. src/python/pants/engine/exp/engine.py (Diff revision 1)

    removing try/except just to be always consistent using the same library

  1. lgtm for the moment. would take a look at io.BytesIO as a possible StringIO replacement.

  2. src/python/pants/engine/exp/storage.py (Diff revision 1)

    this should probably get a TODO?

    1. +1 - this should link to https://github.com/pantsbuild/pants/issues/3149 so your hard-won knowledge is not lost.
  1. Ship It!
Review request changed

Status: Discarded

Change Summary:

https://rbcommons.com/s/twitter/r/3761/ is the right fix