Self-compressing pickles

I’ve been working on some pickle security stuff. This is a teaser.

Python pickles are extremely flexible: they can run essentially whatever code they want. That means you can create a pickle that contains a compressed pickle. The consumer doesn’t know if an incoming pickle will be compressed or not: the Pickle VM takes care of the details.

To do this, we define a useful little helper class:

class PickleCall(object):
     def __init__(self, f, *args):
        self.f, self.args = f, args
     def __reduce__(self):
         return self.f, self.args

PickleCall is nothing but a convenience function for us to encode the function f being called with some args into a pickle. If you’ve used pickle before and you know that it normally encodes classes by name, you might expect that the victiconsumer of the pickle also needs to define PickleCall, but that’s not the case. This class accomplishes that by explicitly implementing part of the pickle protocol with the __reduce__ method: it tells pickle how to encode it, and PickleCall isn't involved anymore. Of course, the “obvious” thing to use PickleCall with is “os.system” and something involving /dev/tcp.

Once you have PickleCall, writing the function that dumps an object to a string but with embedded zlib compression is straightforward:

from pickle import loads, dumps
from zlib import compress, decompress

def zdumps(obj):
    zpickle = compress(dumps(obj), level=9)
    unz_call = PickleCall(decompress, zpickle)
    loads_call = PickleCall(loads, unz_call)
    return dumps(loads_call)

We can check that it works:

zpickle = zdumps(["a"] * 100)
pickle = dumps(["a"] * 100)

loads(pickle) == loads(zpickle)
# => True
len(zpickle), len(pickle)
# => (82, 214)

Internally, the structure for this looks as follows:

   0: \x80 PROTO      3
   2: c    GLOBAL     '_pickle loads'
   17: c    GLOBAL     'zlib decompress'
   34: C    SHORT_BINBYTES b'x\xdak`\x8e-d\xd0\x88`d``H,d\xcc\x18\x160U\x0f\x00W\xb6+\xd4'     <= this is zlib-compressed pickle
   63: \x85 TUPLE1   <= set up arguments for zlib decompress
   64: R    REDUCE   <= call zlib decompress
   65: \x85 TUPLE1   <= set up arguments for pickle load
   66: R    REDUCE    <=  call pickle load
   67: .    STOP

If you're trying to protect pickles the punchline here is that you probably need to whitelist because there are too many ways to hide things inside a pickle. (We're working on it.)

(This post was syndicated on the Latacora blog.)