What is the difference between dump and dumps in Python's pickle module?

I have read the Python documentation, including the pickle documentation itself, but I could not absorb the content (it lacks examples). On the web I only found information about using dump + load, and nothing about dumps + loads.

1 answer

dump and load each take an open file (or another object with the file interface) as a parameter: dump saves the serialized content of the object in that file, and load reads it back from the file.
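As a minimal sketch of the dump/load pair in Python 3 (the file name data.pickle and the temporary directory are just assumptions for the example, not something from the question):

```python
import os
import pickle
import tempfile

data = {"b": ["c", "d", {1, 2, 3}, {"e": "f"}]}

# Hypothetical file name inside a throwaway temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.pickle")

    # dump writes the serialized object to an open file (binary mode).
    with open(path, "wb") as f:
        pickle.dump(data, f)

    # load reads the object back from an open file (binary mode).
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == data)  # True
```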

dumps has no file parameter and returns the serialized object as a byte string. loads takes a byte string as its parameter and returns the rebuilt object. They are meant for cases where you will not write the pickle to a file right away, for example when you are serializing objects to transmit them over the network or to another process.
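To illustrate the "transmit over the network" case, here is a rough Python 3 sketch. It assumes a socket.socketpair() as a stand-in for a real connection, and the message dictionary is made up for the example:

```python
import pickle
import socket

message = {"command": "resize", "size": (640, 480)}

# A connected pair of sockets stands in for a real network connection.
sender, receiver = socket.socketpair()
try:
    payload = pickle.dumps(message)   # a bytes object, no file involved
    sender.sendall(payload)
    sender.shutdown(socket.SHUT_WR)   # signal that nothing more will be sent

    # Read until the sending side is closed, then rebuild the object.
    chunks = []
    while True:
        chunk = receiver.recv(4096)
        if not chunk:
            break
        chunks.append(chunk)
    rebuilt = pickle.loads(b"".join(chunks))
finally:
    sender.close()
    receiver.close()

print(type(payload).__name__)  # bytes
print(rebuilt == message)      # True
```

The receiving side gets an equal but distinct object, just as load would return from a file.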

The Python standard library uses dumps and loads a lot internally, for example in modules like multiprocessing, to pass objects transparently to other processes.

>>> a = {"b": ["c", "d", {1,2,3}, ({"e": "f"})]}
>>> print a
{'b': ['c', 'd', set([1, 2, 3]), {'e': 'f'}]}
>>> import pickle
>>> b = pickle.dumps(a)
>>> repr(b)
'"(dp0\\nS\'b\'\\np1\\n(lp2\\nS\'c\'\\np3\\naS\'d\'\\np4\\nac__builtin__\\nset\\np5\\n((lp6\\nI1\\naI2\\naI3\\natp7\\nRp8\\na(dp9\\nS\'e\'\\np10\\nS\'f\'\\np11\\nsas."'
>>> c = pickle.loads(b)
>>> c == a
True
>>> c is a
False

Other serialization modules mimic the pickle interface and offer the same 4 functions: dump, dumps, load and loads. This is the case for the json and marshal modules. json produces a serialized object as specified by ECMA-404, which is syntactically valid JavaScript, almost syntactically valid Python, and interchangeable with several languages. However, it can only serialize a subset of the native Python data types (Unicode strings, integers and floats, booleans, None, lists and dictionaries; tuples are converted to lists).

marshal, in turn, can serialize the native Python data types (lists, dictionaries, sets, complex numbers, etc.), but it will fail with objects defined in pure Python classes, even ones from the standard library such as OrderedDict, namedtuple, and several others.
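A small Python 3 sketch of these differences, using made-up sample values. It only demonstrates the json limits and the marshal round trip for built-in types:

```python
import json
import marshal

# json: a tuple goes in, a list comes out.
tuple_round_trip = json.loads(json.dumps({"point": (1, 2)}))
print(tuple_round_trip)  # {'point': [1, 2]}

# json: a set is outside the supported subset and is rejected.
try:
    json.dumps({1, 2, 3})
    json_set_failed = False
except TypeError as exc:
    json_set_failed = True
    print("json refused the set:", exc)

# marshal: built-in types such as sets and complex numbers round-trip.
builtin_value = [{1, 2, 3}, 2 + 3j]
marshal_round_trip = marshal.loads(marshal.dumps(builtin_value))
print(marshal_round_trip == builtin_value)  # True
```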

pickle, in turn, will serialize almost anything put in front of it, including instances of classes defined in your own code, and functions (with one caveat: classes and functions are stored by name, so whoever deserializes has to "know" the names of the functions and classes of the serialized objects, i.e. be able to import them).
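The "stored by name" caveat can be seen in a short Python 3 sketch. OrderedDict and math.sqrt are just convenient examples chosen here; the lambda stands for any function that cannot be looked up by name:

```python
import math
import pickle
from collections import OrderedDict

# An instance of a standard library class round-trips with its type intact.
ordered = OrderedDict([("first", 1), ("second", 2)])
restored = pickle.loads(pickle.dumps(ordered))
print(type(restored).__name__, restored == ordered)  # OrderedDict True

# A function is stored as a reference (module and name), not as code,
# so unpickling gives back the very same function object.
same_function = pickle.loads(pickle.dumps(math.sqrt))
print(same_function is math.sqrt)  # True

# A lambda has no importable name, so pickle refuses it.
try:
    pickle.dumps(lambda x: x + 1)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError) as exc:
    lambda_pickled = False
    print("could not pickle the lambda:", exc)
```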

Finally, if you really want to serialize crazy things, including functions together with their contents (the "code" object), there is a module on PyPI that works on top of pickle and supports this: dill (and guess what: it has load, loads, dump and dumps).

(Be careful with load and dump: they must be given files opened in binary mode, never in text mode, especially in Python 3.x.)
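A quick way to see the binary-mode requirement in Python 3 (a sketch only; the file name is hypothetical):

```python
import os
import pickle
import tempfile

values = [1, 2, 3]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "values.pickle")  # hypothetical file name

    # Text mode: the file expects str, but pickle writes bytes.
    try:
        with open(path, "w") as text_file:
            pickle.dump(values, text_file)
        text_mode_ok = True
    except TypeError as exc:
        text_mode_ok = False
        print("text mode failed:", exc)

    # Binary mode works for both writing and reading.
    with open(path, "wb") as binary_file:
        pickle.dump(values, binary_file)
    with open(path, "rb") as binary_file:
        restored = pickle.load(binary_file)

print(text_mode_ok)        # False
print(restored == values)  # True
```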

  • When there is only a little data, a single phrase for example, which is faster: using pickle or saving it as a string in a text file? I am asking about processing performance.

  • If it is only text, it is best to encode it as UTF-8 and write it directly to a text file. Or use the standard library's shelve module: it gives you a dictionary-like object that is transparently persisted on disk (it uses pickle internally, but you don't even need to know that).
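Both options from the comment above, as a rough Python 3 sketch. The file names and the stored keys are assumptions for the example, and nothing here measures performance:

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Option 1: plain text written and read directly as UTF-8.
    text_path = os.path.join(tmp_dir, "phrase.txt")
    with open(text_path, "w", encoding="utf-8") as f:
        f.write("a short phrase")
    with open(text_path, encoding="utf-8") as f:
        phrase = f.read()

    # Option 2: shelve, a dictionary-like object persisted on disk.
    shelf_path = os.path.join(tmp_dir, "settings")
    with shelve.open(shelf_path) as shelf:
        shelf["phrase"] = "a short phrase"
        shelf["sizes"] = [1, 2, 3]

    # Reopening the shelf later gives the stored values back.
    with shelve.open(shelf_path) as shelf:
        stored_phrase = shelf["phrase"]
        stored_sizes = shelf["sizes"]

print(phrase == stored_phrase)  # True
print(stored_sizes)             # [1, 2, 3]
```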
