What are the differences between ways of manipulating huge strings with Python?

When working with very large strings, it is common to see approaches that try to optimize the process in some way. The approaches I have seen use:

  1. io.StringIO or io.BytesIO
  2. tempfile.TemporaryFile
  3. tempfile.NamedTemporaryFile
  4. tempfile.SpooledTemporaryFile

All four are file-like objects and can be used with a context manager.

import io
import tempfile

with io.StringIO() as stream:
  ...

with tempfile.TemporaryFile() as stream:
  ...

with tempfile.NamedTemporaryFile() as stream:
  ...

with tempfile.SpooledTemporaryFile() as stream:
  ...

What is the difference between these solutions? Are there other ways?

1 answer

The documentation for the tempfile module answers some of your questions. This library creates temporary files and directories, and for simple uses the detailed differences between its functions matter little. The differences do exist at the operating system level, however, as I explain below:

  • tempfile.TemporaryFile(mode, buffering, encoding, newline, suffix, prefix, dir):
    • This function returns a file-like object that can be used as a temporary storage area. On Unix-based systems, the directory entry for the file is either not created at all or removed immediately after the file is created.
  • tempfile.NamedTemporaryFile(mode, buffering, encoding, newline, suffix, prefix, dir, delete):
    • This one works the same way as TemporaryFile(), with one exception: the file has a name visible in the filesystem (on Unix, the directory entry is not unlinked). That name can be read from the object's name attribute.
  • tempfile.SpooledTemporaryFile(max_size, mode, buffering, encoding, newline, suffix, prefix, dir):
    • Note that this one takes an extra parameter, max_size. It works the same way as TemporaryFile(), except that data is kept in memory until its size exceeds max_size or until the fileno() method is called. From that point on, the contents are written to disk and it behaves like TemporaryFile().
  • io.BytesIO():
    • Returns an in-memory file-like object that does not alter newlines and is similar to open(filename, "wb"). It handles byte strings (bytes).
  • io.StringIO():
    • Returns an in-memory file-like object that can translate newlines (depending on its newline argument) and is similar to open(filename, "w"). It handles Unicode strings (str).
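To make the differences above a bit more concrete, here is a minimal sketch that checks a few of them at runtime. The function name describe_streams and the dictionary keys are made up for this illustration. The _file attribute is the one the tempfile documentation mentions for SpooledTemporaryFile, and it is used here only to see whether the data is still in memory.

```python
import io
import os
import tempfile


def describe_streams():
    """Collect a few observations about each file-like object (illustrative only)."""
    facts = {}

    # io.StringIO stores text (str); io.BytesIO stores raw bytes.
    with io.StringIO() as text_stream:
        text_stream.write("some text\n")
        facts["stringio_type"] = type(text_stream.getvalue())

    with io.BytesIO() as byte_stream:
        byte_stream.write(b"some bytes\n")
        facts["bytesio_type"] = type(byte_stream.getvalue())

    # NamedTemporaryFile exposes its path through the `name` attribute,
    # and that path exists in the filesystem while the file is open.
    with tempfile.NamedTemporaryFile() as named:
        path = named.name
        facts["named_exists_while_open"] = os.path.exists(path)
    # With the default delete=True, the file is removed when it is closed.
    facts["named_exists_after_close"] = os.path.exists(path)

    # SpooledTemporaryFile keeps data in memory until max_size is exceeded.
    # In the default binary mode, `_file` is an io.BytesIO before the rollover.
    with tempfile.SpooledTemporaryFile(max_size=10) as spooled:
        spooled.write(b"12345")  # 5 bytes, still below max_size
        facts["in_memory_before"] = isinstance(spooled._file, io.BytesIO)
        spooled.write(b"67890abc")  # 13 bytes in total, above max_size
        facts["in_memory_after"] = isinstance(spooled._file, io.BytesIO)

    return facts


if __name__ == "__main__":
    for key, value in describe_streams().items():
        print(key, value)
```

Plain TemporaryFile is left out of the sketch because, on Unix, it has no directory entry to inspect.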

I hope this clears up some of your questions. Much of what I said comes from the documentation I linked above. Cheers!

  • Yes, it was very useful. Just to clarify, the idea of the question was precisely to have this content, even if summarized, in Portuguese. I already knew the differences, but now we have an initial reference to point to :D

  • Great way to boost the community!
