How to update lines with open() in Python?

Python’s open() function gives me ways to read and write a file, but there is no "update". How can I do that?

1 answer

A plain call to the write method can update a text file. The catch is that when the file content is text, it is virtually impossible to change it in place in any useful way, so it is best to rewrite the whole file. I explain why in detail here: How to delete an entry from a Python file without having to read the entire file?

That said, to change the contents of an existing file without rewriting the whole file, the key is to open it in mode "r+b". Despite the "r", normally used for reading, the "+" character changes the mode to "read and write". (It would work without the "b", but that makes little sense for text files, where you don’t know in advance how many bytes a string will occupy in the file. Whoever does this needs complete control over every byte written, which is only possible if you encode the content manually: files opened in text mode transparently encode text to bytes, and that step can change the size of the string.)
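To make the point about text encoding concrete, here is a minimal sketch. It assumes UTF-8 as the file encoding, and the sample words are my own illustration, not part of the original example. It shows that the number of characters in a string and the number of bytes it occupies can differ:

```python
# Assumption: the file would be encoded as UTF-8; the sample words are illustrative.
# "a\u00e7\u00e3o" is the Portuguese word "acao" written with c-cedilla and a-tilde.
words = ["ABCD", "a\u00e7\u00e3o"]

sizes = {}
for word in words:
    encoded = word.encode("utf-8")
    # len(word) counts characters; len(encoded) counts the bytes written to disk
    sizes[word] = (len(word), len(encoded))
    print(repr(word), len(word), "characters,", len(encoded), "bytes")
```

Both words have 4 characters, but the second one needs 6 bytes, so a position counted in characters would not match its position in the file.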

The key to in-place changes is that an open file exists as an "object" in the operating system, and the operating system itself keeps a "pointer" to the position in the file where the next read or write will happen. From Python, or any other language, a program can, besides reading and writing, move that pointer with the .seek method. Once the pointer is at the exact point where you want to write, you call write, and the bytes you write overwrite exactly that many bytes in the file.

A quick example: let’s write the bytes for the uppercase letters "ABCD" to a 4-byte file, change only the third letter, "C", to "E", and read the file back:

open("teste.bin", "wb").write(b"ABCD")
print(open("teste.bin", "rb").read().decode())
arq = open("teste.bin", "r+b")
arq.seek(2)
arq.write(b"E")
arq.close()
print(open("teste.bin", "rb").read().decode())

Pasting these lines into the interactive interpreter, where each one runs immediately, gives:

>>> open("teste.bin", "wb").write(b"ABCD")
4
>>> print(open("teste.bin", "rb").read().decode())
ABCD
>>> arq = open("teste.bin", "r+b")
>>> arq.seek(2)
2
>>> arq.write(b"E")
1
>>> arq.close()
>>> print(open("teste.bin", "rb").read().decode())
ABED
>>> 

Take a look at the seek documentation (https://python-reference.readthedocs.io/en/latest/docs/file/seek.html) if you want to experiment further, but I emphasize that you should not use this approach in simple programs.

Why you should not do that

Even if it is possible to change a single, or a few, bytes in an existing file, it is preferable to recreate it in its entirety.

There are two main reasons. First, it is hard to know how many bytes a piece of text will occupy in a file, as explained above. But above all, there is no gain in resources or efficiency: you don’t save time, and you don’t wear out your hard drive (or SSD) any less. Even when you change a single byte in a file, the smallest unit the operating system (Windows, Linux, Mac, or another Unix) writes to disk is a "block", and blocks are generally 4 KB (4096 bytes). That is, even when writing the single letter "E" in my example above, and even though Python and the Python runtime (the native part, written in C) handled a single byte holding the code for "E", when the call to write was made the operating system had to perform these steps:

  1. read the entire file block containing the byte to be changed (4 KB)
  2. change that single byte in memory
  3. allocate a new block in the file system
  4. write the whole block (4 KB: 4 bytes of information and 4092 bytes of garbage, in this case) to the new position
  5. update the file system tables so that the file uses the newly written block
  6. return control to the Python runtime.

All of this happens transparently to the programmer, even one writing in C.
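If you are curious about the block size on your own machine, the sketch below is one way to peek at it. It relies on the st_blksize field of os.stat, which reports the preferred I/O block size and is not available on every platform (it is missing on Windows, for example), so treat the number as an indication rather than the exact file system block size. The temporary file is just an illustration.

```python
import os
import tempfile

# Assumption: a throwaway 4-byte file, mirroring the "ABCD" example.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"ABCD")
    sample_path = f.name

info = os.stat(sample_path)
# st_blksize is the *preferred* I/O block size; it does not exist on all
# platforms, so fall back to None instead of raising AttributeError.
block_size = getattr(info, "st_blksize", None)

print("file size in bytes:", info.st_size)
print("preferred I/O block size:", block_size)

os.unlink(sample_path)  # clean up the throwaway file
```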

And finally, there is a third reason not to change an existing file in place: it is not possible to insert content into a file. You can only overwrite bytes that are already there. If a file has 4 lines of text and you want to insert a new line between the original first and second lines, there is no system call that "pushes" the rest of the file forward so you can put your new bytes in that position.
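A small sketch of what actually happens when you try to "insert" with seek and write. The file name and contents are made up for the illustration, and the file lives in a temporary directory:

```python
import os
import tempfile

# Assumption: a throwaway file in a temporary directory; the name is illustrative.
path = os.path.join(tempfile.mkdtemp(), "insert_demo.bin")

with open(path, "wb") as f:
    f.write(b"line1\nline2\n")

# Try to "insert" a new line after the first one by seeking and writing.
with open(path, "r+b") as f:
    f.seek(6)          # byte offset right after b"line1\n"
    f.write(b"NEW\n")  # these 4 bytes replace the first 4 bytes of b"line2\n"

with open(path, "rb") as f:
    result = f.read()

print(result)  # b'line1\nNEW\n2\n'
```

The file still has 12 bytes: the new bytes replaced "line" instead of pushing "line2" forward.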

How to change the contents of a file

The correct approach in the great majority of cases is:

  1. read the entire file into the computer’s memory
  2. change the contents in memory (here you have several ways to insert, replace, or overwrite content, and so on)
  3. create a new file and save the modified content.
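The three steps above can be sketched as follows. This is a minimal illustration, assuming a small UTF-8 text file. The file name and its lines are invented for the example:

```python
import os
import tempfile

# Assumption: illustrative file name and contents, created in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "lines_demo.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("first\nsecond\nthird\n")

# 1. read the entire file into memory
with open(path, encoding="utf-8") as f:
    lines = f.readlines()

# 2. change the content in memory: insert one line and replace another
lines.insert(1, "inserted\n")
lines[-1] = "THIRD\n"

# 3. recreate the file with the modified content
with open(path, "w", encoding="utf-8") as f:
    f.writelines(lines)

with open(path, encoding="utf-8") as f:
    final_text = f.read()
print(final_text)
```

Because the content is an ordinary Python list while in memory, inserting a line in the middle is trivial, which is exactly what cannot be done directly on the file.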

Of course, on today’s computers (and even those of 25 years ago), writing thousands of bytes to a new file is very efficient, so this approach costs nothing in performance.

If you want more safety for production code, then when writing the file, instead of creating a new file with the same name, you can:

  1. create a file with a different name
  2. save the new content
  3. verify that the write succeeded
  4. delete the old file
  5. rename the new file to the old name

(This is quite easy to do. For example, the step of checking that the write went well is automatic in Python: if anything goes wrong while writing, Python raises an exception and your code never even tries to delete the old file. The functions os.unlink and os.rename, in the os module, let you delete the old file and rename the new one.)

The previous example looks like this:

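import os

nome_original = "teste.txt"
# create the sample file
with open(nome_original, "wt") as arq:
    arq.write("ABCD")
# read the whole content into memory
with open(nome_original) as arq:
    conteudo = arq.read()
# change the third character in memory
novo_conteudo = conteudo[0:2] + "E" + conteudo[3:]
# write to a new file; an exception here leaves the original untouched
with open("nome_novo", "wt") as arq:
    arq.write(novo_conteudo)
os.unlink(nome_original)
os.rename("nome_novo", nome_original)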
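As a variation on the same steps, here is a sketch that uses os.replace (available since Python 3.3) instead of os.unlink followed by os.rename. os.replace overwrites the destination in a single call, so there is no moment at which the old file is deleted and the new one not yet renamed. The helper name rewrite_file and its transform parameter are hypothetical, chosen only for this illustration:

```python
import os
import tempfile


def rewrite_file(path, transform):
    """Hypothetical helper: rewrite `path` with transform(old_text) via a temporary file."""
    with open(path, encoding="utf-8") as f:
        old_text = f.read()
    new_text = transform(old_text)

    # Create the temporary file in the same directory as the target so the
    # final rename stays on the same file system.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(new_text)
        # Replaces the old file with the new one in one call.
        os.replace(tmp_path, path)
    except BaseException:
        # Something failed: remove the leftover temporary file, keep the original.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


# Usage, mirroring the "ABCD" -> "ABED" example (file created in a temp directory).
demo_path = os.path.join(tempfile.mkdtemp(), "teste.txt")
with open(demo_path, "w", encoding="utf-8") as f:
    f.write("ABCD")

rewrite_file(demo_path, lambda text: text[:2] + "E" + text[3:])

with open(demo_path, encoding="utf-8") as f:
    replaced = f.read()
print(replaced)  # ABED
```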

Why you don’t even need to change an existing file

Whatever your application, if you are dealing with hundreds of thousands of records (or millions of times more than that) and want to keep them organized, you are not going to invent that organization from scratch in a file of your own. In general you will use a database, which already does exactly that: it organizes the data as efficiently as possible. If the people who have spent decades developing the database system decided that in some situations it pays to change the contents of a file without rewriting it, the code for that lives inside the database, not in your system.
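As an illustration of what this looks like in practice, here is a minimal sketch using SQLite through Python’s standard sqlite3 module. The choice of SQLite, the in-memory database, and the "notes" table are my assumptions for the example; the answer itself does not name a specific database:

```python
import sqlite3

# Assumption: an in-memory SQLite database and a made-up "notes" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany("INSERT INTO notes (body) VALUES (?)", [("ABCD",), ("second",)])

# The "update" operation that open() lacks is a single statement here;
# the database decides how the bytes on disk actually change.
conn.execute("UPDATE notes SET body = ? WHERE id = ?", ("ABED", 1))
conn.commit()

rows = conn.execute("SELECT id, body FROM notes ORDER BY id").fetchall()
print(rows)  # [(1, 'ABED'), (2, 'second')]
conn.close()
```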

Similarly, if you are working with large files such as images or video, you will be using libraries to access their content. These libraries do the abstraction for you: all you see is the uncompressed image data in memory.

For small text files (and here "small" means up to 50 MB), reading everything into memory and writing everything back works perfectly. (But if you have more than 100 KB of data in a text file, it is time to think about migrating your information to a database system anyway.)

  • Thanks! I was trying to use open() as a mini database since I’m just starting out in Python. I’ll try an option like sqlite.
