There are two things you need to understand when dealing with binary files like this:
Reading X bytes with read advances the file's read position by X bytes. This happens every time you call read.
seek moves you to the position you pass it. So seek(0) sends the read position back to the beginning of the file.
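To see both points in action, here is a minimal sketch that uses an in-memory io.BytesIO buffer as a stand-in for the file (the buffer and the values 10, 20, 30 are just assumptions for illustration). tell() reports the current position:

```python
import io
import struct

# In-memory stand-in for a binary file holding three 4-byte integers.
buf = io.BytesIO(struct.pack('3i', 10, 20, 30))

start = buf.tell()                           # position right after opening
first = struct.unpack('i', buf.read(4))[0]   # read(4) moves the position forward
after_first = buf.tell()
second = struct.unpack('i', buf.read(4))[0]  # the next read continues from there
after_second = buf.tell()

buf.seek(0)                                  # jump back to the beginning
again = struct.unpack('i', buf.read(4))[0]   # so we read the first integer again

print(start, after_first, after_second)
print(first, second, again)
```

A real file opened with open(..., 'rb') behaves the same way with respect to read, seek and tell.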
A binary file contains nothing but bytes, all lined up one after another.
In your case, for example, the first four bytes represent an integer that says how many (int, float) pairs the file contains, followed by the four bytes of the first int, the four bytes of the first float, and so on.
Suppose our file has 2 pairs. The binary will be something like this:
>0010 iiii ffff iiii ffff
Where 0010 is the integer 2 in binary (shown schematically), iiii represents a 4-byte integer, and ffff a 4-byte float. The arrow > marks the file's read position. When we first open the file, it is at position 0.
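As a rough sketch of that layout, the bytes for a 2-pair file could be built like this. The sample values are hypothetical, and the sizes assume that 'i' and 'f' each take 4 bytes, which is the case on common platforms:

```python
import struct

# Hypothetical sample data: two (int, float) pairs.
pairs = [(2, 2.5), (12, 12.5)]

# Header: one integer holding the number of pairs.
data = struct.pack('i', len(pairs))

# Body: each pair is stored as an int followed by a float.
for number, value in pairs:
    data += struct.pack('i', number) + struct.pack('f', value)

count = struct.unpack('i', data[:4])[0]
print(count, len(data))
```

The total length is the header plus 8 bytes per pair, which is why knowing n is enough to know how far to read.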
Let’s take a look at your code:
    with open('valores.bin', 'r+b') as arq:
        n = struct.unpack('i', arq.read(4))[0]
        arq.seek(0)
        for i in range(n):
            arq.seek(0)
            if isinstance(struct.unpack('i', arq.read(4)), int) and struct.unpack('i', arq.read(4)) < 10:
                arq.write(struct.pack('i', 0))
            elif isinstance(struct.unpack('f', arq.read(4)), float) and struct.unpack('f', arq.read(4)) > 9.0:
                arq.write(struct.pack('f', 1000,0))
The first problem is that, right before entering the loop, you send the read position back to the beginning of the file, which is not what we want. Once we have read the first integer and know how many pairs the file holds, there is no reason to read those first 4 bytes again. The seek(0) is unnecessary.
That is, after n = struct.unpack('i', arq.read(4))[0], because we called read, the read position is here:
0010 >iiii ffff iiii ffff
We are already in position to start reading the values. If we call seek(0), we go back to the first position:
>0010 iiii ffff iiii ffff
And we are no longer interested in reading the 0010, because we already know that the file has 2 pairs of values.
From there you can also see a few more problems inside the loop:
We call seek(0) at the beginning of each iteration. So not only do we go back to the first integer, which no longer interests us, but we also never move forward in later iterations; even if the file did not start with 0010, we would always read the first pair of values.
We call arq.read(4) several times without keeping the value. Remember that each read(x) advances the read position by x, so we can only call read once before it moves on to the next item. It is best to save the result of arq.read(4) in a variable so that we do not skip past the value we wanted.
We check whether the result is an int after we have already had it interpreted as an int. When we call struct.unpack with the argument 'i', we are asking it to interpret those bytes as an integer, and it will give us an integer regardless (wrapped in a tuple, which is why the [0] is needed). The problem is that if we interpret a float's bytes as an integer, the resulting int value has nothing to do with the float's value.
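The last point can be checked with a small sketch. The value 2.5 is only an example; the idea is to pack a float and then unpack the very same bytes with both format codes:

```python
import struct

raw = struct.pack('f', 2.5)            # 4 bytes holding the float 2.5

as_float = struct.unpack('f', raw)[0]  # correct interpretation
as_int = struct.unpack('i', raw)[0]    # same bytes read as an integer

# unpack always returns a tuple, so an isinstance(..., int) test on the
# raw result can never succeed.
result = struct.unpack('i', raw)
is_int = isinstance(result, int)

print(as_float, as_int, result, is_int)
```

The integer that comes out is just the float's bit pattern read as a number, so comparing it against 10 tells you nothing about the stored value. The file itself carries no type information; only the known layout says which 4 bytes are an int and which are a float.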
What I recommend is to get the most basic part working first: let's read the file and make sure the positions are correct:
    import struct

    with open('valores.bin', 'r+b') as arq:
        n = struct.unpack('i', arq.read(4))[0]
        print(n)
        for i in range(n):
            meu_inteiro = struct.unpack('i', arq.read(4))
            print(meu_inteiro)
            meu_float = struct.unpack('f', arq.read(4))
            print(meu_float)
    # Output (one value per line): 3 (2,) (2.5,) (12,) (12.5,) (1337,) (314.70001220703125,)
In my case, those are the values I had put in the file, so everything is right so far. Note that we have not used seek yet, because it is not needed for purely sequential reading. We will only need it to overwrite values. That is:
We read the first integer iiii and store it in the variable meu_inteiro.
0010 >iiii ffff iiii ffff
->
0010 iiii >ffff iiii ffff
We compare meu_inteiro (without calling read again) with some value. If it is less than 10, we move back the necessary number of positions to overwrite it with 0:
0010 iiii >ffff iiii ffff
-> (seek back to the start of the integer)
0010 >iiii ffff iiii ffff
-> (write the new int 0)
0010 iiii >ffff iiii ffff
(we proceed to read the float)
seek has 3 modes of operation, selected by the second argument. The first mode, which is the default, sets the absolute read/write position of the file. That is, seek(4) places the position at byte 4. If we pass 1 as the second argument, the offset is relative to the current position. That is, seek(4, 1) moves the position 4 bytes ahead of the current one; if we are at position 4, it goes to 8. The third mode, passing 2, is relative to the end of the file, but that does not matter to us here.
Since we want to go back 4 bytes whenever we are about to write, we should use seek(-4, 1).
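A quick sketch of the three modes, again on an in-memory io.BytesIO buffer standing in for the file (the 20-byte size is just an assumption that matches the 2-pair example):

```python
import io

# 20 zero bytes, the same size as the 2-pair example file.
buf = io.BytesIO(bytes(20))

buf.seek(4)            # mode 0 (default): absolute position
absolute = buf.tell()

buf.seek(4, 1)         # mode 1: 4 bytes ahead of the current position
relative = buf.tell()

buf.seek(-4, 1)        # mode 1 with a negative offset: go back 4 bytes
back = buf.tell()

buf.seek(0, 2)         # mode 2: offset counted from the end of the file
end = buf.tell()

print(absolute, relative, back, end)
```

If the bare numbers feel cryptic, the constants io.SEEK_SET, io.SEEK_CUR and io.SEEK_END have the values 0, 1 and 2 and can be passed instead.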
Then your code would look like this:
    import struct

    try:
        with open('valores.bin', 'r+b') as arq:
            # The first 4 bytes hold the number of (int, float) pairs.
            n = struct.unpack('i', arq.read(4))[0]
            print(n)
            for i in range(n):
                meu_inteiro = struct.unpack('i', arq.read(4))[0]
                print(meu_inteiro)
                if meu_inteiro < 10:
                    arq.seek(-4, 1)  # Go back to the start of the iiii to be overwritten
                    arq.write(struct.pack('i', 0))
                meu_float = struct.unpack('f', arq.read(4))[0]
                print(meu_float)
                if meu_float > 9.0:
                    arq.seek(-4, 1)  # Go back to the start of the ffff to be overwritten
                    arq.write(struct.pack('f', 1000.0))
    except IOError:
        print('Error opening or handling the file.')
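If you want to check the result without touching your real file, one option is a small round trip on a temporary file. This is only a sketch: the helper names write_pairs, read_pairs and update_pairs are hypothetical, the sample values are the ones I used above, and it assumes Python 3:

```python
import os
import struct
import tempfile


def write_pairs(path, pairs):
    """Write the pair count followed by each (int, float) pair."""
    with open(path, 'wb') as arq:
        arq.write(struct.pack('i', len(pairs)))
        for number, value in pairs:
            arq.write(struct.pack('i', number))
            arq.write(struct.pack('f', value))


def read_pairs(path):
    """Read the file back sequentially; no seek is needed for this."""
    with open(path, 'rb') as arq:
        n = struct.unpack('i', arq.read(4))[0]
        return [(struct.unpack('i', arq.read(4))[0],
                 struct.unpack('f', arq.read(4))[0]) for _ in range(n)]


def update_pairs(path):
    """Same overwrite logic as above: ints below 10 become 0, floats above 9.0 become 1000.0."""
    with open(path, 'r+b') as arq:
        n = struct.unpack('i', arq.read(4))[0]
        for _ in range(n):
            if struct.unpack('i', arq.read(4))[0] < 10:
                arq.seek(-4, 1)  # back to the start of this int
                arq.write(struct.pack('i', 0))
            if struct.unpack('f', arq.read(4))[0] > 9.0:
                arq.seek(-4, 1)  # back to the start of this float
                arq.write(struct.pack('f', 1000.0))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'valores.bin')
    write_pairs(path, [(2, 2.5), (12, 12.5), (1337, 314.7)])
    update_pairs(path)
    result = read_pairs(path)

print(result)
```

Reading the file back in a separate step makes it easy to confirm that only the values matching the conditions were overwritten and that the positions never drifted.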
Out of curiosity, is using struct a requirement? Because it would be a lot easier with pickle if not. – Pedro von Hertwig Batista
Hello Pedro, yes, I really do need to do it with struct. I've been fighting with this code and can't get it to replace the values. I can't figure out how seek works; I tried seek(0,0) and seek(4,0), but no luck. – Bruno
I'm writing an answer here to try to help! – Pedro von Hertwig Batista