Packing strings using struct

I’m doing a basic exercise with the struct module and ran into a problem. To pack a string, struct.pack() needs to be told how many characters it has, right? But what if the string is typed in by the user? In that case I don’t know in advance how many characters it will have, so how can I pack it?

  • I’m not sure how you’re using struct, but to get the length of a string you can use the built-in len(): len('abc') == 3.

1 answer

Think about how you create a string in C: unless you declare and initialize it at the same time, you have to state its size, right?

You have to do the same when packing a string with struct because, after all, the struct module converts between Python values and C structs represented as bytes. As far as I know, a member of a C struct needs a known size (the one exception being a C99 flexible array member at the very end of the struct).

The size requirement makes more sense when you remember that someone will later need to unpack this struct. The reader has to know the size of each field; otherwise it can misread the data by mixing the bytes of one field with those of another.
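To illustrate that point, here is a minimal sketch with two fixed-width fields. The field names and widths (10 and 4 bytes) are made up for this example and are not part of the question.

```python
import struct

# Two fixed-width byte fields: 10 bytes for a name, 4 bytes for a code.
# The names and widths are illustrative assumptions.
layout = "10s4s"
pacote = struct.pack(layout, b"ana", b"X1")

# Short values are padded with null bytes up to the declared width.
assert len(pacote) == struct.calcsize(layout) == 14

# Reading with the same layout recovers each field intact.
nome, codigo = struct.unpack(layout, pacote)
nome_ok = nome.rstrip(b"\0")
codigo_ok = codigo.rstrip(b"\0")

# Reading with the wrong widths raises no error, but the bytes of one
# field end up inside the other.
errado_a, errado_b = struct.unpack("2s12s", pacote)
```

The second unpack() succeeds because the total size still matches, which is exactly why the writer and the reader have to agree on the widths.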

Now, about your problem: if you really can’t limit the user input (which you should, for security), you could do something like this:

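import struct

str_ = input()

bytes = str_.encode()  # note: this name shadows the built-in bytes type
tamanho = len(bytes) + 1

formatacao = "{}s".format(tamanho)
pacote = struct.pack(formatacao, bytes)  # pack the encoded bytes, not the str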

Remember that the length of a string may differ from the length of its representation in bytes:

>>> len("ç")
1
>>> len("ç".encode())
2

Also, add 1 to the length so that the string ends with a \0 terminator, as C strings do. When the count in the format is larger than the data, struct.pack() fills the remainder with null bytes.
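A small sketch of how the count in the format behaves, using the same "ç" as above. The variable names are only for illustration.

```python
import struct

dados = "ç".encode()  # 2 bytes in UTF-8

# Count = length + 1: struct pads the extra byte with \0.
com_nul = struct.pack("{}s".format(len(dados) + 1), dados)

# Count = exact length: no terminator is added.
exato = struct.pack("{}s".format(len(dados)), dados)

# Count smaller than the data: the bytes are silently truncated,
# here cutting the two-byte character in half.
cortado = struct.pack("1s", dados)
```

The last case is the reason to compute the count from the encoded bytes rather than from len() of the original string.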

If you can keep the format string around, unpacking the data is straightforward.

However, since this package may be read by another program, a better approach is to pack the string together with its size in bytes:

formatacao = "i{}s".format(tamanho)
pacote = struct.pack(formatacao, tamanho, bytes)

That way, whoever reads your struct knows that the first value is an integer saying how many of the bytes that follow belong to the stored string. The idea is to have a header with the information needed to read the data and a body with the data itself.

Does that make sense? Here is a full example:

Packing

>>> import struct
>>> entrada = input()
sabão
>>> bytes = entrada.encode()
>>> tamanho = len(bytes) + 1
>>> formatacao = "i{}s".format(tamanho)
>>> pacote = struct.pack(formatacao, tamanho, bytes)
>>> pacote
b'\x07\x00\x00\x00sab\xc3\xa3o\x00'

Unpacking (with the format string saved)

>>> tamanho, bytes = struct.unpack(formatacao, pacote)
>>> str_ = bytes.decode().strip('\0')

Unpacking (without the saved format string)

First, we work out how many bytes an integer takes, so we know where the size header ends and can read it:

>>> fim_int = struct.calcsize('i')
>>> tamanho_str = struct.unpack('i', pacote[:fim_int])
>>> tamanho_str = tamanho_str[0]  # unpack always returns a tuple, hence the [0]

Then we use the size read from the header to build the format and compute which bytes belong to the string:

>>> formatacao = '{}s'.format(tamanho_str)
>>> inicio_str = fim_int
>>> fim_str = inicio_str + tamanho_str

Finally, we extract the string and strip the terminator:

>>> bytes = struct.unpack(formatacao, pacote[inicio_str:fim_str])
>>> str_ = bytes[0].decode().strip('\0')

Obviously this example is not the most idiomatic Python, but I tried to keep it as didactic as possible.
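For comparison, a more compact variant of the same header-plus-body idea might look like the sketch below. It is not the code from the example above: the helper names pack_str and unpack_str are hypothetical, the "<I" header (little-endian unsigned 32-bit) is an assumption chosen so the layout does not depend on the machine that wrote it, and the \0 terminator is dropped because the length prefix already marks where the string ends.

```python
import struct

# "<I": little-endian unsigned 32-bit length. This is an assumption made
# here so that writer and reader agree regardless of platform.
HEADER = struct.Struct("<I")


def pack_str(texto):
    """Return texto as UTF-8 bytes prefixed with their length."""
    dados = texto.encode("utf-8")
    return HEADER.pack(len(dados)) + dados


def unpack_str(pacote, offset=0):
    """Read one length-prefixed string; return (text, next offset)."""
    (tamanho,) = HEADER.unpack_from(pacote, offset)
    inicio = offset + HEADER.size
    fim = inicio + tamanho
    if fim > len(pacote):
        raise ValueError("truncated package")
    return pacote[inicio:fim].decode("utf-8"), fim


# Several strings can be stored back to back and read in sequence.
pacote = pack_str("sabão") + pack_str("ç")
primeiro, pos = unpack_str(pacote)
segundo, pos = unpack_str(pacote, pos)
```

Returning the next offset lets the caller keep reading further fields from the same buffer without slicing it by hand.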

  • Thank you very much for your reply! I understood how to do it, but one question remains: you told me to limit the user input (I hadn’t really thought about it), so let’s assume I limit the input to 10 characters and the user types only 4. Will that cause any problem when packing/unpacking?

  • Ah, another thing... I’m having trouble packing. Following what you said, I can only pack the number of characters; I’m not managing to include the string itself...

  • No problem, since the rest will be filled with \0, the null character. However, when you unpack and convert back to a string, its length will be 10 instead of 4, so you will need a bit of cleanup (e.g. strip('\0')). As for the other comment, I tested the code and everything works; I’ll post a full example soon.

  • Now I get it, with this example I understood (it took a while, because I’m still new to the subject and not very familiar with the format method, I always use %). It worked, and I was able to pack all the data I wanted in the program. Thanks!
