Read binary file using struct created in C++

Good afternoon to all,

I would like to read a binary file created in C++ using the following structure:

struct STRUCT_MOB
{
    char           MobName[NAME_LENGTH];      // The name of the mob // 0 - 15
    char           Clan;          // The clan the mob belongs to // 16
    unsigned char  Merchant;      // The mob's merchant ID // 17
    unsigned short Guild;         // The ID of the guild the mob belongs to // 18 - 19
    unsigned char  Class;         // The mobs class // 20
    unsigned short Rsv; // 21 - 22
    unsigned char  Quest; // 23

    int            Coin;          // The amount of coins the mob has // 24 - 27

    long long      Exp;           // The amount of experience the mob has to level up // 28 - 35

    short          SPX;          // The X position saved by the stellar gem, to teleport the mob there when using warp scroll // 36 - 37
    short          SPY;          // The Y position saved by the stellar gem, to teleport the mob there when using warp scroll // 38 - 39

    STRUCT_SCORE   BaseScore;    // The base score of the mob // 40 - 87
    STRUCT_SCORE   CurrentScore; // The score the mob actually has // 88 - 135

    STRUCT_ITEM    Equip[MAX_EQUIP];     // The items the mob is wearing // 136 - 263
    STRUCT_ITEM    Carry[MAX_CARRY];     // The items the mob is carrying // 264 - 775

    long LearnedSkill; // The skills the mob learned, divided into four categories (00 _ 00 _ 00 _ 00) // 776 - 779

    unsigned int Magic; // 780 - 783

    unsigned short ScoreBonus;   // The points the mob can use to increase score (Str, Int, Dex, Con) // 784 - 785
    unsigned short SpecialBonus; // The points the mob can use to increase special, to increase effect of learned skills (score->Special[4]) // 786 - 787
    unsigned short SkillBonus;   // The points the mob can use to buy skills // 788 - 789

    unsigned char  Critical;     // The chance the mob has to deliver critical hits // 790
    unsigned char  SaveMana;     // Unknown use, nomenclature of variable is correct to all current standards // 791

    unsigned char  SkillBar[4];  // The skills saved on the first 4 slots of the skill bar // 792 - 795

    unsigned char  GuildLevel;   // The mob's guild level, used to define if it's a guild member or leader // 796

    unsigned short  RegenHP;         // UNK // 797 - 798
    unsigned short  RegenMP;         // UNK // 799 - 800

    unsigned char  Resist[4];    // The mob's resistances, to fire / ice / thunder / magic // 801 - 804 // 805

};

Based on this struct, how would I read such a binary file in Python? I saw something using unpack, but I couldn't find a way to define the structure of the file for reading, as is done in C++.

1 answer

A struct in C or C++ defines the fields of your data object: for each field, not only the name but also the exact data type, and therefore its size in bytes. If the code writes the structure to disk exactly as it sits in memory, you will have that same sequence of bytes on disk.

If it were a structure with pointers to other structures, or even to text strings (char *) stored elsewhere in memory, saving it to disk and restoring it would be a big task on its own. The struct in the question, however, keeps all its data "local", in a single block of memory. It does use other structs internally (STRUCT_SCORE and STRUCT_ITEM), as well as arrays whose sizes are named constants (NAME_LENGTH, MAX_EQUIP and MAX_CARRY).

When you read a file holding these structures, in Python or any other language, you get back a sequence of bytes. Knowing how to "see" inside these bytes, and which of them correspond to which field of the original struct, is something you have to program.

In short, to read this in Python you will have to redeclare the structure, and all nested structures, in the Python code; there's no escape. You will possibly also need a little code around it to make access to the fields easier (for example, the last field, unsigned char Resist[4];, needs some conversion before it can be read and written as 4 characters in Python).

**Before continuing:** depending on your goal, moving the structure back and forth in binary may not be the simplest thing to do. You might consider having a C library that writes the struct as JSON, for example; in that case, reading it in Python and several other languages becomes simpler. There are also data exchange formats that let you declare the data structure once and use it from various languages: Google Protocol Buffers and Cap'n Proto are two examples.

I'll lay out a roadmap in Python, but making it work is not so simple: you'll have to sweat a little (or we can schedule a live stream and do it live).

Standard library "struct" package

That said, Python has the "struct" module, which can indeed unpack a sequence of bytes into several Python objects, returned as a tuple, from which they can be used. Combined with collections.namedtuple, you get read access to all fields by name, but you have to declare the fields in two different places: the names in the namedtuple, and the types and sizes (in bytes) in the format string.

So just as an example, a struct like this:


struct STRUCT_MINIMOB
{
    char           MobName[15];      
    char           Clan;          
    unsigned char  Merchant;      
    int            Coin;
};

Written to a "minimob.bin" file without padding between the fields, it can be read from Python like this:


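import struct
from collections import namedtuple


MiniMob = namedtuple("MiniMob", "mobname clan merchant coin")

# "=" selects native byte order with standard sizes and no padding
with open("minimob.bin", "rb") as f:
    # unpack() returns a tuple, so expand it with * into the namedtuple fields
    mob_recuperado = MiniMob(*struct.unpack("=15scBi", f.read()))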

With that you can see the name (followed by its null padding bytes) in the field mob_recuperado.mobname. The fields coin and merchant also work. Note that "merchant" is only an integer from 0 to 255: the merchant data itself has to be saved and read separately, in another struct, which you can read and put in a dictionary keyed by that integer, and so on. The key to this reading is the string "=15scBi" in the call to struct.unpack. If you look at the struct documentation, you will see that this encoding means "use native byte order without padding, then a byte string of size 15, a single character, an unsigned one-byte number and a signed 4-byte integer".
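To make the format string less mysterious, here is a small sketch that checks its size and round-trips one record in memory. The values ("Batman", 100, 1000) are made up for illustration, and the result of the "@" variant depends on the platform.

```python
import struct

# "=" means native byte order, standard sizes and no padding between fields.
PACKED = "=15scBi"
# "@" (the default) uses native sizes and native alignment, like an
# ordinary C struct compiled without any packing directive.
ALIGNED = "@15scBi"

print(struct.calcsize(PACKED))   # 15 + 1 + 1 + 4 = 21 bytes
print(struct.calcsize(ALIGNED))  # usually larger: padding before the int

# Build one record in memory (illustrative values) and read it back.
record = struct.pack(PACKED, b"Batman", b"A", 100, 1000)
name, clan, merchant, coin = struct.unpack(PACKED, record)

# "15s" keeps the trailing null bytes, so strip them before decoding.
name = name.rstrip(b"\x00").decode("ascii")
print(name, clan, merchant, coin)
```

If the size reported for your format does not match sizeof() of the struct on the C side, the two layouts do not agree and the fields will be read from the wrong positions.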

As you can see, this is hard to write, prone to mistakes, and even harder to maintain. On top of that, fields with nested structures (Equip, Carry) would have to be read as a sequence of bytes, and the process repeated for their contents.
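A rough sketch of that two-pass reading with "struct" alone follows. The item layout used here (one short plus six bytes, 8 bytes in total) and MAX_EQUIP = 16 are assumptions made only for illustration; the real STRUCT_ITEM is not shown in the question. The outer layout is also a toy one (just a name followed by the items), not the full STRUCT_MOB.

```python
import struct

# Hypothetical 8-byte item layout (an assumption, check the real STRUCT_ITEM):
# one short item index followed by six single-byte values.
ITEM = struct.Struct("=h6B")
MAX_EQUIP = 16  # assumed value

# Toy outer layout: a 16-byte name, then the whole Equip array
# kept as one raw byte string ("128s").
OUTER = struct.Struct(f"=16s{ITEM.size * MAX_EQUIP}s")


def parse(record):
    """First pass: split the outer fields. Second pass: split the items."""
    name, equip_raw = OUTER.unpack(record)
    equip = list(ITEM.iter_unpack(equip_raw))
    return name.rstrip(b"\x00").decode("ascii"), equip


# Illustrative data: one non-empty item followed by 15 zeroed slots.
items = ITEM.pack(7, 1, 2, 3, 4, 5, 6) + bytes(ITEM.size * (MAX_EQUIP - 1))
name, equip = parse(OUTER.pack(b"Batman", items))
print(name, len(equip), equip[0])
```

Every nested struct needs its own format string and its own unpacking step, which is where this approach starts to get hard to maintain.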

Using ctypes

Python has another native way of reading binary structures: the Structure class of ctypes. It is much less known than the plain "struct" module, but it is easy to see that, beyond very simple cases of up to about 10 fields, "struct" alone does not help much.

The biggest drawback is that the ctypes Structure was indeed made to reproduce C structs in Python, but it has no intuitive way to create such a structure from raw data (bytes read from a file). Since the ctypes module itself provides tools not normally used in Python, such as direct access to memory by address (pointers), the way to go is to use a ready-made recipe built on them, and then you do get something that works programmatically. Just note that C text appears as "bytes" in the ctypes Structure class, so make the appropriate conversions with encode/decode.

The syntax for declaring a structure using ctypes.Structure is given here: https://docs.python.org/3/library/ctypes.html#Structures-and-Unions

(The examples there use from ctypes import *, which is not recommended; it is better to do import ctypes and prefix Structure, c_int and so on.) The declaration of the summarized structure above would be just:




import ctypes 

class MiniMob(ctypes.Structure):
    _pack_ = 1
    _fields_ = [
        ("name", ctypes.c_char * 15),
        ("clan", ctypes.c_char),
        ("merchant", ctypes.c_ubyte),
        ("coin", ctypes.c_int32),
    ]

(The _pack_ = 1 ensures byte-by-byte alignment; otherwise ctypes would push the "coin" field a few bytes forward to align it, and break the whole layout.)
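The effect of _pack_ can be checked directly by declaring the same fields twice and comparing sizes and offsets. The class names PackedMob and AlignedMob are made up for this comparison, and the numbers for the aligned version are what you would typically see, not a guarantee for every platform.

```python
import ctypes

FIELDS = [
    ("name", ctypes.c_char * 15),
    ("clan", ctypes.c_char),
    ("merchant", ctypes.c_ubyte),
    ("coin", ctypes.c_int32),
]


class PackedMob(ctypes.Structure):
    _pack_ = 1          # no padding: fields follow each other byte by byte
    _fields_ = FIELDS


class AlignedMob(ctypes.Structure):
    _fields_ = FIELDS   # default: native alignment, padding may be inserted


# Packed: coin starts right after the 17 bytes that precede it.
print(ctypes.sizeof(PackedMob), PackedMob.coin.offset)    # 21 17
# Aligned: coin is typically moved to offset 20 (3 padding bytes).
print(ctypes.sizeof(AlignedMob), AlignedMob.coin.offset)
```

Which of the two matches your file depends on how the C++ side was compiled, so compare ctypes.sizeof() with sizeof() in the C++ code before trusting either.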

As I said, this gives you a class very close to a C struct, where you can manipulate the fields one by one. To get the contents in and out as a byte sequence, I adapted a recipe from here: https://stackoverflow.com/a/1827666/108205


import ctypes

class MiniMob(ctypes.Structure):
    _pack_ = 1
    _fields_ = [
        ("name", ctypes.c_char * 15),
        ("clan", ctypes.c_char),
        ("merchant", ctypes.c_ubyte),
        ("coin", ctypes.c_int32),
    ]
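    @classmethod
    def _load(cls, data):
        # Copy the raw bytes over a fresh instance (never more than its size)
        self = cls()
        size = min(len(data), ctypes.sizeof(self))
        ctypes.memmove(ctypes.addressof(self), data, size)
        return self

    def _dump(self):
        # Return the structure's memory as bytes (the method is tobytes, not to_bytes)
        return memoryview(self).tobytes()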

With that you can pass the bytes read from a file straight to the _load method. Below, in interactive mode, I create the bytes that would be in the file for these fields by concatenating some "bytes" objects:


In [150]: dados = b"Batman" + b"\x00" * (15 - len("batman")) + b"A" + bytes((100,)) + (1000).to_bytes(4, "little")                   

In [151]: mob = MiniMob._load(dados)                                                                                                 

In [152]: mob.coin                                                                                                                   
Out[152]: 1000

In [153]: mob.name                                                                                                                   
Out[153]: b'Batman'
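For a real file holding several records one after another, a sketch along these lines could work. It uses ctypes' own from_buffer_copy class method instead of the memmove recipe; the helper name read_mobs, the temporary file and the sample values are all made up for illustration.

```python
import ctypes
import os
import tempfile


class MiniMob(ctypes.Structure):
    _pack_ = 1
    _fields_ = [
        ("name", ctypes.c_char * 15),
        ("clan", ctypes.c_char),
        ("merchant", ctypes.c_ubyte),
        ("coin", ctypes.c_int32),
    ]


def read_mobs(path):
    """Read consecutive fixed-size records; a short trailing chunk is dropped."""
    size = ctypes.sizeof(MiniMob)
    mobs = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(size)
            if len(chunk) < size:
                break
            # from_buffer_copy builds an instance from a copy of the bytes
            mobs.append(MiniMob.from_buffer_copy(chunk))
    return mobs


# Illustrative data: write two records to a temporary file, then read them.
first = MiniMob(b"Batman", b"A", 100, 1000)
second = MiniMob(b"Robin", b"B", 7, 50)
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
    tmp.write(bytes(first) + bytes(second))
mobs = read_mobs(tmp.name)
os.remove(tmp.name)

print([(m.name.decode("ascii"), m.coin) for m in mobs])
# [('Batman', 1000), ('Robin', 50)]
```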

Another advantage of this approach is that you can place the nested structures as you do in C: declare them separately and put them in _fields_ where they will be used.
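A minimal sketch of that nesting follows. The Item layout (one 16-bit index plus six bytes) and the two-slot equip array are assumptions for illustration only, since the real STRUCT_ITEM and MAX_EQUIP are not shown in the question.

```python
import ctypes
import struct


class Item(ctypes.Structure):
    # Hypothetical layout, check it against the real STRUCT_ITEM.
    _pack_ = 1
    _fields_ = [
        ("index", ctypes.c_int16),
        ("effects", ctypes.c_ubyte * 6),
    ]


class MobWithItems(ctypes.Structure):
    _pack_ = 1
    _fields_ = [
        ("name", ctypes.c_char * 16),
        ("equip", Item * 2),   # an array of nested structures, as in C
    ]


# Illustrative bytes: a padded name, one filled item and one empty item.
raw = (
    b"Batman".ljust(16, b"\x00")
    + struct.pack("=h6B", 7, 1, 2, 3, 4, 5, 6)
    + bytes(ctypes.sizeof(Item))
)
mob = MobWithItems.from_buffer_copy(raw)

print(mob.name, mob.equip[0].index, list(mob.equip[0].effects))
```

The nested fields are reached with plain attribute and index access, so no second unpacking pass is needed.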

Other forms

Other ways to do this, I believe, involve automatic C-to-Python wrapper generators. If you use Cython, for example, you will have to declare the structure again, using the Cython syntax, but I believe that, after a call from Python, it already hands you a ready-to-use object, as in the case of ctypes.

Yet another way is to create a descriptor class of your own, something that works like Python's "property": you keep the raw data from the file in a "bytearray" object on each instance, and each descriptor reads and writes at the right positions.
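A bare-bones sketch of the descriptor idea could look like this. The Field class is made up for illustration (it is not the code from any existing project), little-endian byte order is assumed, and the two offsets are the ones annotated for Coin and Exp in the question's struct.

```python
import struct


class Field:
    """Descriptor that reads/writes one value at a fixed offset of instance.raw."""

    def __init__(self, fmt, offset):
        # Assumption: the file is little-endian ("<").
        self.codec = struct.Struct("<" + fmt)
        self.offset = offset

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return self.codec.unpack_from(instance.raw, self.offset)[0]

    def __set__(self, instance, value):
        self.codec.pack_into(instance.raw, self.offset, value)


class Mob:
    # Offsets taken from the comments in the question's struct.
    coin = Field("i", 24)   # int, bytes 24 - 27
    exp = Field("q", 28)    # long long, bytes 28 - 35

    def __init__(self, data):
        self.raw = bytearray(data)   # mutable copy of the file contents


mob = Mob(bytes(36))    # 36 zero bytes stand in for data read from a file
mob.coin = 1000
mob.exp = 2 ** 40
print(mob.coin, mob.exp, bytes(mob.raw[24:28]))
```

Because every field states its own offset, you can declare only the fields you care about and ignore the rest of the record.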

Since your C struct already has the byte positions annotated for each field, this could be a good alternative. I do something similar in a project of mine. This can get very cool, and quite "foolproof", but it requires somewhat more advanced knowledge of Python. You can already use the code in the file "base.py" here for this:

https://github.com/jsbueno/pythonchain/blob/master/pythonchain/base.py

And here are the classes, some fairly advanced, that use what is defined in "base" to read and write packed data, as in the case of a C struct: https://github.com/jsbueno/pythonchain/blob/1f9208dc8bd2741a574adc1bf745d218e4314e4a/pythonchain/block.py#L45

The classes derived from "base.Base" have the methods from_data and serialize. Actually, using base.Base from this project may be the simplest of all the options I've described.

  • My answer only takes Python 3 into account; I don't think it's productive to document new ways of developing code in Python 2 at this point. Anyway, only the "_dump" method wouldn't work there, since the memoryview is Python 3.

  • Excellent explanation!!! Thank you so much for all the information. With this description I will surely achieve my goal. Thanks!!
