How to assemble a list of generic objects in C?

5

In many higher-level languages it is possible to have a structure or a collection holding data of various types; often a type called Object is used for this.

How can the same be done in C? That is, I don't know in advance which types will go into the structure or array; it could be any of them.

2 answers

5


The most common approach is to use a void *. Any object is then stored in the container (a structure or a collection) by reference, i.e. as a pointer. This way every element has the same size, the size of a pointer, whatever its type.

The downside is that all the data is reached through a pointer. This can be bad for types that are usually handled by value (int, char, double, etc.), since there is an extra indirection to access the data (read the pointer, then go to where the value is), and, besides the space for the object itself, you will very likely have to allocate it on the heap. In most cases it is a mistake to simply take a pointer to a value on the stack. That works only if the pointer does not escape the current function (it can be used in functions called from there, but not by the caller; if this is confusing, read up on how the stack works).
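A minimal sketch of the void * approach described above, assuming a fixed-capacity array for brevity. The names PtrList, PTRLIST_CAP, list_push_copy and list_free are hypothetical and do not come from any library. Each value is copied to the heap so that the stored pointer stays valid after the function that created the value returns.

```c
#include <stdlib.h>
#include <string.h>

#define PTRLIST_CAP 8   /* hypothetical fixed capacity */

/* A list whose elements are untyped pointers. */
typedef struct {
    void *items[PTRLIST_CAP];
    size_t count;
} PtrList;

/* Copies size bytes from src to the heap and stores the copy's address.
   Assumes size > 0. Returns 1 on success, 0 if the list is full or
   the allocation fails. */
int list_push_copy(PtrList *list, const void *src, size_t size) {
    if (list->count >= PTRLIST_CAP) return 0;
    void *copy = malloc(size);
    if (!copy) return 0;
    memcpy(copy, src, size);
    list->items[list->count++] = copy;
    return 1;
}

/* Frees every element; valid because all of them came from malloc. */
void list_free(PtrList *list) {
    for (size_t i = 0; i < list->count; i++) free(list->items[i]);
    list->count = 0;
}
```

The list itself does not record what each pointer refers to, so the caller has to remember the type and cast on access, for example *(int *)list.items[0].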

There is another possibility that can optimize this, although it adds some complexity to the code. We can use a union with the scalar types and probably a void * for everything else. It always occupies the space of its largest member; the by-value (scalar) types are stored directly, and only the by-reference types have the indirection, which they would have anyway.

In practice, to be flexible, it will probably also need some extra space to indicate which type is stored there. This is called a tagged union.

Note that what you are doing is turning C into a dynamically typed language. In fact, this is how dynamically typed languages usually work.

In the two runs linked after the code below, made on different machines, note that the size of the structure differs.

#include <stdio.h>

typedef struct {
    enum { is_int, is_float, is_char, is_pointer } type;
    union {
        int i;
        float f;
        char c;
        void *p;
    } value;
} Tipo;

int main(void) {
    int x = 10;
    float y = 5.5f;
    char c = 'h';
    char a[] = "teste";
    Tipo var1 = { .type = is_int, .value.i = x };
    Tipo var2 = { .type = is_float, .value.f = y };
    Tipo var3 = { .type = is_char, .value.c = c };
    Tipo var4 = { .type = is_pointer, .value.p = a };
    printf("%d\n", var1.value.i);
    printf("%f\n", var2.value.f);
    printf("%c\n", var3.value.c);
    printf("%s\n", (char *)var4.value.p);
    printf("%d\n", var2.type);
    printf("%zu\n", sizeof(var2)); // sizeof yields a size_t, which is printed with %zu
    printf("%d\n", var3.type);
    printf("%zu\n", sizeof(var3));
}

See it running on ideone and on repl.it. I also put it on GitHub for future reference.
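Building on the tagged union above, a list of generic objects can be sketched as a plain array of such structures plus a function that checks the tag before touching the value. The names Tag, Value and value_format are hypothetical, and the pointer case assumes it always refers to a NUL-terminated string, as in the example above.

```c
#include <stdio.h>

typedef enum { TAG_INT, TAG_FLOAT, TAG_CHAR, TAG_POINTER } Tag;

typedef struct {
    Tag type;
    union {
        int i;
        float f;
        char c;
        void *p;   /* assumption: points to a NUL-terminated string */
    } value;
} Value;

/* Writes a textual form of *v into buf, choosing the union member
   according to the tag. Returns what snprintf returns, or -1 for an
   unknown tag. */
int value_format(const Value *v, char *buf, size_t size) {
    switch (v->type) {
        case TAG_INT:     return snprintf(buf, size, "%d", v->value.i);
        case TAG_FLOAT:   return snprintf(buf, size, "%.1f", v->value.f);
        case TAG_CHAR:    return snprintf(buf, size, "%c", v->value.c);
        case TAG_POINTER: return snprintf(buf, size, "%s", (const char *)v->value.p);
    }
    return -1;
}
```

An array such as Value list[4] can then hold an int, a float, a char and a string side by side, and a loop over it only needs to call value_format on each element.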

3

So: C is a statically typed language with no native support for objects. On the other hand, it gives you control over almost everything you want to do, and it does provide a kind of "generic" data, which is precisely the pointer to void (void *). When you declare a variable of type void *, all the compiler "knows" is that it holds a memory address, and your program is solely responsible for manipulating the data in that memory region.

So, it is not possible to have a "generic" object that uses predefined C structs dynamically: you cannot pass a struct type as a parameter to a function.

This means that you have to structure your object types so that they have some fixed fields at the beginning of the data structure that describe the layout of the data in the later sections (including the size).
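In its simplest form, where the set of layouts is known at compile time, the idea of fixed leading fields can be sketched with a small header struct placed as the first member of every object. The names Kind, ObjHeader, Point, Name, obj_kind and obj_size are hypothetical. The sketch relies on the C rule that a pointer to a struct can be converted to a pointer to its first member.

```c
#include <stddef.h>

typedef enum { KIND_POINT, KIND_NAME } Kind;

/* Fixed fields found at the start of every object. */
typedef struct {
    Kind kind;     /* which layout follows the header */
    size_t size;   /* total size of the object in bytes */
} ObjHeader;

typedef struct { ObjHeader header; float x, y; } Point;
typedef struct { ObjHeader header; char text[16]; } Name;

/* Both functions work on any object that starts with an ObjHeader. */
Kind obj_kind(const void *obj) {
    return ((const ObjHeader *)obj)->kind;
}

size_t obj_size(const void *obj) {
    return ((const ObjHeader *)obj)->size;
}
```

Code that receives a void * can read the header first and only then decide how to interpret the rest of the memory.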

For example, you could define, in plain language, that the first two bytes of your objects are a 16-bit integer giving the length of a string in which each byte is an ASCII character describing the type of one field: "B" for unsigned char, "b" for signed char, "I" for a 32-bit unsigned integer, "L" for a 64-bit unsigned integer, "Z" for a string prefixed with a 16-bit size. You then write functions that handle data in this format. Note that this is independent of whether you define the object header as a struct or simply use pointer arithmetic inside your functions to allocate the required memory and manipulate the attributes of your objects dynamically.

For the kind of object described above, a function like this one could create new objects, allocating the necessary memory at runtime:

#include <stdlib.h>
#include <string.h>

void *create_object(const char *definition) {
   /* size starts at 2 for the length prefix (assumes a 2-byte short) */
   short unsigned int size = 2, def_len = 0;
   unsigned char *new_obj = NULL;

   for (int i = 0; definition[i]; i++) {
      def_len++;
      size += 1;                        /* one byte for the type character */
      switch (definition[i]) {          /* plus the space for the field itself */
         case 'B': size += 1; break;
         case 'I': size += 4; break;
         /* ... */
      }
   }
   new_obj = malloc(size);
   if (!new_obj) {return NULL;}
   memcpy(new_obj, &def_len, sizeof def_len);   /* length prefix */
   memcpy(new_obj + 2, definition, def_len);    /* type characters follow it */
   return new_obj;
}

(A function that manipulates the fields themselves, within that reserved memory, has to check each character of the definition string to find the position of each field when a field is accessed by its numeric index):

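#include <string.h>

/* Returns the byte offset of field field_num and stores its type character
   in type[0]; returns 0 if the field does not exist. */
int get_field_offset(void *obj, int field_num, char *type) {
    const unsigned char *bytes = obj;
    unsigned short def_len;
    memcpy(&def_len, bytes, sizeof def_len);      /* number of fields */
    if (field_num < 0 || field_num >= def_len) {return 0;}
    int field_offset = 2 + def_len;               /* data starts after the header */
    for (int i = 0; i < field_num; i++) {         /* skip the preceding fields */
         switch (bytes[i + 2]) {
             case 'B': field_offset += 1; break;
             case 'I': field_offset += 4; break;
             /* ... */
         }
    }
    type[0] = (char)bytes[field_num + 2];
    return field_offset;
}

/* value points to the data to be copied into the field. */
void set_field_value(void *obj, int field_num, const void *value) {
    char type[1] = {0};
    int offset;
    offset = get_field_offset(obj, field_num, type);
    if (!offset) return;  // field does not exist
    switch (type[0]) {
        case 'B': memcpy((unsigned char *)obj + offset, value, 1); break;
        case 'I': memcpy((unsigned char *)obj + offset, value, 4); break;
    }
}

void * get_field_value(void *obj, int field_num, char *type) {
    int offset;
    offset = get_field_offset(obj, field_num, type);
    if (!offset) return NULL;
    // Return the address of the exact field, and its type indication in "type".
    // The address may be unaligned, so copy the value out with memcpy.
    return (unsigned char *)obj + offset;
}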

So, note that with this you can manipulate different data structures that change at runtime, without even using C's "struct" keyword. You can even use an object definition that comes from input data, whether typed by the user or read from a text file.

void *coordenadas = create_object("ff");
float latitude = 23.0f, longitude = 45.23f;
set_field_value(coordenadas, 0, &latitude);   /* value is passed by address */
set_field_value(coordenadas, 1, &longitude);
...

(For that, just handle f as float, or even double, in the switch statements above), and you can store latitude and longitude in these objects.
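As a compact, self-contained sketch of the same scheme with the f type included: the layout follows the description above (a 2-byte field count, the type characters, then the field data), but the helper names field_size, obj_create, obj_field, obj_set_float and obj_get_float are hypothetical, and treating f as a float is an assumption. Values are moved with memcpy because the fields may sit at unaligned addresses.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Size in bytes of one field of the given type; 0 for unknown types. */
static size_t field_size(char type) {
    switch (type) {
        case 'B': return 1;
        case 'I': return 4;
        case 'f': return sizeof(float);   /* assumption: f means float */
        default:  return 0;
    }
}

/* Allocates a zero-filled object; assumes the definition is shorter
   than 65536 characters. */
void *obj_create(const char *definition) {
    uint16_t def_len = (uint16_t)strlen(definition);
    size_t size = 2 + def_len;
    for (size_t i = 0; i < def_len; i++) size += field_size(definition[i]);
    unsigned char *obj = calloc(1, size);
    if (!obj) return NULL;
    memcpy(obj, &def_len, 2);               /* field count */
    memcpy(obj + 2, definition, def_len);   /* type characters */
    return obj;
}

/* Returns the address of field field_num, or NULL if it does not exist. */
void *obj_field(void *obj, int field_num) {
    unsigned char *bytes = obj;
    uint16_t def_len;
    memcpy(&def_len, bytes, 2);
    if (field_num < 0 || field_num >= def_len) return NULL;
    size_t offset = 2 + (size_t)def_len;
    for (int i = 0; i < field_num; i++) offset += field_size((char)bytes[2 + i]);
    return bytes + offset;
}

/* Stores a float; returns 0 if the field is missing or is not of type f. */
int obj_set_float(void *obj, int field_num, float value) {
    unsigned char *bytes = obj;
    void *field = obj_field(obj, field_num);
    if (!field || bytes[2 + field_num] != 'f') return 0;
    memcpy(field, &value, sizeof value);
    return 1;
}

/* Reads a float; returns 0.0f if the field does not exist. */
float obj_get_float(void *obj, int field_num) {
    float value = 0.0f;
    void *field = obj_field(obj, field_num);
    if (field) memcpy(&value, field, sizeof value);
    return value;
}
```

With these helpers, an object created with obj_create("ff") can hold a latitude in field 0 and a longitude in field 1, and is released with free.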

This is a "very crude" form, and it would take a lot of work to accommodate variable-length data types in there. But you could make it as sophisticated as you like, for example by adding a field that counts how many references to the object exist (so, whenever a piece of code no longer needs an object, it decrements the reference counter; if the counter reaches zero, the object can be deallocated immediately, releasing the memory). Another interesting refinement is to include a table of strings that would allow, for example, giving textual names to the fields. Of course, the C code gets proportionally more complex.
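The reference-counting idea mentioned above can be sketched in isolation like this. For brevity the counter is a field of an ordinary struct with an int payload rather than part of the byte layout used earlier; the names Counted, counted_new, counted_retain and counted_release are hypothetical.

```c
#include <stdlib.h>

/* An object carrying its own reference counter. */
typedef struct {
    unsigned refcount;
    int value;          /* stand-in payload */
} Counted;

/* Creates an object owned by one reference; returns NULL on failure. */
Counted *counted_new(int value) {
    Counted *obj = malloc(sizeof *obj);
    if (!obj) return NULL;
    obj->refcount = 1;
    obj->value = value;
    return obj;
}

/* Called by code that starts holding a reference to the object. */
void counted_retain(Counted *obj) {
    obj->refcount++;
}

/* Called by code that no longer needs the object. Frees it when the
   last reference goes away; returns 1 if it was freed, 0 otherwise. */
int counted_release(Counted *obj) {
    if (--obj->refcount == 0) {
        free(obj);
        return 1;
    }
    return 0;
}
```

After counted_release returns 1 the pointer must not be used again.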

Various object systems and generic data protocols are written in pure C, and they all have to start more or less from these principles (that is, fixed fields at the beginning of the data that determine the layout of the whole object): the GObject framework, for example, Google's protobuf, Cap'n Proto, and the Python programming language itself, whose objects all have an in-memory representation built along these lines that can be used from C. (In general, the initial fields that define the layout of an object are not visible when you access the object from Python code, but they are there when you access it from C.) The definition of Python objects has to be included in any C extension that handles Python objects, for example, and to manipulate generic objects it uses the types (typedefs) defined in the file object.h. Look at that file, near line 112.

  • Very good. If nothing better comes along, this one will be accepted ;)
