What are Unions? Why use them within structs?

18

I’d like to understand the differences between union and struct. I realized that both can be accessed in the same way:

u.membro
uPrt->membro

s.membro
sPrt->membro

In practice I have seen a lot of code that uses unions inside structs. What is the benefit of doing this? Is there some improvement in performance or memory use?

An arbitrary code example (correct me if I’m wrong):

struct pessoa {
 char name[50];
 union {
  int idade;
  float peso;
 };
};

3 answers

28


The great advantage lies in how memory is organized and reused.

The variables in a struct are laid out at sequential addresses, so the variables that make up the struct sit side by side in memory.

Your example is not a good example for a union, so I won’t use it.

Imagine we have a supermarket item. This item has a name, a price and a size. The size can be given either by volume (1 liter) or by weight (1 kg). So we could create the following struct:

struct item {
    char nome[50];
    float preco;
    float volume; // in liters.
    unsigned peso; // in grams.
};

In this struct item, memory would be allocated as follows (for simplicity, I am assuming byte alignment, that is, no padding between fields):

0-------------49-50-------53-54--------57-58-------61
      nome          preco       volume       peso

Note, however, that in the case of milk, we do not buy the milk by weight, but by volume. Therefore, the field struct item.peso would not have a valid value for this item, but would always occupy memory.

The same goes for cheese: it is sold in grams, not in litres.

How, then, can we reduce the memory used? We can declare the fields volume and peso inside a union:

struct item {
    char nome[50];
    float preco;
    union {
        float volume;
        unsigned peso;
    };
};

Now our memory layout becomes:

0-------------49-50-------53-54-------------57
      nome          preco       volume/peso
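
The diagrams above assume no padding. A minimal sketch for checking the real sizes on a given compiler follows. The names item_separado, item_union and bytes_saved are hypothetical (not from the answer), and the exact numbers depend on the platform's alignment rules.

```c
#include <assert.h>
#include <stddef.h>

/* Both size fields stored side by side. */
struct item_separado {
    char nome[50];
    float preco;
    float volume;   /* liters */
    unsigned peso;  /* grams  */
};

/* The two size fields share storage (anonymous union, C11). */
struct item_union {
    char nome[50];
    float preco;
    union {
        float volume;
        unsigned peso;
    };
};

/* volume and peso start at the same offset in the union version. */
static_assert(offsetof(struct item_union, volume) ==
              offsetof(struct item_union, peso),
              "union members must overlap");

/* Bytes saved per item by sharing the storage (platform dependent). */
size_t bytes_saved(void) {
    return sizeof(struct item_separado) - sizeof(struct item_union);
}
```

On platforms where float and unsigned are both 4 bytes, bytes_saved() commonly returns 4, but padding may make the total sizes differ from the 62 and 58 bytes suggested by the diagrams.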

This way, when we access the field struct item.volume, the compiler knows that we are treating that memory region as a float and will manipulate it correctly. The same goes for struct item.peso: the compiler knows it is an unsigned and will apply the rules for unsigned.

But what if we do:

struct item it;
it.peso = 2;
it.volume = 0.0f;
printf("%u", it.peso);

The output won’t be 2, the value we stored in peso, but rather the binary value of 0.0 in IEEE 754 interpreted as an unsigned. Coincidentally, that value is also 0, so the output will be 0.

Why?

Remember that the fields volume and peso occupy the same memory region. Therefore, the assignments wrote to the same address.
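
To make the overwrite visible with a value other than zero, here is a small sketch. It assumes IEEE 754 floats and that float and unsigned have the same size; the union name medida and the function name are hypothetical. In C, reading a member other than the one last written reinterprets the stored bytes; in C++ the same read is undefined behavior.

```c
#include <assert.h>

union medida {
    float volume;
    unsigned peso;
};

/* Assumption of this sketch: both members occupy the same number of bytes. */
static_assert(sizeof(float) == sizeof(unsigned), "sketch assumes equal sizes");

/* Write through volume, then read the same bytes back through peso. */
unsigned peso_after_writing_volume(float v) {
    union medida m;
    m.peso = 2;      /* stored first ...                          */
    m.volume = v;    /* ... then overwritten: same memory region  */
    return m.peso;   /* bit pattern of v, read as an unsigned     */
}
```

With 0.0f the function returns 0, as in the example above. With 1.0f it returns 0x3F800000 (1065353216) on an IEEE 754 platform, which makes the "absurd" value much easier to notice.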

So, if we access the value through the "wrong" field, we can get results that are absurd for our problem domain. So how do we know which field to use?

We can add a flag indicating this:

struct item {
    char nome[50];
    float preco;
    bool porVolume; // true: volume is valid; false: peso is valid. bool needs <stdbool.h>.
    union {
        float volume;
        unsigned peso;
    };
};

And so if we wanted to print the contents of an item, we could use:

if ( it.porVolume ) {
    printf("%s\t%.2f\t%.3f", it.nome, it.preco, it.volume);
} else {
    printf("%s\t%.2f\t%u", it.nome, it.preco, it.peso);
}

This pattern repeats whenever we access the fields of the union.
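
One way to keep the flag and the union consistent is to set both in a single place. The sketch below assumes a C11 compiler (for the anonymous union); the helper names item_by_volume, item_by_weight and item_format are hypothetical, not part of the answer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

struct item {
    char nome[50];
    float preco;
    bool porVolume;      /* true: volume is valid; false: peso is valid */
    union {
        float volume;    /* liters */
        unsigned peso;   /* grams  */
    };
};

/* Builds an item sold by volume; the flag and the member are set together. */
struct item item_by_volume(const char *nome, float preco, float litros) {
    struct item it = {0};
    strncpy(it.nome, nome, sizeof it.nome - 1); /* zero-init keeps it terminated */
    it.preco = preco;
    it.porVolume = true;
    it.volume = litros;
    return it;
}

/* Builds an item sold by weight. */
struct item item_by_weight(const char *nome, float preco, unsigned gramas) {
    struct item it = {0};
    strncpy(it.nome, nome, sizeof it.nome - 1);
    it.preco = preco;
    it.porVolume = false;
    it.peso = gramas;
    return it;
}

/* Formats the item into buf, reading only the member selected by the flag. */
int item_format(const struct item *it, char *buf, size_t n) {
    assert(it != NULL && buf != NULL);
    if (it->porVolume)
        return snprintf(buf, n, "%s\t%.2f\t%.3f L", it->nome, it->preco, it->volume);
    return snprintf(buf, n, "%s\t%.2f\t%u g", it->nome, it->preco, it->peso);
}
```

For example, item_format applied to item_by_volume("leite", 4.50f, 1.0f) writes "leite", "4.50" and "1.000 L" separated by tabs, and never touches peso.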

Besides using it inside a struct, we can also use a union as a type in its own right:

union pesoVolume {
    float volume;
    unsigned peso;
};

union pesoVolume pv;
pv.volume = 0.0f;

The operation is identical, except that the union will no longer be inside a struct.

In C, the fields that make up a union may even have different sizes; the compiler reserves at least as much memory as the largest member needs, possibly rounded up for alignment. That is:

union u1 {
    float f1;    // 4.
    unsigned f2; // 4.
};
printf("%zu", sizeof(union u1)); // 4.

union u2 {
    float f1;    // 4.
    long int f2; // 8 on typical 64-bit Linux, 4 on Windows.
    char f3[20]; // 20.
};
printf("%zu", sizeof(union u2)); // At least 20; typically 24 where long is 8 bytes, because of alignment.
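
The sizes in the comments above are typical values, not guarantees. A short sketch for checking them on a given compiler is shown here; the function names print_u2_layout and u2_tail_padding are hypothetical, and _Alignof requires C11.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

union u2 {
    float f1;
    long int f2;
    char f3[20];
};

/* The union must be able to hold its largest member ... */
static_assert(sizeof(union u2) >= sizeof(char[20]), "must fit the largest member");
/* ... and every member starts at offset 0. */
static_assert(offsetof(union u2, f2) == 0 && offsetof(union u2, f3) == 0,
              "union members must overlap");

/* Prints size and alignment; both are size_t, so the format is %zu. */
void print_u2_layout(void) {
    printf("size=%zu align=%zu\n", sizeof(union u2), _Alignof(union u2));
}

/* Bytes of padding added after the largest member to satisfy alignment. */
size_t u2_tail_padding(void) {
    return sizeof(union u2) - sizeof(char[20]);
}
```

The size is always a multiple of the union's alignment, which is why a union whose largest member is 20 bytes can still occupy 24.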

When to use?

Today it doesn’t make much sense, I believe. In the past, memory was a scarce resource, and that justified making these savings. Today, a standard PC has 4GB, and it is not uncommon to find machines with 8GB or more.

In some cases, however, a union makes it easier to pass parameters in an API and can be used if the programmer sees an advantage. This happens in some Win32 API functions. I honestly do not recommend it, because some languages may not support this kind of layout, causing interoperability problems.

Why didn’t I use your example?

Because we would probably want to keep both a person's idade and their peso.

  • Thanks for the reply. I found it very detailed and so I gave a +1

  • If the answer clears up your doubts, you should accept it.

  • I was waiting to see whether a more complete answer would appear. I accepted yours because, as I mentioned before, it answered the question.

  • 1

    Excellent answer, @Viníciusgobboa.deOliveira. Not even in books have I seen such a good explanation, congratulations!

7

The difference between a union and a struct is that a struct is an "AND" and stores all of its fields, while a union is an "OR" and all of its fields are in the same memory position. If you update one field of a union, all the other fields are updated at the same time, to some junk value.

The only case where you should use a union is when you want to save memory and are sure that only one field is needed at a time. As for the fact that unions appear inside structs: a problem with unions in C is that there is no way of knowing which field is in use and which fields hold "junk" values. Therefore, it is common to create an enum to mark this. For example, this struct represents tokens in a programming language:

struct exp {
    enum {LIT,VAR} type;
    union {
        int lit;
        char *var;
    } value;
};

A token can be either a number or a variable name. In the field type we say what kind of token it is, and in the field value we store the value of the token (an integer for a numeric token, and a pointer to a string when the token is an identifier). Using a union, the struct exp has a more compact representation in memory. Just take care to access the field lit only after checking that the field type contains LIT, and so on.
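
A minimal sketch of that discipline is to route every read through small accessors that check type first. The helper names exp_lit, exp_var, exp_get_lit and exp_get_var are hypothetical, not part of the answer.

```c
#include <assert.h>
#include <stddef.h>

struct exp {
    enum { LIT, VAR } type;
    union {
        int lit;
        char *var;
    } value;
};

/* Constructors set the tag and the matching union member together. */
struct exp exp_lit(int n) {
    struct exp e = { .type = LIT, .value = { .lit = n } };
    return e;
}

struct exp exp_var(char *name) {
    struct exp e = { .type = VAR, .value = { .var = name } };
    return e;
}

/* Returns 1 and stores the number in *out only if the token is a literal. */
int exp_get_lit(const struct exp *e, int *out) {
    assert(e != NULL && out != NULL);
    if (e->type != LIT)
        return 0;
    *out = e->value.lit;
    return 1;
}

/* Returns the identifier, or NULL if the token is not a variable. */
char *exp_get_var(const struct exp *e) {
    assert(e != NULL);
    return e->type == VAR ? e->value.var : NULL;
}
```

With accessors like these, code that handles a token never reads value.lit or value.var directly, so a junk value cannot leak out by accident.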

  • I liked the comparison of structs to AND and of unions to OR.

3

Vinícius Gobbo and hugomg have already explained what a union is and how it differs from a struct. However, I would like to add an example of use.

union Valor
{
    uint32_t dword;

    struct
    {
        uint16_t word0;
        uint16_t word1;
    };

    struct
    {
        uint8_t byte0;
        uint8_t byte1;
        uint8_t byte2;
        uint8_t byte3;
    };
};

Which may be represented as follows:

[Image: memory representation]

In the image, you can see that word0 and word1 occupy the same memory as dword, and so do the bytes.


Still in this example, it is possible to perform operations of the type:

union Valor foo;
foo.dword = 305;

printf("%d", foo.word0); // Shows the first word of dword.
printf("%d", foo.word1); // Shows the second word of dword.

printf("%d", foo.byte0); // Shows the first byte of dword.

  • Nice example, but wouldn't the compiler have to give some guarantee that the memory is aligned correctly?

  • Kaminary, yes, this has to be checked. The byte order also matters, whether it is big endian or little endian. But in the ones I have used, this is valid.

  • -1 for posting dangerous code with possibly undefined behavior.

  • @Kaminary: Compiler vendors implement pragmas that allow marking a struct as "packed", with no padding between fields. Of course, this is a language extension; the base standard only ensures that the struct fields appear in order.

  • @hugomg In that case, can I assume that Lucas Nunes's code will work on any ISO-standard compiler that does not have this extension?
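
The padding and byte-order concerns raised in the comments above can be checked in code. The sketch below assumes C11 (anonymous structs and static_assert); the functions is_little_endian and byte_of are hypothetical additions. Which byte byte0 refers to depends on the machine, whereas the shift-based byte_of gives the same answer everywhere.

```c
#include <assert.h>
#include <stdint.h>

union Valor {
    uint32_t dword;
    struct { uint16_t word0, word1; };               /* anonymous struct: C11 */
    struct { uint8_t byte0, byte1, byte2, byte3; };
};

/* The overlay only lines up if the compiler inserted no padding. */
static_assert(sizeof(union Valor) == sizeof(uint32_t), "unexpected padding");

/* 1 if the least significant byte is stored first (little endian). */
int is_little_endian(void) {
    union Valor v;
    v.dword = 1;
    return v.byte0 == 1;
}

/* Byte i (0 = least significant) of v, independent of byte order. */
uint8_t byte_of(uint32_t v, unsigned i) {
    assert(i < 4);
    return (uint8_t)(v >> (8u * i));
}
```

For foo.dword = 305 (0x00000131), byte_of(305, 0) is always 0x31, while foo.byte0 is 0x31 on a little-endian machine and 0x00 on a big-endian one.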
