sizeof does not work to determine the size of a malloc allocation

I was working on a data structures assignment when I needed to dynamically allocate an array. However, even after allocating the space needed for the structure, the value returned by sizeof is not what I expected. Here is the code:

int *vetor = (int *)malloc(sizeof(int)*4); // allocating space for 4 ints, i.e. 4*4 = 16 bytes

printf("%zu -- %zu\n", sizeof(vetor), sizeof(vetor)/sizeof(int)); // printing the values

In the example above, sizeof(vetor) should return the size of the array, i.e. the number of bytes I allocated for it, correct?

Therefore, the expected output would be:

16 -- 4

However, this is not what happens. Here is the actual output:

[Image: program output]

No matter how many bytes I allocate for the array with malloc, the output is always the same: the size in bytes is always 8.

Why does this happen, and what would be the solution to this problem?

  • Concept 1: sizeof() of a pointer is typically 4 when compiling for 32 bits and 8 when compiling for 64 bits. sizeof() is resolved at compile time, not at run time. There is no way it could inspect the result of malloc, since malloc is just an ordinary function, which could just as well be called avocado(), banana() or khaki() and do the same thing. Concept 2: malloc returns a block of bytes that are not even initialized. You are responsible for their use. Although the memory manager knows how the block was allocated (so that free() can do its job), this information is kept in a private area.

2 answers

3

The calculation you’re doing only works for an array whose size is defined at compile time. Even though you use a constant in the dynamic allocation, the size is in general unknown to the compiler, so it does not work. The sizeof operator can only report what the compiler knows from the type, and here the type is a pointer. It would be complicated for the compiler to prove the size of the allocation: in this case it would not be so hard to analyze, but there is more complex code where it cannot know this value.

So,

  • either you allocate an array on the stack, where the compiler knows the size, and I see no reason not to do so (there may be one, but in general there is no need for dynamic allocation in simple cases),
  • or you use the value you already know, which is 4, instead of doing a calculation that doesn’t even make sense. If you use the array in another function, you have to pass this value along so the function knows it, or in some cases use a global constant, so the value has a name everywhere and is easier to change in one place.
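A minimal sketch of the two options, assuming nothing beyond standard C. The names N_ITEMS, sum_ints and stack_array_count are made up for illustration: sizeof gives the full size only for a real array, and a function that receives a pointer has to be told the count.

```c
#include <stddef.h>

#define N_ITEMS 4  /* hypothetical named constant, as suggested above */

/* A pointer parameter carries no size, so the count is passed with it. */
int sum_ints(const int *v, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; i += 1)
        total += v[i];
    return total;
}

/* For a real array the compiler knows the size, so sizeof works. */
size_t stack_array_count(void)
{
    int on_stack[N_ITEMS] = {1, 2, 3, 4};
    return sizeof(on_stack) / sizeof(on_stack[0]);
}
```

A caller would write something like sum_ints(vetor, N_ITEMS), whether vetor came from malloc or is an array on the stack.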

Anyway, there are several ways to solve this depending on the context, which we do not know.

Just to complete: the 8 that appears there is the size of the pointer, not of the array. vetor has a pointer type, you declared it that way, so you can’t expect it to magically show the size of something else. On 64-bit architectures pointers are normally 8 bytes. An array allocated on the stack is not a pointer: it is the data itself and its size is known, since the compiler has to know how much space to reserve. Dynamic allocation is used when you don’t know the size in advance. In that case you need to create your own mechanism to keep track of the size, so the second option I listed above is the recommended one. It is possible to make this more sophisticated, but I won’t go into that because it is advanced.

  • Got it, very good explanation. But how should I proceed when I don’t know the exact size of the array? That is the case I am facing: I used 4 only for illustration and to make the problem easier to explain, but in the real case I have no information about the actual size. Should I allocate a minimum size and reallocate as I iterate, saving the index of the final iteration as the length? Or is there a better way to do it? Thank you for your attention!

  • Part of that is in the second paragraph and part is already another question, but you are on the right track: you have to keep track of the size by hand somehow, C has nothing ready-made.
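One possible way to keep track of the size by hand is to store the pointer and the element count together. This is only a sketch: the type IntBuffer and the functions int_buffer_create and int_buffer_destroy are hypothetical names, not something C provides.

```c
#include <stdlib.h>

/* Hypothetical type: the pointer and its element count travel together. */
typedef struct {
    int    *data;
    size_t  len;   /* number of int elements allocated */
} IntBuffer;

/* Returns a buffer of len zero-initialized ints; data is NULL on failure. */
IntBuffer int_buffer_create(size_t len)
{
    IntBuffer b = { NULL, 0 };
    b.data = calloc(len, sizeof *b.data);
    if (b.data != NULL)
        b.len = len;
    return b;
}

/* Releases the memory and resets the fields so the buffer cannot be misused. */
void int_buffer_destroy(IntBuffer *b)
{
    free(b->data);
    b->data = NULL;
    b->len = 0;
}
```

Any code that receives an IntBuffer can read b.len instead of trying to use sizeof on the pointer.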

1


You are stating two things, and asking a question that assumes both are facts. However

  • you are not "allocating the space needed"
  • and it is also not true that "the value returned by sizeof() is incorrect"

This is the declaration of vetor

int        *vetor = (int *)malloc(sizeof(int)*4);

I don’t want to get into religious discussions here, but you are declaring vetor, and vetor is an int*, so it might be clearer to read, especially for those who are learning, if you write

int*        vetor = (int *)malloc(sizeof(int)*4);

sizeof(vetor) really does return the size of vetor. vetor is an int*, a pointer to int. And the size of a pointer is determined by the machine architecture: 8 bytes in your case, since you are compiling for 64 bits.

"And no matter how many bytes I target to the vector with malloc, the output is always the same, ie, vector size in bytes is always 8"

You’re right: vetor, a pointer to int, is one thing. The size of the area it points to is another, and in this case it was determined by the expression malloc(sizeof(int)*4).

As for the size of the allocated area, there is officially no way for you to find out what it is, and the reason is simple: you were the one who allocated it, so you should know. The system, for its part, keeps an internal record of these values. See this example snippet

int tamanho = 1801;
int* mais_um_vetor = (int*)malloc(tamanho);
free(mais_um_vetor);
mais_um_vetor = (int*)malloc(130);
tamanho = 32 * sizeof(int);
int* p = (int*)realloc(mais_um_vetor, tamanho);
if (p != NULL) mais_um_vetor = p;
free(mais_um_vetor);

malloc() does no arithmetic on the size: it will allocate 1801 bytes and put the address in mais_um_vetor, a pointer to int. It will not allocate 1801 int! Since the size of the area is not a multiple of sizeof(int), with 4-byte int it holds 450 whole values plus one leftover byte, and reading or writing past that through mais_um_vetor is undefined behavior that may crash your program.
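To ask for a number of elements rather than a number of bytes, the multiplication has to be written out. A small sketch, where alloc_ints is a hypothetical helper and the overflow check is an extra precaution that the snippet above does not have:

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocates room for count ints (not count bytes).
   Returns NULL on failure or if count * sizeof(int) would overflow size_t. */
int *alloc_ints(size_t count)
{
    if (count > SIZE_MAX / sizeof(int))
        return NULL;
    return malloc(count * sizeof(int));
}
```

With this, alloc_ints(1801) reserves space for 1801 int, so indexes 0 to 1800 are valid.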

But then free() runs fine and releases the 1801 bytes. Then malloc() allocates 130 bytes and stores the address in the same pointer. Perhaps remembering that the size should be a multiple of sizeof(int), the program then calls realloc() and allocates space for 32 int, saving the total size of the area in tamanho.

Note that this example is just that: a meaningless example, including where it appears in the program at the end.

Note that for realloc() the previous size of the area makes no difference either. Only the NEW size matters.

1801 bytes were allocated, then released, and then 130 were allocated. sizeof(mais_um_vetor) does not change: 8. The system keeps a record of the size of the area for when you call free() to release it or realloc() to change its size, and the thing works.

However, that is probably not what you want

You want to access vetor as an array of int, with an arbitrary number of values, dynamically allocated.

How to do this?

You can allocate the exact number, starting from N=1 and using realloc() to allocate N=N+1 each time, or you can allocate in blocks of a certain size, such as blocks of 64 int, to be a little more efficient.

The problem is that realloc() may have to move everything, at the system’s discretion, and it won’t warn you first. That will of course cost your program some time. In study programs this is not relevant, but I think you see the problem: the block you allocated sits among other things your program may have allocated, so when realloc() needs a few more bytes there may be no room right after the block, and it will then allocate a larger area elsewhere and copy everything from the original area. And your program will have to wait.

Before you ask: when shrinking, the block usually stays where it is, but the standard does not guarantee that the address is unchanged, so always use the pointer returned by realloc(). When growing, the only guarantee is that the existing contents are preserved.
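Because the block may move and the call may fail, a common pattern is to receive the result of realloc() in a temporary pointer. A minimal sketch, where resize_ints is a hypothetical helper name:

```c
#include <stdlib.h>

/* Resizes *v to hold new_count ints (new_count must be greater than 0).
   Returns 1 on success and 0 on failure; on failure *v is still valid. */
int resize_ints(int **v, size_t new_count)
{
    int *p = realloc(*v, new_count * sizeof **v);
    if (p == NULL)
        return 0;   /* the old block is untouched and still owned by *v */
    *v = p;         /* the block may have moved, so keep the new address */
    return 1;
}
```

Assigning the result straight to the original pointer would lose the only reference to the old block if realloc() returned NULL.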

So what you really want to allocate is

int**        vetor;

vetor should point to an array of pointers to int, and not to a single int as in the case of

int*         vetor;

This is exactly what the system does for every C program when it builds the argv[] array with argc elements, and it is clear why argc is needed: someone has to tell the program the size of the argument array.

Just as someone has to tell your array how many int it points to
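The same convention can be used for your own functions: the count goes in together with the table of pointers. A tiny sketch, with sum_pointed as a made-up name:

```c
/* Like main(argc, argv): the count travels with the pointer table. */
int sum_pointed(int count, int **items)
{
    int total = 0;
    for (int i = 0; i < count; i += 1)
        total += *items[i];   /* items[i] is an int*, so dereference it */
    return total;
}
```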

An example program

It allocates an array of 32 pointers to int, allocates each int and puts a value from 100 to 131 in each one. It shows the first and last values and then frees everything. Then it allocates, fills and frees an array of 3 int.

Output:

sizeof(vetor) 8
sizeof(int) 4
sizeof(vetor) = 8
sizeof(outro_vetor)  int outro_vetor[30] = 120
allocated an array of 32 int
First: 100 Last 131
Freeing the array...
Freed...
Allocating an array of 3 int...
sizeof() = 8
3 4 5
Done...

The program:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int* vetor = (int*)malloc(sizeof(int) * 4);
    printf("sizeof(vetor) %zu\n", sizeof(vetor));
    printf("sizeof(int) %zu\n", sizeof(int));
    free(vetor); // release the block before reusing the pointer

    int     outro_vetor[30];
    vetor = outro_vetor;

    printf("sizeof(vetor) = %zu\n", sizeof(vetor));
    printf("sizeof(outro_vetor)  int outro_vetor[30] = %zu\n",
        sizeof(outro_vetor));

    int tamanho = 1801;
    int* mais_um_vetor = (int*)malloc(tamanho);
    free(mais_um_vetor);
    mais_um_vetor = (int*)malloc(130);
    tamanho = 32 * sizeof(int);
    int* p = (int*)realloc(mais_um_vetor, tamanho);
    if (p != NULL) mais_um_vetor = p;
    free(mais_um_vetor);

    // create vetor_de_int pointing to intN int
    int     intN = 32;
    int** vetor_de_int = NULL;

    // step by step (could have been done in one go)
    vetor_de_int = (int**)malloc(intN * sizeof(int*));
    if (vetor_de_int == NULL) return 1;
    for (int n = 0; n < intN; n += 1)
    {
        vetor_de_int[n] = malloc(sizeof(int)); // allocate one int
        if (vetor_de_int[n] == NULL) return 1;
        *vetor_de_int[n] = 100 + n; // values from 100 to 131
    }  // for()

    printf("allocated an array of %d int\n", intN);
    printf("First: %d Last %d\n",
        *vetor_de_int[0],
        *vetor_de_int[intN-1]
    );

    // destroy everything in the reverse order
    // of creation, as in C++
    printf("Freeing the array...\n");
    for (int n = 0; n < intN; n += 1)
        free(vetor_de_int[n]);
    // the ints are freed, now the table
    free(vetor_de_int);
    printf("Freed...\n");

    printf("Allocating an array of 3 int...\n");

    int     (*vetor3_int)[3] = malloc(3 * sizeof(int));
    if (vetor3_int == NULL) return 1;
    printf("sizeof() = %zu\n", sizeof(vetor3_int));
    (*vetor3_int)[0] = 3;
    (*vetor3_int)[1] = 4;
    (*vetor3_int)[2] = 5;
    for (int i = 0; i < 3; i += 1)
        printf("%d ", (*vetor3_int)[i]);
    free(vetor3_int);
    printf("\nDone...\n");
    return 0;
}

But the first one is not an array of int...

Yes. It’s an array of int*. In practice it’s what you want, especially in data structures. If you want to allocate an array of 32 int you declare

int     (*vetor32_int)[32] = malloc(32 * sizeof(int));

Only that is much less flexible: in memory it is fine, and the size is part of the type. But it only works for 32. Fixed.

So the usual approach is to use a pair of variables, as the system does, and allocate in blocks of a reasonable size, so as to have neither much waste nor many calls to realloc()
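A rough sketch of that idea, under assumptions of my own: the type IntVec, the function int_vec_push and the block size of 64 elements are illustrative choices, not something from the program above. Here the "pair of variables" becomes a count of elements in use plus a count of elements allocated.

```c
#include <stdlib.h>

#define BLOCK 64   /* hypothetical block size, in elements */

typedef struct {
    int    *data;
    size_t  len;   /* elements in use */
    size_t  cap;   /* elements allocated */
} IntVec;

/* Appends value, growing by BLOCK elements when full.
   Returns 1 on success, 0 if realloc() fails (the vector stays valid). */
int int_vec_push(IntVec *v, int value)
{
    if (v->len == v->cap) {
        size_t new_cap = v->cap + BLOCK;
        int *p = realloc(v->data, new_cap * sizeof *p);
        if (p == NULL)
            return 0;
        v->data = p;
        v->cap = new_cap;
    }
    v->data[v->len] = value;
    v->len += 1;
    return 1;
}
```

Starting from IntVec v = { NULL, 0, 0 }, each call to int_vec_push(&v, x) appends one value, realloc() runs only once every 64 appends, and free(v.data) releases everything at the end.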

  • My God, what an incredible explanation! Now everything makes more sense. So I must always keep the size stored somewhere, and at the same time not always rely on dynamic allocation, balancing memory waste against computing waste (copying the whole array from one place to another in memory when realloc finds no contiguous space), correct?

  • Yeah. You saw at the end of the example that it is possible to dynamically allocate a pointer to a certain number of int, 3 in the example, or of anything else, like float (*vetor)[32]. But the number has to be known when the program is compiled, and in general that does not help. In your case, for example, you will read the value of N when the program is running

  • The thing about realloc() is that you can’t always risk the program stopping at an important moment because the system is moving GB of data inside a realloc() call. A middle ground is to allocate in blocks of a well-chosen size, so as to have less waste and a minimum of realloc() calls. Something like using blocks of pointers for a certain number of structures, depending on your data. Or go sophisticated at once and use pages of pointers, as the system does with memory, without relocating anything. But often you don’t even need to know the size in the program, or to use realloc() at all.

  • Data structures like trees, stacks and lists have nodes allocated one by one in general, for example.
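As an illustration of nodes allocated one by one, here is a minimal linked stack. The names Node, stack_push and stack_pop are hypothetical. No array size is ever needed, because each node is its own allocation.

```c
#include <stdlib.h>

typedef struct Node {
    int          value;
    struct Node *next;
} Node;

/* Pushes value on the stack headed by *top. Returns 1 on success. */
int stack_push(Node **top, int value)
{
    Node *n = malloc(sizeof *n);   /* one allocation per node */
    if (n == NULL)
        return 0;
    n->value = value;
    n->next = *top;
    *top = n;
    return 1;
}

/* Pops the top value into *out and frees its node.
   Returns 0 if the stack is empty. */
int stack_pop(Node **top, int *out)
{
    Node *n = *top;
    if (n == NULL)
        return 0;
    *out = n->value;
    *top = n->next;
    free(n);
    return 1;
}
```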
