How does the compiler handle a cast like this?

Question

Asked 10 years, 11 months ago

Viewed 84 times

1

Given the following code:

0    #include<stdio.h>
1    
2    int
3    main(void)
4    {
5        int x;
6        x = -3;
7        
8        for (int i = 0; i < 5; i++)
9        {
10          printf("%d\n", (unsigned int) (x - i)); // are the parentheses really needed here?
11          printf("%u\n", (int) (x - i));       // are the parentheses really needed here?
12      
13          printf("%d\n", x - i);
14          printf("%u\n\n", x - i);
15       }
16 
17        return 0;
18    }

To compile I used: gcc -Wall -pedantic -std=c99 -o test.exe test.c

The output of the above code was as follows:

Question 1) On lines 10 and 11, does the compiler first convert the value to the type given in the cast and then implicitly convert it again to the type implied by %d or %u, or does it simply skip the cast and format the value directly according to the format specifier?

Question 2) The cast to unsigned int converts a signed int to an unsigned one. But how does the compiler actually do it? I noticed that when a negative integer is printed as unsigned int (in the code above, implicitly, on lines 11 and 14), the output is an integer value (different from the expected one) that decreases in step with the original value. Why does this happen?

2 answers

1

When you call printf with two arguments, a string and a number, the compiler has no way of knowing what printf will do with them; to the compiler, it is an ordinary function call. So it cannot ignore the cast: it converts the type first and then passes the result as an argument to the function. Whether the function converts it back later is irrelevant.

As for how the conversion is done, what the compiler does is simply take the data in binary, without any modification, and interpret it in a different way. Here are some examples:

Bit sequence                          Signed integer      Unsigned integer
===========================================================================
00000000 00000000 00000000 00000010            2                   2
00000000 00000000 00000000 00000001            1                   1
00000000 00000000 00000000 00000000            0                   0
11111111 11111111 11111111 11111111           -1          4294967295
11111111 11111111 11111111 11111110           -2          4294967294
11111111 11111111 11111111 11111101           -3          4294967293

That is, when you have a signed integer variable with the value -3, internally it is represented by the bit sequence shown on the last line. This same sequence, interpreted as an unsigned integer, corresponds to the value shown in the last column.

Finally, a comment on whether or not to use parentheses with a cast: I do not know whether they are necessary, but I always use them, if only to make my intention clear to whoever reads the code (even if that reader is me, months after last touching the program). I mean, the parentheses in one of these two are unnecessary:

(unsigned int) (x - i)
((unsigned int)x) - i

I don’t know which one it is, and I don’t even care to find out... I just always write it like that!

Do you know of any document that describes precisely how the compiler works in this situation? The table you showed left me with a doubt, because I recently asked a question that covered this and the answer I got was different. Question on a similar subject

– ViniciusArruda

2014/09/04 at 00:55
By my reckoning (and by assumption), when the value is a signed int, GCC (or is it ISO that determines how it should behave?) stores the bits in two's complement. And when it is an unsigned int, GCC stores the bits directly, without using the complement. Am I right?

– ViniciusArruda

2014/09/04 at 01:04
@X0R40 Both answers say the same thing, but in different ways. Item 2 of the linked answer says: "repeatedly adding or subtracting the maximum value plus 1". Now, the maximum value of an unsigned int is 4294967295. So -1 + 4294967295 + 1 = 4294967295, -2 + 4294967295 + 1 = 4294967294, -3 + 4294967295 + 1 = 4294967293... If you do the math, you’ll see the same holds for the types of other sizes: unsigned short, unsigned long, etc. And as for your last comment, yes, you’re right. The difference, of course, is only in the numbers whose first bit is 1; the rest are identical.

– mgibsonbr

2014/09/04 at 01:06
Excuse the insistence, but if the answers say the same thing, does the compiler actually do the whole conversion (including "repeatedly adding or subtracting the maximum value plus 1") in binary? Is there an explanation for it being this way, or did the compiler implementers simply decide to do it like that? Does it have something to do with modular arithmetic?

– ViniciusArruda

2014/09/04 at 01:18
@X0R40 The specification describes what happens logically, but in practice the compiler does not need to do anything, since the work is already done! And yes, that is precisely the motivation behind this "crazy" rule of "repeatedly adding or subtracting": specified that way, the binary representation is preserved. And you’re also right when you say it involves modular arithmetic: note that -1 ≡ 4294967295 (mod 4294967296), -2 ≡ 4294967294 (mod 4294967296), etc. The two's complement representation itself was invented with this in mind.

– mgibsonbr

2014/09/04 at 01:29
@X0R40 Just one more detail: what I said holds for the "normal" architectures (x86, x86_64 and others), but if someone wrote a C compiler for, say, a mainframe (which uses decimal representation instead of binary), then the compiler would indeed have to do some value conversion during the cast. But in binary representation, with the signed type represented in two's complement, that conversion is effectively a "no-op" (in other words, the transformation from one type to the other is the identity operation).

– mgibsonbr

2014/09/04 at 03:10


by pmg • **6,456** points · Answer 1 · 2014-08-31T10:11:50+00:00

Of your four printf calls, only one is correct. The other three invoke undefined behaviour (UB).

   printf("%d\n", (unsigned int) (x - i)); // UB
   printf("%u\n", (int) (x - i));          // UB
   printf("%d\n", x - i);                  // OK
   printf("%u\n\n", x - i);                // UB

The type of the value passed must match the conversion specifier.

The first printf specifies "%d", which requires a value of type int; but you pass a value of type unsigned int.
The second printf specifies "%u", which requires a value of type unsigned int; but you pass a value of type int.
The third printf specifies "%d", which requires a value of type int; and you correctly pass a value of type int.
The fourth printf specifies "%u", which requires a value of type unsigned int; but you pass a value of type int.