Is the "char" type really size 1?

I always understood that char is the only type whose size is fixed by the specification: its size is 1, no matter the architecture.

But I came across sizeof('a') returning 4 and not 1.

How come? Did I learn it wrong?

1 answer

You learned only half of it. When you take the sizeof of a variable of type char, or of the type char itself, the result is always 1. That will never change, so there is no reason to use an expression to get its size. Just use the literal 1 and that's it.

Some will say "just in case", "to be conscientious", "it may change one day". It will not change: a language specification does not change because someone wants it to. Programming cannot be based on beliefs; there is right and there is wrong. At most, style is a matter of taste. If you like writing sizeof(char), I can only lament it, because that is taste. But thinking it might not be 1 is nonsense.

In fact, if you take the sizeof of a character literal, you get the size of an int, because in C a character literal such as 'a' has type int. In my view specifying this was a mistake in the language, and I see no use for it. Don't use that form either; there is no need, just use 1.

It is so wrong that C++ chose to break compatibility here: in C++ a character literal has type char, so the result is 1.

In C:

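#include <stdio.h>

int main(void) {
    char a = 'a';
    /* sizeof yields a size_t, so the matching conversion is %zu, not %d */
    printf("%zu\n", sizeof(char)); /* always 1 */
    printf("%zu\n", sizeof(a));    /* always 1: a has type char */
    printf("%zu\n", sizeof('a'));  /* sizeof(int): 'a' has type int in C */
}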

See it working on ideone and on repl.it. I also put it on GitHub for future reference.

In C++:

#include <iostream>

int main() {
    char a = 'a';
    std::cout << sizeof(char) << "\n"; // always 1
    std::cout << sizeof(a) << "\n";    // always 1: a has type char
    std::cout << sizeof('a') << "\n";  // 1: in C++ 'a' has type char
}

See it working on ideone and on repl.it. I also put it on GitHub for future reference.

  • A question: what if it is a special, non-ASCII character, for example "â"? Will it still take 1 byte? And is that why sizeof(a) differs from sizeof('a') in C?

  • @Miguel That's another question, but here goes: char always has exactly 1 byte, no matter the encoding (ASCII, Latin-1, etc.). Putting any multi-byte encoding into it is an error; from char's point of view, each byte of that supposed character is a separate, meaningless value. Of course, nothing stops you from having functions that interpret those bytes correctly, but that is outside the standard: what exists in C, and what third parties will probably do, does not handle it. If you need multi-byte characters, that's another type. As for the second part, I think it is covered in the answer, right? The literal is different by specification, and that is what I consider a mistake.

  • Understood, bigown. May I ask it as a separate question? I think many people like me will have that doubt, "what if it's a special, non-ASCII character?", especially coming from PHP and knowing how many "problems" come from this, e.g. strpos vs mb_strpos: http://stackoverflow.com/questions/13913411/mb-strpos-vs-strpos-whats-the-difference

  • @Miguel feel free. You never need permission to ask :)

  • Special characters in multi-byte encodings simply do not fit in a char in C, simple as that. If you try char c = 'ã'; and your file is encoded in UTF-8, for example, you should get a warning or an error from the compiler. What you need to keep in mind, and what bigown hasn't explained so far, is this: it is wrong to think that a byte corresponds to a character. Read http://local.joelonsoftware.com/wiki/O_M%C3%Adnimo_absoluto_que_todos_programadores_de_software_precise,_Absolutely,Positivamente_de_Saber_Sobre_Unicode_e_Conjuntos_de_Caracteres(Apologies!)

  • @jsbueno right, as I had already answered in http://answall.com/a/137047/101.
