Clang reports a "character too large" error, but Visual Studio builds normally

Viewed 446 times

5

I’m having a hard time understanding why Clang gives the error message

character too large for enclosing character literal type

when trying to compile the code:

char c = 'ç';

while Visual Studio 2015 compiles it without complaint. I know that different compilers can and do have different implementations, and that ç is outside the ASCII table, so its numeric value must be greater than 127 and Clang reports that it cannot be stored in the type char. But I’d still like to know:

Why doesn’t Clang allow me to use 'ç' as a char while Visual Studio does? Is it something predefined in Visual Studio? Some option based on my system’s language?

Why does Visual Studio return the "correct" value from string functions such as strlen, even when I pass strings with accents?

Example: strlen("opção"); returns 5 in Visual Studio. I expected it to return 7, as it does with Clang.

  • 1

    I would venture that because this character does not belong to the ASCII standard, it occupies more than 1 byte (while the type char only holds 1 byte). See if there is an option to treat the source code as UTF-8 when compiling.

  • Yes, I believe that is the case in Clang: there strlen returns the length in bytes, and ç takes more than 1 byte. From what I saw, gcc allows changing the encoding, but Clang does not have such an option. What intrigues me is that Visual Studio simply works, without any kind of warning, for characters outside the ASCII table, even when using extended ASCII.

1 answer

6


First: compiling, or even working, is different from being right. Every programmer should be very clear about this. You were right to ask here to understand why it works or not.

A completely wrecked Fiat 147 driving through the streets

I don’t have any official information (I only found some scattered information that points this way), but I can infer that Microsoft’s C compiler, the one Visual Studio uses by default, is using a single-byte character encoding, possibly Windows-1252. This encoding extends the ASCII table (which only has 128 characters) with some accented characters. It will not work with characters outside this small 256-character table.

More recent versions of the compiler give better control over how this is handled, through options such as /utf-8, /source-charset and /execution-charset.

It’s clear to me that Clang uses UTF-8 by default (I’ve even read in some unofficial places that this is the case), which is a multi-byte encoding. Any character beyond the ASCII table needs 2 or more bytes to be represented.

That explains why the char does not work: the C specification clearly says that this type always has a size of 1 byte, so a 2-byte sequence does not fit in it.

The function strlen() returns the correct value in both cases, since it is meant to return the number of bytes, not the number of characters.

I still recommend checking the documentation to confirm whether these really are the default encodings.

Use wchar_t to get a wide character type that can hold characters outside ASCII. Or wstring in C++.

Read about C’s multibyte string functions.

  • I did some tests and that really seems to be the case; there is a lack of documentation on this discrepancy. Just by trying to use a character from another language, Visual C++ gave me the message: warning C4566: character represented by universal-character-name '\u30C8' cannot be represented in the current code page (1252). I also noticed that if the file is saved in the 1252 encoding, Visual Studio replaces characters outside that code page with a ? (char code 63) instead of raising an error, which makes the code compile but not work correctly.

  • 1

    Or Unicode literals instead of wide chars: http://en.cppreference.com/w/cpp/language/string_literal

  • @pepper_chico in C++, yes, of course.
