Encoding in C++

2

Hi, I would like to know how to change the output encoding of a C++ program, given that the default encoding is ASCII.

I've tried the <windows.h> and <tchar.h> headers, with the following code:

#include <windows.h>
#include <tchar.h>
#include <locale.h>

int main(){
    _tsetlocale(LC_ALL, _T("portuguese"));
    (...)
}

But I realize that, among other problems, this does not work on other operating systems.

I intend to use characters from several languages in the application, but I have no idea how to do it.

Is UTF-8 an option? If so, how?

  • 5

    C++ does not have an "output encoding". Your output is a byte stream, and what matters is that whatever displays that stream is prepared to read the encoding you used (which of course depends on whether you display it on a terminal, in a Windows GUI, etc.).

1 answer

2


Stay away from the "set locale" functions (such as _tsetlocale). They have nothing to do with your current problem.

And what our colleague hugomg said is right. The output of your program is bytes (that is, numbers). How they are shown on the screen, on ANY operating system, depends on the environment the program runs in. For example, imagine a simple program that just writes "Hi, world!" and exits. If we run it on a terminal whose encoding differs from the one used for the BYTES of that string in the program, the phrase will come out garbled. If the two encodings match, it works. That's all.
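To make the "output is just bytes" point concrete, here is a minimal sketch. The helper name hex_bytes is hypothetical, invented for this illustration. It shows the bytes a string actually contains, so you can compare what the program sends with what the terminal expects.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: render every byte of `text` as two uppercase
// hex digits, separated by single spaces.
std::string hex_bytes(const std::string& text) {
    std::string out;
    char buf[4];
    for (unsigned char byte : text) {
        std::snprintf(buf, sizeof buf, "%02X", static_cast<unsigned>(byte));
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}
```

The letter "é" is the single byte E9 in ISO-8859-1 but the two bytes C3 A9 in UTF-8, so hex_bytes gives different results for the "same" text depending on how it was encoded. A terminal expecting one encoding will misread the other.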

So, do some research on the following (Wikipedia is a good teacher for this): 1) the ASCII encoding; 2) the ISO-8859-1 encoding; 3) the UTF-8 encoding. Keep in mind that UTF-8, unlike the other two, does not always use 1 byte per symbol.

My C programs use accented characters in the strings passed to printf. This works well, as long as the strings in the source code have the same encoding as the terminal/screen where the program will run. Typically, choosing the usual encoding of the target system already ensures this. ISO-8859-1 and UTF-8 have been the best choices in 90% of cases, and the exceptions are easy to handle when they come up. Does this paragraph make sense?

Finally, iconv exists as a function, a library, and a command-line program. With it you can convert your strings from one encoding to another, which can also help.
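As a rough illustration of the "1 byte per symbol" difference, the sketch below converts ISO-8859-1 text to UTF-8 by hand. The function name latin1_to_utf8 is made up for this example. It relies on the fact that every ISO-8859-1 byte value equals its Unicode code point (U+0000 to U+00FF), so bytes below 0x80 stay as they are and the rest become two-byte sequences.

```cpp
#include <string>

// Hypothetical helper: convert ISO-8859-1 bytes to UTF-8.
// Each ISO-8859-1 byte is the code point U+0000..U+00FF of the same value.
std::string latin1_to_utf8(const std::string& latin1) {
    std::string utf8;
    for (unsigned char c : latin1) {
        if (c < 0x80) {
            // ASCII range: identical single byte in both encodings.
            utf8 += static_cast<char>(c);
        } else {
            // U+0080..U+00FF: two bytes, 110xxxxx 10xxxxxx.
            utf8 += static_cast<char>(0xC0 | (c >> 6));
            utf8 += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return utf8;
}
```

Plain ASCII text comes out unchanged, while each accented letter grows from one byte to two. That is why byte counts and character counts stop agreeing in UTF-8.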
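If you would rather not depend on how your editor saves the file, one option is to write the non-ASCII bytes explicitly. This is only a sketch: the constant names are hypothetical, and the greeting "Olá, mundo!" is an example chosen here. The literals are split into pieces so that a hex escape cannot swallow a following letter.

```cpp
#include <string>

// Hypothetical constants: the same text, "Olá, mundo!", spelled out byte by
// byte for two target encodings. The source file itself stays pure ASCII.
const std::string kGreetingLatin1 = "Ol" "\xE1" ", mundo!";      // á = E1
const std::string kGreetingUtf8   = "Ol" "\xC3\xA1" ", mundo!";  // á = C3 A1
```

You would then print whichever constant matches the encoding of the terminal where the program runs. Eleven characters take 11 bytes in ISO-8859-1 and 12 bytes in UTF-8.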
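A minimal sketch of using the iconv() function from C++ follows, assuming a POSIX system where <iconv.h> is available (on some platforms you also need to link with -liconv). The wrapper name convert_encoding is hypothetical. It does not flush shift state, so it only suits stateless encodings such as ISO-8859-1 and UTF-8.

```cpp
#include <iconv.h>
#include <cerrno>
#include <stdexcept>
#include <string>

// Hypothetical wrapper: convert `input` from encoding `from` to encoding `to`.
std::string convert_encoding(const std::string& input,
                             const char* from, const char* to) {
    iconv_t cd = iconv_open(to, from);  // note the order: target first
    if (cd == (iconv_t)-1) throw std::runtime_error("iconv_open failed");

    std::string output;
    char buffer[256];
    // iconv takes a non-const pointer but does not modify the input bytes.
    char* in = const_cast<char*>(input.data());
    size_t in_left = input.size();

    while (in_left > 0) {
        char* out = buffer;
        size_t out_left = sizeof buffer;
        size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
        int err = errno;  // save before any other call can change it
        output.append(buffer, sizeof buffer - out_left);
        // E2BIG only means the buffer filled up; loop again for the rest.
        if (rc == (size_t)-1 && err != E2BIG) {
            iconv_close(cd);
            throw std::runtime_error("iconv failed: invalid or incomplete input");
        }
    }
    iconv_close(cd);
    return output;
}
```

For example, convert_encoding(text, "ISO-8859-1", "UTF-8") would turn strings stored in ISO-8859-1 into UTF-8 before writing them to a UTF-8 terminal.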

In short:

  1. Find out, and choose, the encoding of all the strings in your source code; this depends on your editor and is simple to check.

  2. Choose the encoding your program will use for its output; this becomes a requirement for whoever uses it, so that the user's terminal/screen interprets your program's strings correctly. UTF-8 or ISO-8859-1 are great bets, in my opinion.

  3. Finally, you can use the iconv() function when you need something more elaborate.

Useful readings:

http://en.wikipedia.org/wiki/ISO_8859-1 (in English)

http://en.wikipedia.org/wiki/ISO/IEC_8859-1 (in Portuguese, just for the basics; the article is not very complete yet)

http://man7.org/linux/man-pages/man3/iconv.3.html (iconv function manual, in English)

http://www.gnu.org/software/libiconv/

http://en.wikipedia.org/wiki/Iconv

  • 1

    It is worth emphasizing that if a string uses a multibyte encoding, some operations, such as counting the number of characters or extracting a substring, become more complicated. If you are just passing the character buffer from one place to another there is not much of a problem, but if you want to manipulate it, take care and use a library that does the right thing.
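To illustrate the point about counting and substrings, here is a small sketch with hypothetical helper names. It assumes the input is valid UTF-8 and works on code points only. A real library such as ICU also handles things like combining marks, which this does not.

```cpp
#include <cstddef>
#include <string>

// UTF-8 continuation bytes have the bit pattern 10xxxxxx.
bool is_continuation(unsigned char b) { return (b & 0xC0) == 0x80; }

// Count code points by counting every byte that starts a sequence.
std::size_t utf8_length(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char b : s) {
        if (!is_continuation(b)) ++n;
    }
    return n;
}

// Return the first `count` code points without cutting a sequence in half.
std::string utf8_prefix(const std::string& s, std::size_t count) {
    std::size_t i = 0;
    std::size_t seen = 0;
    while (i < s.size()) {
        if (!is_continuation(static_cast<unsigned char>(s[i]))) {
            if (seen == count) break;  // next character starts here: stop
            ++seen;
        }
        ++i;
    }
    return s.substr(0, i);
}
```

For the UTF-8 string "Olá!", size() reports 5 bytes while utf8_length reports 4 characters, and a plain substr(0, 3) would cut the "á" in the middle, whereas utf8_prefix(s, 3) keeps it whole.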
