Which encoding to use when it comes to accent?

Question

Which encoding to use when it comes to accent?

Asked 8 years ago

Viewed 4,907 times

0

There are so much encoding... I’ve searched a lot on Google and can’t differentiate when using each of these here

ISO-8859-1
UTF-8
LATIN1
cp...

See, I was having trouble with our accents, I tried UTF-8 right away because I always thought that if it was accents it would be UTF-8, but it didn’t work... After a lot of research I tried ISO-8859-1 and guess what worked... Why?

The ISO-8859-1 is a Latin alphabet encoding, but I know the ISO has discontinued it, but if it works for you keep using it. perhaps it has to do with your difficult to know ide with little information

– Rodrigo Prado

2017/12/11 at 23:33

2 answers

Browser other questions tagged java

You are not signed in. Login or sign up in order to post.

by Enzo Beltrami • 11 points · Answer 1 · 2017-12-12T00:54:50+00:00

In general the most used would be the UTF-8, since it supports all alphabet and accents plus a gigantic range of special characters, as for your accentuation problem there are several causes for the problem, it may vary from lack of configuration in some part of the code or wrong configuration in the text/IDE editor(most common cause).

UTF-8 is generally used for internationalization of a text (although commonly used nationally as well) while ISO-8859-1 is generally used for encoding national texts.

The reason for the difference is in the way the text is encoded ISO-8859-1 encodes text that is different from UTF-8, while UTF-8 supports all Unicod characters while ISO-8859-1 only supports the first 256 (if I’m not mistaken).

This answer in Stack can explain better than me: What are the main differences between Unicode, UTF, ASCII, ANSI?

by Guilherme Nascimento • **98,651** points · Answer 2 · 2017-12-12T00:58:57+00:00

The ISO-8859-1 is a Latin alphabet encoding, but I know the ISO has discontinued it, but if it works for you keep using it. perhaps it has to do with your difficult to know ide with little information

Before we begin, that just explain one thing about Rodrigo’s comment, many people think that too, is not that ISO-8859-1 has been "discontinued", simply it does not have much more to evolve, the UTF-8 has already supplied and sought to focus on the UTF-8 the necessary evolutions, ISO-8859-1 will only no longer evolve, does not mean it is not usable, saying it is discontinued sounds like something being "not recommended" or will stop "working", which can lead to false interpretations.

ISO-8859-1 can be used smoothly if your App will be in Portuguese and if you don’t need emojis or multi-languages.

ISO-8859-1 vs UTF-8

ISO-8859-1 is a single-byte (8-bit) encoded character set and consists of 191 characters

UTF-8 is a 1-4 byte encoding for each character, and currently supports many characters and is one of the most commonly used on websites

That is, they are two different encodings, for example the accented letter à (a with crase) has different codes in both encodings:

UTF-8 o à has the following code: c3 a0
iso-8859-1 the à has the following code: E0

(if the tables I searched for are correct)

Then note that in utf-8 was c3 a0 and in the iso: E0, or even though to us humans seems the same thing, it is not, in UTF-8 was used 2 different bytes, what did the "uni-code"

Summarizing both support accents, the point is that depending on the application, framework or the like, the project can be UTF-8 standard or use windows-1252 ("compatible" with iso-8859-1), if you are reading a text file and trying to display and in your application it is because the content of the text file is in a format compatible with ISO-8859-1 and so set in UTF-8 project will not work.

The same goes for a database, if the charset defined in the database table is latin1 (compatible with ISO-8859-1) then there is no way to use utf-8 in the project (although depending on the database it is possible to adjust and even get some result), then answering the question title:

What encoding to use when it comes to accent?

ISO-8859-1 as much as UTF-8 supports accents, it will depend on where you take the data, such as a database or a text file, if all database files or tables use UTF-8 then the project will have to be in UTF-8, if they are in ISO-8859-1 (database or text files) will need to use ISO-8859-1 in your project

Saving a file

Assuming you are using a text editor like Notepad++ (which is better than Windows native for this) before saving you can convert your text document (if you are) to ANSI (which will be compatible with iso-8859-1) or to UTF-8:

If you save the text document as UTF-8 and try to use iso-8859-1 in your Java project will conflict and characters will get lost at the time of displaying, the same goes for the opposite, if saving the document as ANSI and setting the project as UTF-8 will fail, ie always use the same encoding if possible, because otherwise you will have to use encoders and decoders inside the application.