ASCII
American Standard Code for Information Interchange. As the name says, it is a standard designed around American needs. It covers the numbers 0 to 127; the first 32 codes and the last one are control characters, and the rest represent "printable characters", i.e., characters recognizable by humans. It is fairly universal. It can be represented with only 7 bits, although a full byte is normally used.
Obviously it has no accented characters, which Americans barely use anyway.
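The 7-bit limit described above is easy to see in practice. A minimal sketch in Python (the language here is just an illustration; the concept is language-agnostic):

```python
# ASCII: every character fits in one byte, and the 8th bit is never used.
text = "Hello"
data = text.encode("ascii")
print(list(data))                  # [72, 101, 108, 108, 111]
print(all(b < 128 for b in data))  # True: all codes are below 128

# Accented characters are simply not representable in ASCII:
try:
    "ação".encode("ascii")
except UnicodeEncodeError as e:
    print("not ASCII:", e)
```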
ANSI
There is no such encoding.
The term stands for American National Standards Institute, the equivalent of Brazil's ABNT.
As it established some character-usage standards to meet various demands, many encodings (actually code pages) end up being generically called ANSI, even as a counterpoint to Unicode, which comes from another body and uses another type of encoding. Usually these code pages are considered extensions of ASCII, but nothing prevents some specific encoding from being less than 100% compatible.
Again, it was an American solution to deal with international characters, since ASCII did not serve them well.
Depending on the context, and even on the era, it means something different. Today the term usually refers to Windows-1252, since much of Microsoft's documentation calls that encoding ANSI. ISO 8859-1, also known as Latin1, is also widely used.
All the encodings called ANSI that I know of can be represented in 1 byte.
So it depends on what you’re talking about.
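That ambiguity shows up in code. A sketch in Python comparing the two "ANSI" code pages mentioned above; both use exactly one byte per character, but they are not identical:

```python
# ISO 8859-1 (Latin1) vs. Windows-1252: one byte per character in both.
s = "ação"
latin1 = s.encode("latin-1")
cp1252 = s.encode("cp1252")
print(len(latin1), len(cp1252))  # 4 4 — one byte per character

# They agree on the accented Latin letters...
print(latin1 == cp1252)          # True
# ...but differ in the 0x80–0x9F range, where Windows-1252 places
# printable characters (€, curly quotes, …) and ISO 8859-1 has controls:
print("€".encode("cp1252"))      # b'\x80'
try:
    "€".encode("latin-1")
except UnicodeEncodeError:
    print("no € in Latin1")
```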
UTF
Alone it doesn't mean much. It stands for Unicode Transformation Format. There are a few encodings that use this acronym; UTF-8, UTF-16 and UTF-32 are the best known.
The Wikipedia articles have plenty of details. These encodings are very complex and almost nobody knows how to use them correctly in their entirety, myself included. Many implementations are wrong and/or do not meet the standard, especially for UTF-8.
UTF-8 is ASCII-compatible (every valid ASCII text is also valid UTF-8), but not compatible with any other character encoding. It is the most complete and complex encoding there is. Some people are passionate about it (that is the best term I have found) and others hate it, even while recognizing its usefulness. It is complex for humans (programmers) to understand and for the computer to handle.
UTF-8 and UTF-16 are variable-length: the first uses 1 to 4 bytes (depending on the version it could go up to 6 bytes, but in practice that does not happen), and the second uses 2 or 4 bytes. UTF-32 always uses 4 bytes.
There is a comparison between them. I don't know how up to date it is; it is certainly not complete.
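The variable-length behavior described above can be observed directly. A sketch in Python showing how the same characters occupy different numbers of bytes in each UTF encoding (the `-le` suffix just picks a byte order without emitting a BOM):

```python
# One character from each UTF-8 length class: 1, 2, 3 and 4 bytes.
for ch in ["a", "ç", "€", "\U0001D11E"]:   # last one: MUSICAL SYMBOL G CLEF
    u8  = ch.encode("utf-8")
    u16 = ch.encode("utf-16-le")
    u32 = ch.encode("utf-32-le")
    print(ch, len(u8), len(u16), len(u32))
# a → 1, 2, 4
# ç → 2, 2, 4
# € → 3, 2, 4
# 𝄞 → 4, 4, 4  (outside the BMP, so UTF-16 needs a surrogate pair)
```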
Unicode
It is a standard for representing text, established by a consortium. Among the standards it defines are some encodings, but it actually covers much more than that. It originated from the Universal Coded Character Set, or UCS, which was much simpler and solved almost everything that was needed.
An article that everyone should read, even if you do not agree with everything in it.
The supported characters are separated into planes. You can get an overview of them in the Wikipedia article. Plane 0, the BMP (Basic Multilingual Plane), is by far the most widely used.
All these standards are made official by ISO, the international body that regulates technical standards.
It has to do with UTF.
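The relationship is this: Unicode assigns each character a number (the code point), and the UTFs are just different ways to serialize that number into bytes. A sketch in Python:

```python
# Code points are independent of any particular byte encoding.
ch = "\U0001D11E"               # MUSICAL SYMBOL G CLEF
cp = ord(ch)
print(hex(cp))                  # 0x1d11e
print(cp // 0x10000)            # 1 → plane 1, outside the BMP
print(ord("é") // 0x10000)     # 0 → plane 0, the BMP

# Round trip: the code point identifies the character, whatever the UTF.
print(chr(cp) == ch)            # True
```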
I would translate this, but I was too lazy to check where Google "slipped up". So I leave it to other answers.
– Randrade
As soon as I wrote the question I searched on SOzão (as usual) and came across this. If no one answers, I will translate it myself. I find it interesting to have this kind of content here, so I will keep the question up for now =D
– Jéf Bueno
Your question is excellent to have here. It's just that after seeing Jon Skeet's answers I get lazy about writing something like that. kkkk
– Randrade
I understand you, I understand you hahaha
– Jéf Bueno
Related: https://answall.com/q/394834/112052
– hkotsubo
read this here - https://medium.com/@Sestrem/o-m%C3%Adnimo-que-todo-desenvolvedor-saber-sobre-Unicode-e-character-sets-789a4229ecf5 - it is from 2003, by one of Stack Overflow's founders.
– jsbueno
@jsbueno somewhere in these three-and-some years I ended up reading (and rereading) it. Thanks for the tip.
– Jéf Bueno