What is "accentuation"?
Graphic accentuation is the placement of certain symbols over certain letters, as required by the accentuation rules of a language. These symbols include the various graphic accents as well as other diacritics, such as the diaeresis (trema).
Representing accented letters on a computer requires support for a character set that goes beyond plain US-ASCII. Historically this was done through extensions to that standard (Extended ASCII), made possible by the fact that ASCII uses only 7 bits of a byte, leaving 128 unused codes that could be assigned to additional characters. Today the standard approach is to use Unicode to represent text in many languages and alphabets, and the conversion of this text to binary form and back is done through a character encoding (such as UTF-8).
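As a rough illustration of the text/binary conversion described above, the following Python sketch encodes a short accented word with two encodings and then decodes the bytes with the wrong one. The sample word and the variable names are only illustrative assumptions.

```python
text = "ação"

# Each character is a Unicode code point; "ç" and "ã" are above the 7-bit ASCII range.
code_points = [hex(ord(ch)) for ch in text]

# The same text produces different bytes depending on the encoding used.
utf8_bytes = text.encode("utf-8")      # two bytes per accented letter
latin1_bytes = text.encode("latin-1")  # one byte per accented letter

# Plain ASCII cannot represent the accented letters at all.
try:
    text.encode("ascii")
    fits_in_ascii = True
except UnicodeEncodeError:
    fits_in_ascii = False

# Decoding with a different encoding than the one used to encode garbles the text.
garbled = utf8_bytes.decode("latin-1")

print(code_points, utf8_bytes, latin1_bytes, fits_in_ascii, garbled)
```

Garbled accented letters of this kind are usually a sign that the text was written with one encoding and read with another.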
The most common way to work with accented letters is through precomposed characters, that is, characters in which the letter and the diacritic are part of the same code point. But the same character can also be represented by a base letter followed by one or more combining diacritical marks (i.e., two or more code points are used to represent a single character). This adds complexity to the handling of such letters, and normalization is often required.
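A minimal sketch of the difference between the two representations, using Python's standard `unicodedata` module. The letter "é" is just an example, and the variable names are hypothetical.

```python
import unicodedata

precomposed = "\u00e9"   # "é" as a single code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# Both render as the same letter, but they are different code point sequences.
equal_without_normalizing = precomposed == decomposed
lengths = (len(precomposed), len(decomposed))

# NFC composes where possible; NFD decomposes into base letter plus combining marks.
nfc_equal = unicodedata.normalize("NFC", decomposed) == precomposed
nfd_equal = unicodedata.normalize("NFD", precomposed) == decomposed

print(equal_without_normalizing, lengths, nfc_equal, nfd_equal)
```

Normalizing both strings to the same form (NFC or NFD) before comparing, searching, or hashing them avoids mismatches between visually identical text.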
Tag use
Use this tag in questions about the correct handling of accented characters in a computer program, including but not limited to:
- Text/binary representation and conversion;
- Capitalization of accented letters (e.g., conversion between upper and lower case);
- Ordering strings containing accented letters according to the rules of a language and/or locale (collation).
Do not use this tag if your question is about Unicode text in general; use it only for the specific case of accented letters.
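To illustrate the capitalization and ordering topics listed above, here is a small Python sketch. The `strip_accents` helper and the sample words are hypothetical, and the sort key is only a crude approximation: it ignores accents instead of applying the rules of a real locale, which would normally be done with something like `locale.strxfrm` or an ICU-based library.

```python
import unicodedata

def strip_accents(s):
    """Remove combining marks after decomposing (illustrative helper, not real collation)."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Case conversion works on accented letters in Python's str type.
upper = "ação".upper()

words = ["zebra", "água", "abacaxi", "época"]

# Plain sorting compares code points, so accented initials end up after "z".
naive = sorted(words)

# Approximate accent-insensitive ordering; the original word breaks ties.
approx = sorted(words, key=lambda w: (strip_accents(w).casefold(), w))

print(upper, naive, approx)
```

The gap between the two sorted lists is the kind of problem collation questions under this tag typically deal with.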