2
What is the difference between Unicode and UTF-8? They are the same encoding or one is derived from the other?
2
What is the difference between Unicode and UTF-8? They are the same encoding or one is derived from the other?
3
Unicode may refer:
UTF means Unicode or (UCS) Transformation Format, which is like characters Unicode are represented in the computer memory or transmitted.
To really understand the difference between Unicode and UTF-8 it is necessary to understand the following concepts.
An abstract character is a platonic ideal of a element fundamental text.
The platonic ideal refers, for example, to the concept that R is equal R or any other representation of the letter R.
Already the question of element fundamental text is more complicated, for example c can be understood as a single character or as the composition of two: c and ¸.
Part of the work of Unicode Consortium is in identifying these fundamental elements. See below examples of the relationship between (nonfundamental) text and characters.
Once the characters a are defined Unicode Consortium defines codepoints for these characters. These codepoints sane identifiers and DO NOT necessarily relate to how characters are stored in memory, they are an abstraction that allows to identify all characters mapped easily.
Those codepoints are also commonly referred to as Unicodes and are represented as U+0041 or \u0041 (Lyric To)
As it is possible to see below an abstract character can be represented by one or more characters Unicode
Refers to how the codepoints are stored in memory or transmitted. UTF-8 uses code Units 8-bit to store characters Unicode. Depending on the character UTF-8 uses 1 to 4 code Units, for example, To uses a code Unit, Ω two and ? four.
The image below shows how the different formats encode the same characters. In UTF-32 the Codepoint is equal to code Unit, the same does not apply to other formats and the conversion of Codepoint for code Units is not obvious.
Information and images taken from Unicode Standard Version 10.0 - Core Specification
Browser other questions tagged utf-8 unicode
You are not signed in. Login or sign up in order to post.
Unicode can be implemented with several character encodings: https://en.wikipedia.org/wiki/Comparison_of_Unicode_encodings?wprov=sfsi1 UTF8 is one of these implementations and is ASCII-compatible
– Jefferson Quesado