I have a 32-bit integer representing a Unicode character and would like to convert this single character into its UTF-16 representation, that is, one or more 16-bit integers.
The 16-bit Unicode Transformation Format (UTF-16) is defined in section 2.5 of the Unicode standard, as well as in RFC 2781. It works like this:

Let U be the value you want to encode.

- If U is less than 65,536, simply emit U as a single 16-bit code unit.
- If U is greater than or equal to 65,536, compute U' = U - 65536. By the rules of Unicode, U' will have its 12 most significant bits equal to zero (since the last valid codepoint is 0x10FFFF). Then:
  - Emit a first 16-bit code unit whose six most significant bits are 1101 10 and whose ten least significant bits are the ten most significant bits of U'.
  - Emit a second 16-bit code unit whose six most significant bits are 1101 11 and whose ten least significant bits are the ten least significant bits of U'.

In C:
#include <stdio.h>
#include <stdint.h>
#include <assert.h>

void
utf_16(uint32_t codepoint, FILE * out) {
    uint32_t U;
    uint16_t W;
    assert(codepoint <= 0x10FFFF);
    if (codepoint < 0x10000) {
        /* BMP codepoint: emit it directly as a single 16-bit code unit */
        W = (uint16_t) codepoint;
        fwrite(&W, sizeof(W), 1, out);
    } else {
        /* Supplementary codepoint: emit a surrogate pair */
        U = codepoint - 0x10000;
        W = 0xD800 | (U >> 10);    /* high surrogate: 1101 10 + top 10 bits of U' */
        fwrite(&W, sizeof(W), 1, out);
        W = 0xDC00 | (U & 0x3FF);  /* low surrogate: 1101 11 + bottom 10 bits of U' */
        fwrite(&W, sizeof(W), 1, out);
    }
}
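For illustration, here is a minimal sketch of how one might call this function; the output file name and the test codepoints are just assumptions for the example, not part of the answer above. Note that fwrite emits each code unit in the host machine's native byte order, so on a little-endian machine this produces UTF-16LE.

#include <stdio.h>
#include <stdint.h>

/* assumes utf_16() from the answer above is defined in this translation unit */
void utf_16(uint32_t codepoint, FILE * out);

int main(void) {
    FILE * out = fopen("out.utf16", "wb");  /* hypothetical output file */
    if (out == NULL)
        return 1;
    utf_16(0x0041, out);   /* 'A'     -> one code unit:  0x0041        */
    utf_16(0x1F600, out);  /* U+1F600 -> surrogate pair: 0xD83D 0xDE00 */
    fclose(out);
    return 0;
}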
That's the answer! It worked perfectly.
– Rodrigo Santiago