How to Manipulate UTF-8 in Lua

Viewed 823 times

3

How do I work with UTF-8 encoded strings in Lua?

For example:

  • get the code point of a character in a string by its index;
  • encode code points into a string, something like string.char(...codes) or ('').char.

What are the possible ways?

2 answers

6


If you are on Lua 5.3 (although I don't know it very well), you can use the built-in utf8 module, which provides utf8.char, utf8.charpattern, utf8.codepoint, utf8.codes, utf8.len and utf8.offset.
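As a rough sketch of how the built-in module can cover both cases from the question, assuming Lua 5.3 or later and a source file saved as UTF-8. The helper name codepoint_at is hypothetical, not part of the standard library:

```lua
-- Requires Lua 5.3 or later (built-in utf8 library).
local s = "ação"  -- 4 characters, 6 bytes in UTF-8

-- Length in characters versus length in bytes.
print(utf8.len(s), #s)  --> 4   6

-- Hypothetical helper: code point of the n-th character (n >= 1),
-- or nil when n is past the end of the string.
local function codepoint_at(str, n)
  local pos = utf8.offset(str, n)  -- byte position where the n-th character starts
  if not pos or pos > #str then
    return nil
  end
  return utf8.codepoint(str, pos)
end

print(codepoint_at(s, 2))  --> 231 (0xE7, "ç")

-- Encode code points back into a UTF-8 string.
print(utf8.char(97, 231, 227, 111))  --> ação

-- Iterate over every character: byte position and code point.
for pos, cp in utf8.codes(s) do
  print(pos, cp)
end
```

Note that utf8.codepoint takes a byte position, not a character index, which is why utf8.offset is used first to translate one into the other.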


There is also the starwing/luautf8 module, which provides some extra functions (the author reports having tested it with Lua 5.2.3, Lua 5.3.0 and LuaJIT).

To install it, run this command (if you have LuaRocks):

luarocks install luautf8

Then load it in your script like this, to avoid conflicts with the native functions:

local utf8 = require 'lua-utf8'

If you don't have LuaRocks, you can try compiling this file manually: https://github.com/starwing/luautf8/blob/master/lutf8lib.c.

Some of the functions it provides are utf8.byte, utf8.char, utf8.find, utf8.gmatch, utf8.gsub, utf8.len, utf8.lower, utf8.match, utf8.reverse, utf8.sub and utf8.upper.
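A short sketch of a few of those functions in use, assuming luautf8 is installed and the source file is saved as UTF-8. The pcall guard is my own addition so the script exits cleanly when the module is missing; the functions mirror their string.* counterparts but count characters instead of bytes:

```lua
-- Load luautf8 into a local; skip the demo if it is not installed.
local ok, utf8 = pcall(require, 'lua-utf8')
if not ok then
  print("lua-utf8 is not installed; skipping the example")
  return
end

local s = "ação"

print(utf8.len(s))        --> 4
print(utf8.byte(s, 2))    --> 231 (code point of "ç")
print(utf8.sub(s, 2, 3))  --> çã
print(utf8.char(97, 231, 227, 111))  --> ação

-- Case conversion and reversal operate on whole characters.
print(utf8.upper(s))
print(utf8.reverse(s))
```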

4

To add to that, Lua 5.3 has a special escape syntax for encoding a character's code point (almost equivalent to utf8.char):

local chr = '\u{code}';

code: the character's code point, written in hexadecimal.

The difference is that this syntax encodes only one character at a time and only works in strings where the backslash is treated as an escape character (quoted strings, not long-bracket strings).
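A small comparison of the escape syntax and utf8.char, as a sketch assuming Lua 5.3 or later:

```lua
-- Requires Lua 5.3 or later.
local escaped = '\u{E7}'         -- escape resolved when the literal is parsed
local built   = utf8.char(0xE7)  -- same character built at run time
print(escaped == built)          --> true

-- One escape per character; utf8.char takes several code points at once.
print('\u{4F60}\u{597D}' == utf8.char(0x4F60, 0x597D))  --> true

-- Long-bracket strings do not process escapes, so the text stays literal.
local raw = [[\u{E7}]]
print(#raw)  --> 6 (backslash, u, {, E, 7, })
```

The escape is handy for constants written directly in source code, while utf8.char is needed when the code points are only known at run time.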

There is also another utf8 library on GitHub that does not need to be compiled or installed natively.
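To give an idea of what a pure-Lua approach involves, here is a minimal encoder sketch. It is not taken from that library; the function name encode is hypothetical, and it uses only arithmetic so it should also run on Lua 5.1 and LuaJIT. It does no validation (surrogates and values above 0x10FFFF are not rejected):

```lua
-- Hypothetical helper: encode one code point as a UTF-8 byte sequence
-- using only division and modulo (no bit operators required).
local function encode(cp)
  if cp < 0x80 then
    -- 1 byte: 0xxxxxxx
    return string.char(cp)
  elseif cp < 0x800 then
    -- 2 bytes: 110xxxxx 10xxxxxx
    return string.char(0xC0 + math.floor(cp / 0x40),
                       0x80 + cp % 0x40)
  elseif cp < 0x10000 then
    -- 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    return string.char(0xE0 + math.floor(cp / 0x1000),
                       0x80 + math.floor(cp / 0x40) % 0x40,
                       0x80 + cp % 0x40)
  else
    -- 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return string.char(0xF0 + math.floor(cp / 0x40000),
                       0x80 + math.floor(cp / 0x1000) % 0x40,
                       0x80 + math.floor(cp / 0x40) % 0x40,
                       0x80 + cp % 0x40)
  end
end

print(encode(0xE7))    -- "ç", bytes C3 A7
print(encode(0x20AC))  -- "€", bytes E2 82 AC
```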
