In the past, programming languages supported only the ASCII encoding, which defines 128 symbols. This encoding is excellent for English, producing very compact text in which each letter takes only one byte. With the growth of the internet and an increasingly globalized world, problems quickly arose: people in Brazil, for example, could not use accents in their words.
It was then that initiatives began to create an encoding that would bring together all the symbols used all over the world.
Because ASCII defines only 128 symbols, the first bit of every byte is zero in this encoding. The UTF-8 standard took advantage of this and defined its first 128 symbols to be exactly the same as ASCII. When a character outside this range is needed, UTF-8 sets the first bit to 1 and uses the leading bits of the first byte to say whether the character takes 2, 3 or 4 bytes. Therefore a program that uses UTF-8 is fully compatible with any ASCII text.
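To make the byte layout concrete, here is a minimal Python sketch that prints the UTF-8 bytes of a few sample characters in binary. The helper name `utf8_bits` and the sample characters are just illustrative choices of mine.

```python
def utf8_bits(ch: str) -> list:
    """Return each UTF-8 byte of `ch` as an 8-digit binary string."""
    return [format(b, "08b") for b in ch.encode("utf-8")]

# One sample per encoded length: ASCII letter, accented letter,
# euro sign, and a cat emoji (U+1F431).
samples = ["A", "\u00e7", "\u20ac", "\U0001F431"]

for ch in samples:
    bits = utf8_bits(ch)
    print(f"U+{ord(ch):04X}: {len(bits)} byte(s) -> {' '.join(bits)}")

# ASCII text is byte-for-byte identical in both encodings.
assert "hello".encode("ascii") == "hello".encode("utf-8")
```

The first byte starts with `0` for a 1-byte character, `110` for 2 bytes, `1110` for 3 bytes and `11110` for 4 bytes; every continuation byte starts with `10`.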
The problem is that MySQL did not fully adhere to the UTF-8 standard. It implemented only the symbols of up to 3 bytes and left out the rest. What MySQL calls utf8 is not actually UTF-8, it is just a piece of it. To fix this error, starting with version 5.5, MySQL implemented the full standard from 1 to 4 bytes and, since it had already used the name utf8, called the new implementation utf8mb4. In short: MySQL's utf8 is not UTF-8, and utf8mb4 fully follows the UTF-8 standard.
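As a rough way to tell whether a given text would fit in the 3-byte utf8, you can check it on the application side before it reaches the database. This is only a sketch: `fits_mysql_utf8` and `needs_utf8mb4` are hypothetical helper names, and the check relies on the fact that UTF-8 needs 4 bytes exactly for code points above U+FFFF.

```python
def fits_mysql_utf8(text: str) -> bool:
    """True if every character encodes to at most 3 bytes in UTF-8,
    i.e. its code point is no higher than U+FFFF."""
    return all(ord(ch) <= 0xFFFF for ch in text)

def needs_utf8mb4(text: str) -> list:
    """Return the characters of `text` that take 4 bytes in UTF-8."""
    return [ch for ch in text if len(ch.encode("utf-8")) == 4]

accented = "a\u00e7\u00e3o"      # Portuguese word with accents
with_emoji = "gato \U0001F431"   # text ending in a cat emoji

print(fits_mysql_utf8(accented))    # accents stay within 3 bytes
print(fits_mysql_utf8(with_emoji))  # the emoji needs 4 bytes
print(needs_utf8mb4(with_emoji))
```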
Still, utf8 and utf8mb4 are highly compatible: the vast majority of characters are encoded identically in both. If you switch from one to the other you probably won't see any difference. Unless, of course, someone starts using animals as letters (emoji, for example, or some rare Chinese ideographs), and then they will be upset when #û&ý appears in place of kittens. Even if you use every accent that exists, there will be no problem!
The point is that MySQL's default is the Latin1 encoding, also known as ISO 8859-1, which covers the characters of Western European languages and works very well for Portuguese. When you stopped declaring utf8mb4, MySQL fell back to this encoding. Your application is probably in UTF-8, and the two encodings represent ASCII in the same way but accented characters differently, so the error appears only in the accents.
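You can reproduce this effect outside the database. In the sketch below, Python's `latin-1` codec stands in for MySQL's Latin1 (an assumption made for illustration), and `misread_as_latin1` is a hypothetical helper that encodes text as UTF-8 and then reads the same bytes back as Latin1.

```python
def misread_as_latin1(text: str) -> str:
    """Encode `text` as UTF-8, then decode the same bytes as Latin1."""
    return text.encode("utf-8").decode("latin-1")

word = "a\u00e7\u00e3o"  # "acao" with cedilla and tilde
garbled = misread_as_latin1(word)

print(garbled)                     # each accented letter becomes two symbols
print(misread_as_latin1("hello"))  # pure ASCII text comes through unchanged

# Reversing the two steps recovers the original text.
assert garbled.encode("latin-1").decode("utf-8") == word
```

Each accented letter is 2 bytes in UTF-8, and Latin1 reads every byte as a separate character, which is why the accents turn into pairs of odd symbols while the plain letters survive.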
Maybe this part of the script failed because the version of MySQL you are using does not support utf8mb4. If that is the case, just use utf8 in its place; the accents will be compatible.
Basically, utf8mb4 just allows an extra byte in the encoding. For the languages in current use, utf8mb4 is the same as the 3-byte version. Your problem is probably elsewhere in the code.
– Bacco
But in those lines with the CHARSET and COLLATE options, is that all they are for, allowing an extra byte in the encoding?
– DiChrist
Basically it changes nothing, other than taking up more space in the DB when you define a CHAR column. CHAR(10) reserves 30 bytes in utf8, 40 bytes in utf8mb4, and 10 bytes in Latin1. BMP characters, which are the ones utf8 supports, are encoded identically in utf8mb4.
– Bacco
Oh, I get it, so that shouldn't be the cause of the problem here. Post your comment as an answer so I can accept it.
– DiChrist
I can't promise, but if I gather some more technical references, I'll post it as an answer. I just wanted to move things along so you have a basic idea. I think what is missing for an answer is good sources for people to consult (I think answers of this type deserve a more detailed explanation, so if I can, I'll elaborate on it later).
– Bacco
Great, I'll be waiting.
– DiChrist
Obviously, if someone wants to post a detailed answer that explains things properly, feel free (as long as it is to explain things better; otherwise I recommend leaving it as a comment as well. If it is to talk nonsense, a comment "saves" the person from downvotes).
– Bacco