PHP functions whose names start with "mb_" belong to the mbstring extension.
MB stands for "multibyte": these are functions for manipulating multibyte strings.
Encodings such as UTF-8 are multibyte, meaning a single character can take more than one byte. The official documentation lists the supported encodings: http://php.net/manual/en/mbstring.supported-encodings.php
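The same list can also be queried at runtime, which helps when you are unsure what a given PHP build supports. A minimal sketch, assuming the mbstring extension is loaded; the variable names are only illustrative:

```php
<?php
// Ask mbstring which encodings this PHP build supports.
$encodings = mb_list_encodings();

// Check a few common encodings by name.
foreach (['UTF-8', 'UTF-16', 'UTF-32'] as $name) {
    $supported = in_array($name, $encodings, true) ? 'yes' : 'no';
    echo $name . ': ' . $supported . PHP_EOL;
}
```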
Practical example
<?php
date_default_timezone_set('Asia/Tokyo');
ini_set('error_reporting', E_ALL);
error_reporting(E_ALL);
ini_set('log_errors',TRUE);
ini_set('html_errors',FALSE);
ini_set('display_errors',TRUE);
define( 'CHARSET', 'UTF-8' );
ini_set( 'default_charset', CHARSET );
if( version_compare( PHP_VERSION, '5.6.0', '<' ) ){
ini_set( 'mbstring.http_output', CHARSET );
ini_set( 'mbstring.internal_encoding', CHARSET );
}
header( 'Content-Type: text/html; charset=' . CHARSET );
/*
Returns 6.
The heart character occupies 3 bytes.
If you want to count bytes, strlen() is the right choice.
*/
echo strlen('I♥NY') . PHP_EOL . '<br />';
/*
Returns 4.
If you want to count characters, use the equivalent mbstring function.
*/
echo mb_strlen('I♥NY') . PHP_EOL . '<br />';
/*
Note that even accented Latin characters are multibyte:
strlen() returns 6 here, mb_strlen() returns 4.
*/
echo strlen('ação') . PHP_EOL . '<br />';
echo mb_strlen('ação');
?>
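The difference goes beyond counting. As a sketch of the same byte-versus-character distinction, the snippet below compares substr() with mb_substr(). It assumes the source file is saved as UTF-8; the variable names are illustrative.

```php
<?php
$text = 'ação';

// substr() counts bytes: the first 2 bytes are "a" plus half of "ç",
// so the result is not valid UTF-8.
$byteCut = substr($text, 0, 2);

// mb_substr() counts characters in the given encoding.
$charCut = mb_substr($text, 0, 2, 'UTF-8');

var_dump(mb_check_encoding($byteCut, 'UTF-8')); // bool(false)
var_dump($charCut === 'aç');                    // bool(true)
```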
Another, less common term for multibyte encodings is "variable-width encoding".
https://en.wikipedia.org/wiki/Variable-width_encoding
Additional note
It is not always necessary to use the mbstring functions, for example when a string is known to contain no multibyte characters.
Example:
echo strlen('123') . PHP_EOL . '<br />';
echo mb_strlen('123');
As the example shows, mbstring is unnecessary in this case. However, we can go a step further with another numeric example:
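echo strlen('１２３') . PHP_EOL . '<br />';
echo mb_strlen('１２３');
In this example the characters are still digits, but multibyte ones: full-width digits take 3 bytes each in UTF-8, so strlen() returns 9 while mb_strlen() returns 3.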
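The two numeric examples suggest a simple check: a string contains multibyte characters exactly when its byte count and its character count differ. A minimal sketch, assuming PHP 7 or later (for the type declarations) and UTF-8 input; has_multibyte_chars() is a hypothetical helper, not part of mbstring:

```php
<?php
// Hypothetical helper: true when at least one character in $text
// takes more than one byte in the given encoding.
function has_multibyte_chars(string $text, string $encoding = 'UTF-8'): bool
{
    return strlen($text) !== mb_strlen($text, $encoding);
}

var_dump(has_multibyte_chars('123'));    // bool(false): ASCII digits
var_dump(has_multibyte_chars('１２３')); // bool(true): full-width digits
```

When you cannot guarantee the input, calling the mbstring function directly is usually simpler than checking first.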
Many well-built systems "think" they are internationalized, but the vast majority are never tested against the real world, as if "global" meant only the Americas and Europe.
More than 60% of the planet (Arabic, Greek, Russian, Indian and other Asian languages) uses multibyte characters, and each language has its own peculiarities, such as the multibyte digits from the Japanese character table in the example above.
Therefore, we recommend the mbstring functions if you want to build a system that is compatible with the many encodings in use.
Another important note: UTF-8 is not the best fit for every language, and the mbstring functions are not limited to UTF-8.
For example, Chinese characters are often handled with Big5.
UTF-16 and UTF-32 are also used.
However, even for Chinese, UTF-8 is used with reasonable confidence, since it is rare for Chinese speakers themselves to use every ideogram. There are more than 60 thousand of them.
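To illustrate that mbstring works with encodings other than UTF-8, here is a sketch that converts the same text to UTF-16, UTF-32 and Big5 and counts characters by naming the encoding explicitly. It assumes the source file is saved as UTF-8 and that the PHP build includes Big5 support; the variable names are illustrative.

```php
<?php
$utf8 = 'ação';

// Convert the same text to UTF-16 and UTF-32 (big-endian, no BOM).
$utf16 = mb_convert_encoding($utf8, 'UTF-16BE', 'UTF-8');
$utf32 = mb_convert_encoding($utf8, 'UTF-32BE', 'UTF-8');

// The byte count differs per encoding...
echo strlen($utf8) . PHP_EOL;  // 6
echo strlen($utf16) . PHP_EOL; // 8
echo strlen($utf32) . PHP_EOL; // 16

// ...but the character count is the same once the encoding is named.
echo mb_strlen($utf16, 'UTF-16BE') . PHP_EOL; // 4
echo mb_strlen($utf32, 'UTF-32BE') . PHP_EOL; // 4

// Chinese text converted to Big5 and counted as Big5.
$big5 = mb_convert_encoding('中文', 'BIG-5', 'UTF-8');
echo mb_strlen($big5, 'BIG-5') . PHP_EOL; // 2
```

Passing the encoding explicitly avoids depending on the configured default charset, which matters when one application handles text in several encodings.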
pq prefix are cool :D haha. Important question +1.
– rray
Important and no one answered yet :\
– Wallace Maxters
A site can serve several languages, so each one needs different handling. UTF-8 may work for one language but not for another (ISO, for example, is the one that supports more of the encoded letters), so the default_charset setting in php.ini alone does not solve it.
– João Reis