What exactly is the "u" modifier for?

Question

Asked 10 years, 8 months ago

What exactly does the u modifier do in PHP's preg_ regular expressions?

Is it recommended whenever processing strings that contain accented characters?

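$valor = 'ãẽi ouã';
// preg_match_all() is needed to collect every match; preg_match() stops at the first one
preg_match_all('/\w+/u', $valor, $matches);

$matches[0]; // array(2) { 'ãẽi', 'ouã' }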

1 answer

by Ricardo • **14,521** points · Answer 1 · 2015-07-23T19:08:45+00:00

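The u modifier enables Unicode support: PCRE treats both the pattern and the subject string as UTF-8, so matching works on whole characters instead of individual bytes.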

For example, it is required if you want to write a regex containing Japanese words, or any pattern that refers to characters above the single-byte range:

preg_match('/[\x{2460}-\x{2468}]/u', $str);

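Here \x{hex} denotes a character by its hexadecimal Unicode code point (not by its UTF-8 bytes). The range above, U+2460 to U+2468, covers the circled digits ① to ⑨.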
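A minimal sketch of that pattern in use. The sample string is an assumption made up for illustration, and the "\u{...}" string escape requires PHP 7 or later:

```php
<?php
// Hypothetical sample text; "\u{2462}" is the circled digit three (U+2462).
$str = "Step \u{2462} of the list";

// With /u, \x{...} refers to a Unicode code point, so the range matches.
$found = preg_match('/[\x{2460}-\x{2468}]/u', $str, $m);
// $found is 1 and $m[0] holds the UTF-8 encoded circled digit.

// Without /u, code points above 0xFF are rejected when the pattern is compiled:
// preg_match() raises a warning (suppressed here with @) and returns false.
$withoutU = @preg_match('/[\x{2460}-\x{2468}]/', $str);
```

Checking the return value against false with === is the usual way to tell a compilation failure apart from "no match", which returns 0.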

Running the following regex with the modifier (using preg_match_all so that every match is collected):

$valor = 'ãẽi ouã';
preg_match_all('/\w+/u', $valor, $matches);

returns:

array (
  0 => 
  array (
    0 => 'ãẽi',
    1 => 'ouã',
  ),
)

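Running the same regex without the modifier matches byte by byte, so the multibyte characters are split into invalid fragments (which bytes count as word characters depends on the locale, so the exact output may vary):

$valor = 'ãẽi ouã';
preg_match_all('/\w+/', $valor, $matches);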

returns:

array (
  0 => 
  array (
    0 => '�',
    1 => '��',
    2 => 'i',
    3 => 'ou�',
  ),
)

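Note that the modifier does not make a plain letter match its accented variants, so it should not be relied on to pick up accented vowels. For example:

$valor = 'ãẽi ouã';
preg_match_all('/a/u', $valor, $matches);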

returns:

array (
 0 => 
  array (
  ),
)
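If the goal is to match a letter together with some of its accented forms, one possible approach (an assumption, not something the modifier does by itself) is to list the variants explicitly in a character class. The set of variants below is only illustrative:

```php
<?php
// 'ãẽi ouã' written with explicit code points (precomposed forms assumed).
$valor = "\u{E3}\u{1EBD}i ou\u{E3}";

// A plain "a" does not match "ã", with or without /u.
$plain = preg_match_all('/a/u', $valor);

// Character class listing a, à, á, â, ã; /u makes each entry a whole character
// rather than a sequence of separate bytes.
$count = preg_match_all("/[a\u{E0}\u{E1}\u{E2}\u{E3}]/u", $valor, $m);

// $plain is 0; $count is 2 and $m[0] contains the two "ã" characters.
```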

Testing site: Link

Documentation: Link