What are the valid delimiters for preg_* regular expressions?

3

What are the valid delimiters for regular expressions in the preg_* functions?

I know we can use some of these (/, ~ and #), as shown in the example below:

$numero = '0.12.13';
preg_replace('/\D+/', '', $numero); // string('01213')
preg_replace('~\D+~', '', $numero); // string('01213')
preg_replace('#\D+#', '', $numero); // string('01213')

But I would like to know what other delimiters are allowed for (PCRE) regular expressions in PHP.

Can I only use special characters (and never digits or letters)?

If so, which special characters are allowed?

2 answers

3


From http://php.net/manual/en/regexp.reference.delimiters.php:

A delimiter can be any non-alphanumeric, non-backslash, non-whitespace character.

The ones you used in the example are the most common. There are many other valid delimiters (including characters that have a special meaning in regular expressions): ., $, _, :, ?, ^, %, &

Demonstration: https://ideone.com/uhTzHe
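As a rough local check of the rule quoted above, here is a minimal sketch. The helper strip_non_digits is a made-up name for illustration (not part of PHP), and the @ operator is only there to silence the warning PHP raises for a rejected delimiter:

```php
<?php
// Hypothetical helper: wrap \D+ in the given delimiter and apply it.
// Returns null (instead of showing a warning) when PHP rejects the pattern.
function strip_non_digits(string $delimiter, string $subject): ?string
{
    return @preg_replace($delimiter . '\D+' . $delimiter, '', $subject);
}

$numero = '0.12.13';

// Delimiters listed in this answer: each one should be accepted.
foreach (['.', '$', '_', ':', '?', '^', '%', '&'] as $d) {
    echo $d, ' => ', var_export(strip_non_digits($d, $numero), true), PHP_EOL;
}

// Letters and digits are not valid delimiters: preg_replace returns null.
foreach (['a', '1'] as $d) {
    echo $d, ' => ', var_export(strip_non_digits($d, $numero), true), PHP_EOL;
}
```

The pattern \D+ was chosen because it contains none of the delimiters being tried; a pattern that contains its own delimiter would need that character escaped.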

  • 1

    Just to add: preferably avoid characters that have a special meaning in regex syntax, because if you need them inside the pattern you will have to escape them with \. I, for one, always use ~ because it has no special meaning in regex.
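A small sketch of the trade-off this comment describes, using + as the delimiter (the subject string is made up for illustration):

```php
<?php
$subject = '1+1=2';

// With "~" as the delimiter, "+" keeps its quantifier meaning:
// \d+ matches each run of digits.
echo preg_replace('~\d+~', 'N', $subject), PHP_EOL; // N+N=N

// With "+" as the delimiter, every "+" inside the pattern must be written \+,
// and \+ matches a literal plus sign, so here the pattern means
// "one digit followed by a plus sign".
echo preg_replace('+\d\++', 'N', $subject), PHP_EOL; // N1=2
```

In other words, once a metacharacter is used as the delimiter, its escaped form inside the pattern is a literal, so its special meaning is no longer available in that pattern.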

2

Just to complement the other answer: if you use one of the characters {, (, [ or < as the opening delimiter, the closing delimiter cannot be the same character:

preg_replace('{\D+{', '', $numero); // error: No ending matching delimiter '}' found
//                ^ the same character ("{") cannot be used here

In this case, you must close the expression with the respective closing character (i.e., }, ), ] or >):

$numero = '0.12.13';
echo preg_replace('{\D+}', '', $numero); // 01213
echo preg_replace('(\D+)', '', $numero); // 01213
echo preg_replace('[\D+]', '', $numero); // 01213
echo preg_replace('<\D+>', '', $numero); // 01213

For all the other permitted characters, the same character is always used at the beginning and at the end of the expression.
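The pairing rule can be sketched with a small helper. Note that delimit is a hypothetical function written for this example, not something PHP provides, and it does not escape occurrences of the delimiter inside the pattern:

```php
<?php
// Hypothetical helper: wrap a bare pattern in delimiters, choosing the
// matching closing character when the opener is a bracket-style delimiter.
function delimit(string $pattern, string $open): string
{
    $closers = ['{' => '}', '(' => ')', '[' => ']', '<' => '>'];
    $close = $closers[$open] ?? $open; // any other delimiter closes with itself
    return $open . $pattern . $close;
}

$numero = '0.12.13';
foreach (['{', '(', '[', '<', '~', '#'] as $open) {
    $regex = delimit('\D+', $open);
    echo $regex, ' => ', preg_replace($regex, '', $numero), PHP_EOL;
}
```

Each line printed should show the delimited pattern followed by 01213, since all six forms describe the same regex.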


It is also worth pointing out that choosing other delimiters helps avoid too many escapes inside the expression. For example, if I use slashes as delimiters, any slash inside the regex must be escaped:

$numero = '0/1/2.1/3';
// remove only the slashes
echo preg_replace('/\//', '', $numero); // 012.13

Notice that I had to write the slash as \/, so that it is treated as part of the regex and not confused with the delimiter.

But I could avoid this escape by changing the delimiters:

echo preg_replace('#/#', '', $numero); // 012.13

In this case the gain is small, but if the expression contained many / characters, this would reduce the annoyance of writing \/ every time (and, in my opinion, it also makes the regex a little more readable).
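To give an idea of the difference with a pattern that contains several slashes, here is a sketch that splits a URL into host and path. The pattern is only an illustration, not a robust URL parser:

```php
<?php
$url = 'http://php.net/manual/en/regexp.reference.delimiters.php';

// With "/" as the delimiter, every slash in the pattern needs a backslash.
$withSlashes = '/^https?:\/\/([^\/]+)\/(.*)$/';

// With "#" as the delimiter, the same pattern needs no escaping at all.
$withHashes = '#^https?://([^/]+)/(.*)$#';

preg_match($withSlashes, $url, $a);
preg_match($withHashes, $url, $b);

var_dump($a === $b);  // bool(true): both patterns capture the same groups
echo $b[1], PHP_EOL;  // php.net
echo $b[2], PHP_EOL;  // manual/en/regexp.reference.delimiters.php
```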


Another point, which can be a bit confusing: if I use as a delimiter one of the brackets that have a special meaning in regex ({, [ or (), they do not need to be escaped inside the expression:

// remove a sequence of 3 or more digits followed by a slash
echo preg_replace('{\d{3,}/}', '', '123/245.144/3'); // 245.3

// remove a digit from 2 to 4 together with the digit that follows it
echo preg_replace('[[2-4]\d]', '', '12.23-45/14'); // 12.-/14

// remove only a digit from 2 to 4, when it is preceded by a non-digit
echo preg_replace('((?<=\D)[2-4])', '', '12.23-45/14'); // 12.3-5/14

Note that even with {, [ or ( as delimiters, these characters keep their usual meaning inside the regex, provided each opening bracket there has a matching closing one.

  • 1

    You can really see how [], <>, {} and () can get confusing: preg_replace('#[{]#', 'foo', '{ { {') works, whereas preg_replace('{[{]}', 'foo', '{ { {') fails (No ending matching delimiter '}' found), which forces you to escape the { inside the [...], like this: '{[\{]}'. In a simple regex that is easy to fix; in a more complex one it will cause more headaches.
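The three cases from this comment can be checked side by side with a short sketch. The @ operator is only used to silence the warning, so that the null return value is visible:

```php
<?php
$subject = '{ { {';

// "#" delimiters: the "{" inside the character class is just a literal.
var_dump(preg_replace('#[{]#', 'foo', $subject));   // string(11) "foo foo foo"

// "{...}" delimiters with an unbalanced "{" inside: PHP cannot find the
// closing delimiter, raises a warning and returns null.
var_dump(@preg_replace('{[{]}', 'foo', $subject));  // NULL

// Escaping the inner "{" restores the balance, so the pattern compiles again.
var_dump(preg_replace('{[\{]}', 'foo', $subject));  // string(11) "foo foo foo"
```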
