The end of a capture in a regular expression

I couldn't think of a better title, because I don't really know the technical terms related to regular expressions.

But let me describe my problem. I have code that uses a regular expression to capture a certain expression and turn it into valid PHP code.

For example, the following string:

%[ $variable = 1 ]

generates the following code:

<?php $variable = 1; ?>

This works, including for block expressions like these:

 %[ foreach ($array as $key => $value) ]
 %[ endforeach ]

The problem only appears when more than one expression is on the same line.

An example:

 %[ echo "Esse é o Wallace"] %[ "esse é meu nome" ]

The output is:

 <?php echo "Esse é o Wallace"] %[ "esse é meu nome"  ?>

The regular expression I use is built by a class, roughly as follows (shown here with sprintf for readability):

$exp1 = '%[';
$exp2 = ']';

// Passing the delimiter to preg_quote() also escapes any literal "/" in a tag.
$regexp = sprintf('/%s\s*(.*)\s*%s/', preg_quote($exp1, '/'), preg_quote($exp2, '/'));

preg_replace($regexp, '<?php $1 ?>', $meu_codigo_aqui);

That would be the expression:

 '/%\[\s*(.*)\s*\]/'

I understand why the unexpected result happens: the expression treats only the last ] on the line as the end of the capture.

But what I want is for the same regular expression to turn this input:

 %[ echo "Expressão 1"] %[echo "Expressão 2"]

into this output:

 <?php echo "Expressão 1" ?> <?php echo "Expressão 2" ?>

How can I do that?

When one expression ends with ] and another begins with %[ on the same line, I want the two groups to be interpreted separately instead of being merged.

  • Warning: the variable that received the result of sprintf() was different from the one passed to preg_replace(); it was missing an "e" :P

1 answer


It's simple, and you have already asked a question about this.

Change it to:

 '/%\[\s*(.*?)\s*\]/'

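This uses the non-greedy (lazy) quantifier: (.*?) matches as few characters as possible, so the capture ends at the first ] instead of the last one.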
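
To make the difference concrete, here is a minimal sketch that runs both patterns over the input from the question. The variable names ($source, $greedy, $lazy) are only illustrative.

```php
<?php
// Input from the question: two template tags on the same line.
$source = '%[ echo "Expressão 1"] %[echo "Expressão 2"]';

// Greedy: (.*) runs to the last "]" on the line, so both tags merge into one match.
$greedy = preg_replace('/%\[\s*(.*)\s*\]/', '<?php $1 ?>', $source);

// Lazy: (.*?) stops at the first "]" that lets the rest of the pattern match.
$lazy = preg_replace('/%\[\s*(.*?)\s*\]/', '<?php $1 ?>', $source);

echo $greedy, PHP_EOL; // one merged tag
echo $lazy, PHP_EOL;   // two separate tags
```

With the lazy version each tag is replaced on its own, which is the result asked for in the question.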

Addendum

Be careful with the \s operator: many people use it thinking it matches only the space character ' '. It matches any whitespace character, including tabs and line breaks, so you may end up capturing more than you want.
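
A small sketch of what \s covers compared with a literal space. The sample string and variable names are made up for illustration; \h is the PCRE class for horizontal whitespace, which may be closer to what is wanted when line breaks must be left alone.

```php
<?php
// Hypothetical subject containing a space, a tab and a line break.
$text = "a b\tc\nd";

// \s matches any whitespace character, so all three separators are replaced.
$anyWhitespace = preg_replace('/\s/', '_', $text);

// A literal space in the pattern matches only the space character.
$spaceOnly = preg_replace('/ /', '_', $text);

// \h matches horizontal whitespace (space and tab) but not line breaks.
$horizontal = preg_replace('/\h/', '_', $text);

var_dump($anyWhitespace, $spaceOnly, $horizontal);
```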

You had also mentioned the s modifier as the way to make the regular expression work line by line. That is not correct. The line-by-line modifier is m.

  • s = single line (PCRE_DOTALL). The dot also matches line breaks, so the whole subject is treated as if it were one line.
  • m = multi line (PCRE_MULTILINE). ^ and $ match at the start and end of each line, not only at the start and end of the whole subject.
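
The following sketch shows what the two modifiers do in PHP's PCRE functions: s changes what the dot matches, and m changes where ^ and $ match. The two-line sample string and the variable names are assumptions for illustration.

```php
<?php
// Hypothetical two-line subject.
$text = "first\nsecond";

// Without s, the dot does not match "\n", so the match ends with line one.
preg_match('/f.*/', $text, $noModifier);

// With s (PCRE_DOTALL), the dot also matches "\n" and the match spans both lines.
preg_match('/f.*/s', $text, $dotAll);

// Without m, ^ matches only at the very start of the subject: one match.
$withoutM = preg_match_all('/^\w+/', $text);

// With m (PCRE_MULTILINE), ^ matches at the start of every line: two matches.
$withM = preg_match_all('/^\w+/m', $text);

var_dump($noModifier[0], $dotAll[0], $withoutM, $withM);
```

For the template tags in the question, this means s would be the modifier to add if a single %[ ... ] tag had to span several lines.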

Regarding regex in PHP, consider using a delimiter other than /, because whenever you need to match a literal / you have to escape it.

  • /http:\/\//: each / had to be escaped, because otherwise PHP would treat it as the end of the regex, which would generate an error.
  • ~http://~: there was no need to escape /, because the delimiter is ~.

I usually use ~, because I rarely need to match it literally. Keep in mind that the delimiter can be any character that is not alphanumeric, a backslash, or whitespace.
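
As a sketch, the same URL check with both delimiters, followed by the question's pattern rebuilt with ~. The sample string and variable names are illustrative; the second argument of preg_quote() is the delimiter, which it escapes in addition to the regex metacharacters.

```php
<?php
// Hypothetical subject for the URL check.
$url = 'see http://example.com for details';

// With "/" as the delimiter, every literal "/" must be escaped.
$withSlash = preg_match('/http:\/\//', $url);

// With "~" as the delimiter, "/" needs no escaping.
$withTilde = preg_match('~http://~', $url);

// The question's pattern, built with "~" and the lazy quantifier.
$open   = preg_quote('%[', '~');
$close  = preg_quote(']', '~');
$regexp = sprintf('~%s\s*(.*?)\s*%s~', $open, $close);

var_dump($withSlash, $withTilde, $regexp);
```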
