Recursive regex in PHP

I have to validate the syntax of expressions like these:

(ind(10)+15)-10
1000-(perg(25)*2)
25/var(1)
12*2-(58+1)/5

Here ind, perg and var are functions.

I’m trying to solve it this way:

$var = '(\((var[\(])[0-9]+\)\)|(var[\(])[0-9]+\))';
$perg = '(\((perg[\(])[0-9]+\)\)|(perg[\(])[0-9]+\))';
$ind = '(\((ind[\(])[0-9]+\)\)|(ind[\(])[0-9]+\))';
$number = '[0-9]+(\.|,){0,1}[0-9]{0,2}';
$ope = '(\+|\-|\*|\/)';


$x = '('.$ind.'|'.$var.'|'.$perg.'|'.$number.')';
$a = $ope;
$s = '/^('.$x.$a.'(?R)|\('.$x.$a.'(?R)\)|(?R)'.$a.$x.'|\((?R)'.$a.$x.'\)|'.$x.')$/';

print_r(preg_match($s, '(perg(6)*perg(4)*)*1000000'));

And it produces this error:

Warning: preg_match(): Compilation failed: recursive call could loop indefinitely at offset 359

I'll try to explain how I approached the problem.

I thought of the following rule: 'X A S' or 'S A X', where

X -> a value;
A -> a mathematical operator;
S -> an expression.

A -> + or - or * or /;
X -> ind(number) or perg(number) or var(number) or a float number;
S -> X A S or (X A S) or S A X or (S A X) or X

  • What you want cannot be done with a regex alone, because it is more complex: it is a job for a lexical analyzer. Just the fact that you have to balance the parentheses already makes it hard to do with a regex. You will need to write a lexical analyzer.

  • Thanks, Guilherme, for the help. Could you point me to some material so I can solve this problem? In PHP, if possible.

  • "Writing a simple lexer in PHP": this material gives a good foundation. You can also search for "PHP parser" or "PHP lexer", the two terms involved.

  • Thank you, Guilherme. I will study the subject.

  • @Guilhermelautert This is very good advice. However, lexers can also use regex. For such a simple validation, I don't agree that it can't be done with regex.
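For comparison, the lexer-plus-parser route suggested in the comments can be sketched roughly as follows. This is only a minimal sketch, assuming PHP 7.1 or later. The names tokenize, parseValue, parseExpr and validateWithParser are hypothetical and do not come from the thread. The grammar is a guess based on the question's patterns: functions take a single integer, and numbers may use . or , with up to two decimals.

```php
<?php
// Sketch only: all function names here are hypothetical, not from the thread.

// Lexer: split the input into tokens; return null if any character is unknown.
function tokenize(string $expr): ?array {
    preg_match_all('/\d+(?:[.,]\d{0,2})?|ind|perg|var|[-+*\/()]/', $expr, $m);
    // preg_match_all skips characters it cannot match, so the tokens glued
    // back together equal the input only if nothing was skipped.
    return implode('', $m[0]) === $expr ? $m[0] : null;
}

// value := number | name '(' integer ')' | '(' expr ')'
function parseValue(array $t, int &$i): bool {
    $tok = $t[$i] ?? '';
    if (preg_match('/^\d/', $tok) === 1) {               // number
        $i++;
        return true;
    }
    if (in_array($tok, ['ind', 'perg', 'var'], true)) {  // function call
        $ok = ($t[$i + 1] ?? '') === '('
            && preg_match('/^\d+$/', $t[$i + 2] ?? '') === 1
            && ($t[$i + 3] ?? '') === ')';
        $i += 4;
        return $ok;
    }
    if ($tok === '(') {                                  // parenthesized expression
        $i++;
        if (!parseExpr($t, $i) || ($t[$i] ?? '') !== ')') {
            return false;
        }
        $i++;
        return true;
    }
    return false;
}

// expr := value (operator value)*
function parseExpr(array $t, int &$i): bool {
    if (!parseValue($t, $i)) {
        return false;
    }
    while (in_array($t[$i] ?? '', ['+', '-', '*', '/'], true)) {
        $i++; // consume the operator
        if (!parseValue($t, $i)) {
            return false;
        }
    }
    return true;
}

// Valid only if the whole token list is consumed by one expression.
function validateWithParser(string $expr): bool {
    $tokens = tokenize($expr);
    if ($tokens === null) {
        return false;
    }
    $i = 0;
    return parseExpr($tokens, $i) && $i === count($tokens);
}

foreach (['(ind(10)+15)-10', '(perg(6)*perg(4)*)*1000000'] as $e) {
    echo (validateWithParser($e) ? 'valid' : 'invalid') . ": $e\n";
}
```

The lexer itself is a single regex, which is the point made in the last comment: a regex splits the input into tokens, and ordinary recursive functions handle the nesting.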

1 answer


Let’s simplify:

$oper   = '[-+*\/]';           // a mathematical operator
$numero = '(?:- ?)?\d*\.?\d+'; // float
$nomes  = '(?:var|perg|ind)';  // function names

$recurse= '(?1)'; // explained below
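To see what the $numero sub-pattern accepts on its own, here is a small check. It is only a sketch: the ^ and $ anchors, the $onlyNumber name and the sample strings are added for illustration and are not part of the answer. Unlike the $number pattern in the question, $numero allows a leading minus sign but does not accept a comma as the decimal separator.

```php
<?php
$numero = '(?:- ?)?\d*\.?\d+'; // float, as defined above

// Anchors added only to test the sub-pattern in isolation (an assumption of this sketch).
$onlyNumber = "/^$numero$/";

$samples = ['10', '.5', '-3.14', '- 2', '5.', '1,5'];
foreach ($samples as $s) {
    echo (preg_match($onlyNumber, $s) === 1 ? 'accepted' : 'rejected') . ": $s\n";
}
```

The first four samples are accepted. '5.' is rejected because at least one digit must follow the optional dot, and '1,5' is rejected because of the comma.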

Since the parentheses allow any mathematical expression inside, we can define:

$parens = '\(' . $recurse . '\)';    //=>  \((?1)\)

And functions also have the same syntax, so we can put function names as optional before a set of parentheses (think of parentheses as an unnamed function):

$funcao = "$nomes?$parens";          //=>  (?:var|perg|ind)?\((?1)\)

So we have the values:

$x = "(?:$numero|$funcao)";          //=>  (?:(?:- ?)?\d*\.?\d+|(?:var|perg|ind)?\((?1)\))

Which can be followed any number of times by a mathematical operator with another value:

$s = $x . "(?:$oper$recurse)*";      //=> (?:(?:- ?)?\d*\.?\d+|(?:var|perg|ind)?\((?1)\))(?:[-+*\/](?1))*

And we wrap everything in parentheses so that it becomes group 1, which is the group that (?1) recurses into:

$regex = "/^($s)$/";


Regular expression

/^((?:(?:- ?)?\d*\.?\d+|(?:var|perg|ind)?\((?1)\))(?:[-+*\/](?1))*)$/
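One consequence of treating parentheses as an unnamed function is that a function argument may be any expression, not only an integer as in the question's patterns. A quick check, using the regular expression above verbatim (the sample inputs are made up for illustration):

```php
<?php
$regex = '/^((?:(?:- ?)?\d*\.?\d+|(?:var|perg|ind)?\((?1)\))(?:[-+*\/](?1))*)$/';

// Hypothetical inputs, chosen to probe what may appear inside the parentheses.
$samples = ['ind(1+2)', 'var(perg(3))', '(5)', 'ind()', 'foo(1)'];
foreach ($samples as $s) {
    echo (preg_match($regex, $s) === 1 ? 'accepted' : 'rejected') . ": $s\n";
}
```

The first three inputs are accepted. 'ind()' is rejected because the parentheses must contain a value, and 'foo(1)' is rejected because foo is not one of the listed names. If arguments must stay plain integers, named functions would need their own stricter branch, which is a change to the pattern and not something it does as written.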

Regex101


Why (?1)

(?R) recurses into the entire pattern, including the ^ and $ anchors, and ^ can never match in the middle of the text. (?1) instead recurses only into group 1, which in this case contains the whole expression except the anchors. See Recursive Patterns.
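A reduced illustration of the difference, assuming a toy grammar where a value is the letter a, optionally wrapped in parentheses (both patterns and the variable names are made up for this sketch):

```php
<?php
// (?R) re-enters the whole pattern, anchors included, so ^ would have to
// match again at the point of recursion, which is not the start of the subject.
$withR = '/^(?:a|\((?R)\))$/';

// (?1) re-enters only group 1, which leaves ^ and $ outside the recursion.
$with1 = '/^(a|\((?1)\))$/';

foreach (['a', '(a)', '((a))', '((a)'] as $s) {
    printf("%-6s (?R): %d  (?1): %d\n", $s, preg_match($withR, $s), preg_match($with1, $s));
}
```

With (?R) only the bare 'a' matches, because every nested attempt fails on ^. With (?1) the balanced inputs match and the unbalanced '((a)' does not.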

Graphically:

[Figure: arrows mark where each recursive call is made from and where it goes to.]



Code

function validar($expr) {
    $oper   = '[-+*\/]';           // a mathematical operator
    $numero = '(?:- ?)?\d*\.?\d+'; // float
    $nomes  = '(?:var|perg|ind)';  // function names

    $recurse= '(?1)';

    $parens = '\(' . $recurse . '\)';
    $funcao = "$nomes?$parens";

    $x = "(?:$numero|$funcao)";
    $s = $x . "(?:$oper$recurse)*";

    $regex = "/^($s)$/";
         //=> /^((?:(?:- ?)?\d*\.?\d+|(?:var|perg|ind)?\((?1)\))(?:[-+*\/](?1))*)$/

    return preg_match( $regex, $expr);
}


// EXAMPLES
$exemplos = [
        '9-8+',
        '(ind(10)+15)-10',
        '(ind(10)+15-10',
        '1000-(perg(25)*2)',
        '1000-perg(25)*2)',
        '1000-perg(25)*2',
        '25/var(1)',
        '12*2-(58+1)/5',
        '(perg(6)*perg(4)*)*1000000',
        '0'
    ];

foreach ($exemplos as $expressao) {
    echo (validar($expressao) ? '✔️' : '✖️') . " $expressao\n";
}

Result

✖️ 9-8+
✔️ (ind(10)+15)-10
✖️ (ind(10)+15-10
✔️ 1000-(perg(25)*2)
✖️ 1000-perg(25)*2)
✔️ 1000-perg(25)*2
✔️ 25/var(1)
✔️ 12*2-(58+1)/5
✖️ (perg(6)*perg(4)*)*1000000
✔️ 0

See it working at http://ideone.com/nHl0Mo
