Let’s simplify:
$oper = '[-+*\/]'; // a mathematical operator
$numero = '(?:- ?)?\d*\.?\d+'; // float
$nomes = '(?:var|perg|ind)'; // function names
$recurse= '(?1)'; // explained below
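To see what the $numero pattern accepts on its own, here is a minimal sketch. The helper name isNumber and the sample strings are assumptions made for illustration; they are not part of the original answer.

```php
<?php
// Hypothetical helper: tests a string against the $numero pattern alone.
function isNumber(string $text): bool
{
    $numero = '(?:- ?)?\d*\.?\d+'; // same pattern as in the answer
    return preg_match("/^$numero$/", $text) === 1;
}

// Sample inputs chosen for illustration only.
foreach (['5', '-5', '- 5', '.5', '1.5', '5.', '+5', '1e3'] as $text) {
    echo (isNumber($text) ? 'match   ' : 'no match') . "  $text\n";
}
```

Note that the pattern requires at least one digit at the end, so a trailing dot as in "5." is rejected, and neither a leading "+" nor exponent notation is accepted.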
Since the parentheses allow any mathematical expression inside, we can define:
$parens = '\(' . $recurse . '\)'; //=> \((?1)\)
And functions also have the same syntax, so we can put function names as optional before a set of parentheses (think of parentheses as an unnamed function):
$funcao = "$nomes?$parens"; //=> (?:var|perg|ind)?\((?1)\)
So we have the values:
$x = "(?:$numero|$funcao)"; //=> (?:(?:- ?)?\d*\.?\d+|(?:var|perg|ind)?\((?1)\))
A value can then be followed, any number of times, by a mathematical operator and another expression:
$s = $x . "(?:$oper$recurse)*"; //=> (?:(?:- ?)?\d*\.?\d+|(?:var|perg|ind)?\((?1)\))(?:[-+*\/](?1))*
Finally, we wrap everything in a capturing group, so that it becomes the group 1 that (?1) recurses into:
$regex = "/^($s)$/";
Regular expression
/^((?:(?:- ?)?\d*\.?\d+|(?:var|perg|ind)?\((?1)\))(?:[-+*\/](?1))*)$/
Why (?1)
(?R) recurses into the entire pattern, anchors included, so the recursive call could never succeed: ^ cannot match in the middle of the text. Instead, (?1) recurses into group 1 only (which in this case is the whole expression except the anchors). See Recursive Patterns.
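The difference can be observed with a reduced pattern. This is a minimal sketch: the two simplified patterns (integers and parentheses only) and the variable names $withR and $with1 are assumptions made for illustration.

```php
<?php
// Simplified patterns for illustration: integers, operators and parentheses only.
$withR = '/^(?:\d+|\((?R)\))(?:[-+*\/](?R))*$/';   // recurses into everything, ^ and $ included
$with1 = '/^((?:\d+|\((?1)\))(?:[-+*\/](?1))*)$/'; // recurses into group 1 only

foreach (['7', '1+2', '(1+2)*3'] as $expr) {
    printf(
        "%-8s (?R): %d   (?1): %d\n",
        $expr,
        preg_match($withR, $expr),
        preg_match($with1, $expr)
    );
}
```

With (?R), only the bare number passes, because every recursive call has to match ^ again at a position that is not the start of the string. With (?1), all three expressions match.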
Code
function validar($expr) {
    $oper = '[-+*\/]'; // a mathematical operator
    $numero = '(?:- ?)?\d*\.?\d+'; // float
    $nomes = '(?:var|perg|ind)'; // function names
    $recurse= '(?1)'; // recursion into group 1
    $parens = '\(' . $recurse . '\)'; // a parenthesized expression
    $funcao = "$nomes?$parens"; // optional function name + parentheses
    $x = "(?:$numero|$funcao)"; // a single value
    $s = $x . "(?:$oper$recurse)*"; // a value followed by operator/expression pairs
    $regex = "/^($s)$/";
    //=> /^((?:(?:- ?)?\d*\.?\d+|(?:var|perg|ind)?\((?1)\))(?:[-+*\/](?1))*)$/
    return preg_match($regex, $expr); // 1 if valid, 0 otherwise
}
// EXAMPLES
$exemplos = [
    '9-8+',
    '(ind(10)+15)-10',
    '(ind(10)+15-10',
    '1000-(perg(25)*2)',
    '1000-perg(25)*2)',
    '1000-perg(25)*2',
    '25/var(1)',
    '12*2-(58+1)/5',
    '(perg(6)*perg(4)*)*1000000',
    '0'
];
foreach ($exemplos as $expressao) {
    echo (validar($expressao) ? '✔️' : '✖️') . " $expressao\n";
}
Result
✖️ 9-8+
✔️ (ind(10)+15)-10
✖️ (ind(10)+15-10
✔️ 1000-(perg(25)*2)
✖️ 1000-perg(25)*2)
✔️ 1000-perg(25)*2
✔️ 25/var(1)
✔️ 12*2-(58+1)/5
✖️ (perg(6)*perg(4)*)*1000000
✔️ 0
See it running at http://ideone.com/nHl0Mo
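One detail worth keeping in mind: preg_match() returns 1 on a match, 0 on no match, and false if PCRE itself fails (for example, when a backtracking or recursion limit is hit on a long or deeply nested input). The function above treats that last case the same as an invalid expression. A minimal sketch of a stricter variant follows; the name validarEstrito and the choice to throw an exception are assumptions, not part of the original answer.

```php
<?php
// Hypothetical stricter variant: separates "invalid expression" from "PCRE error".
function validarEstrito(string $expr): bool
{
    // Same regex as the one built step by step in the answer.
    $regex = '/^((?:(?:- ?)?\d*\.?\d+|(?:var|perg|ind)?\((?1)\))(?:[-+*\/](?1))*)$/';

    $resultado = preg_match($regex, $expr);
    if ($resultado === false) {
        // preg_last_error() reports which PCRE limit or error was hit.
        throw new RuntimeException('PCRE error code: ' . preg_last_error());
    }
    return $resultado === 1;
}

var_dump(validarEstrito('1000-perg(25)*2'));
var_dump(validarEstrito('9-8+'));
```

Whether an engine failure should raise an exception or simply count as "invalid" depends on how the validation result is used.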
What you want cannot be done with regex alone, because it is more complex: it calls for a lexical analyzer. The mere fact that you have to balance () already makes it hard to do via regex. You will need to develop a lexical analyzer.
– Guilherme Lautert
Thanks Guilherme for the help. Could you point me to some material so I can solve this problem? If possible in PHP.
– Leonardo Nori
"Writing a simple lexer in PHP": this material gives a very good foundation. You can also search for "PHP parser" or "PHP lexer", the two terms involved.
– Guilherme Lautert
Thank you, Guilherme, I will study the subject.
– Leonardo Nori
@Guilhermelautert This is very good advice. However, lexers can also use regex. For such a simple validation, I don't agree that it can't be done with regex.
– Mariano