Regular expression to find a word, with exceptions

3

I'm looking for a certain recursive function whose name I can't remember. To find it, I wrote the following regex:

function (\w+)\([\x00-\xFF]+\1\(

It finds every function whose own name is called again later in the text (that is, recursive functions). I am aware that the [\x00-\xFF]+ part can give me unexpected results, such as:

function rl(){
    // code
}

function teste(){
    rl();
}

However, this is irrelevant.

My problem is excluding certain names, such as index, busca and edit, to narrow down my results.

Currently my search finds 900 results, and I believe some 70% of them refer to these functions.

Failed attempts:

function ([^(index|busca|edit)])\(.*\)\{[\x00-\xFF]+\1\(
function ((?<!index)\w+)\(.*\)\{[\x00-\xFF]+\1\(
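
A minimal JavaScript sketch of why both attempts miss, assuming an engine with lookbehind support (ES2018 or later). The sample strings are made up for illustration. In the first attempt, [^(index|busca|edit)] is a character class, so it matches one single character that is none of the listed letters or symbols. In the second, the lookbehind (?<!index) inspects the text before the name, which is always "function ", so it never rejects anything.

```javascript
// Attempt 1: [^...] is a negated character class, not a negated word list.
// It accepts exactly one character outside the set "(index|busca|edit)".
const attempt1 = /function ([^(index|busca|edit)])\(.*\)\{[\x00-\xFF]+\1\(/;

// Attempt 2: (?<!index) looks backwards from the start of the name,
// where the preceding text is always "function ", so it always passes.
const attempt2 = /function ((?<!index)\w+)\(.*\)\{[\x00-\xFF]+\1\(/;

// Hypothetical sample inputs.
const oneLetter = "function f(){\n    f();\n}";
const twoLetters = "function rl(){\n    rl();\n}";
const forbidden = "function index(){\n    index();\n}";

const results = {
  oneLetterName: attempt1.test(oneLetter),   // a single-character name is accepted
  twoLetterName: attempt1.test(twoLetters),  // longer names can never match
  forbiddenName: attempt2.test(forbidden),   // "index" is not filtered out
};

console.log(results);
// { oneLetterName: true, twoLetterName: false, forbiddenName: true }
```
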
  • What you are trying to do cannot be done with regular expressions in the strict, formal sense alone. However, JavaScript uses extended regular expressions with lookahead, lookbehind and backreferences, which can recognize some things that are not actually regular languages. I'll take a look at this to see if I can help you; backreferences seem to be the way to go.

  • If I understand correctly, this question may help you: http://answall.com/questions/26144/2-express%C3%B5es-regular-in-1

  • jsantos1991, thank you for the reference. With it I found the solution.

3 answers

1

I managed it with a somewhat complicated JavaScript function:

function localizaRecursoes(codigo) {
    // Group 1 captures the function name; \1 then requires that same name,
    // followed by "(", somewhere after the opening brace.
    var regex = new RegExp("function[\\s]+([a-zA-Z][a-zA-Z0-9_]*)[\\s]*\\(.*\\)[\\s]*\\{[\\x01-\\xFF]*\\1\\(", "g");
    var resultado = [];
    var match = null;
    do {
        match = regex.exec(codigo);
        // Discard any name that contains "index", "busca" or "edit".
        if (match != null && match[1].indexOf("index") == -1 && match[1].indexOf("busca") == -1 && match[1].indexOf("edit") == -1) {
            resultado.push(match[1]);
        }
    } while (match != null);
    return resultado;
}

To test it:

localizaRecursoes("function foo() { foo(); } function xoom() { xoom(); } function foq() { hghf(); } function ga() { ga(); } function buscaX() { buscaX(); } function yy() { yy(); } function feq() { hghf(); } function fre() { ghghgh fre(); dfsfdsf }");

Result:

["foo", "xoom", "ga", "yy", "fre"]
  • Victor, I appreciate the attempt, but unfortunately with this method I would have to load the files of the entire ERP into a variable and pass the code in. My idea is simpler: just use the editor's search (Ctrl+H).

1


With the help of the reference provided by jsantos1991 and the test site http://regex101.com/, I combined one regex with another and arrived at this result:

(?!function (index|edit|busca))(function (\w+)\(.*\)[\x00-\xFF]+\3\()

In the first part:

(?!function (index|edit|busca))

This rejects any position where the text is "function index", "function edit" or "function busca". It also contains our first group, (index|edit|busca), which is our \1.

The second group is the regex itself, (function (\w+)\(.*\)[\x00-\xFF]+\3\(), which is our \2.

In the second ER:

(function (\w+)\(.*\)[\x00-\xFF]+\3\()

we have the third group, (\w+), which is our \3.

As stated in the question, it looks for functions that reference themselves.

In conclusion, the second regex searches for the functions and the first says which ones not to capture.
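
The pattern above can be tried out with a short JavaScript sketch. The helper recursiveName, the sample strings and the exactNamePattern variant are assumptions added for illustration and are not part of the answer. One thing the sketch shows is that the lookahead rejects any name that merely starts with a forbidden word, so a function called indexer is skipped too. Adding \b after the alternation is one possible way to restrict the exclusion to exact names.

```javascript
// The answer's pattern as a JavaScript regex literal; the function name is group 3.
const answerPattern = /(?!function (index|edit|busca))(function (\w+)\(.*\)[\x00-\xFF]+\3\()/;

// Hypothetical variant: \b makes the exclusion apply to whole names only;
// here the function name is group 1.
const exactNamePattern = /function (?!(?:index|edit|busca)\b)(\w+)\(.*\)[\x00-\xFF]+\1\(/;

// Returns the captured function name, or null when nothing matches.
function recursiveName(pattern, group, source) {
  const match = pattern.exec(source);
  return match ? match[group] : null;
}

// Hypothetical sample inputs.
const allowed = "function rl(){\n    rl();\n}";
const forbidden = "function index(){\n    index();\n}";
const prefixed = "function indexer(){\n    indexer();\n}";

console.log(recursiveName(answerPattern, 3, allowed));     // "rl"
console.log(recursiveName(answerPattern, 3, forbidden));   // null
console.log(recursiveName(answerPattern, 3, prefixed));    // null (prefix is rejected too)
console.log(recursiveName(exactNamePattern, 1, prefixed)); // "indexer"
console.log(recursiveName(exactNamePattern, 1, forbidden)); // null
```

Both patterns still share the limitation mentioned in the question: [\x00-\xFF]+ can run past the end of the function, so a call made from a later function also counts as a match.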

0

Well, you didn't specify a language, so I'll use PHP because I'm more familiar with it. I believe the regex itself works in other languages as long as they support lookaround assertions, with language-specific adjustments where necessary:

/function ((?!index|busca|edit)\w+)\((.*?)\)\{[\x00-\xFF]+\\1\(\\2\)[\x00-\xFF]+\}/

A word (\w+) is matched, as long as it is not one of the forbidden ones ((?!(word|word|word))).

Then the parentheses are matched, with anything inside them, to allow for a possible argument list. You can remove this part if you don't need it.

Then the braces that delimit a code block are matched and, inside them, any characters ([\x00-\xFF]+), followed by the function name matched earlier (\\1), the parentheses and their content (also removable), and then any characters again, so the call can appear anywhere in the code block.

The tests:

$str1 = 'function rl($a){
    rl($a)
}';

$str2 = 'function rl($a){
    //code
}';

$str3 = 'function index($a){
    anotherfunction($a)
}';

$str4 = 'function edit($a){
    edit($a)
}';

$str5 = 'function rl(){
    edit();
}';

// The forbidden names are index, busca and edit, as listed in the question.
$pattern = '/function ((?!index|busca|edit)\w+)\((.*?)\)\{[\x00-\xFF]+\\1\(\\2\)[\x00-\xFF]+\}/';

preg_match( $pattern, $str1, $m1 );
preg_match( $pattern, $str2, $m2 );
preg_match( $pattern, $str3, $m3 );
preg_match( $pattern, $str4, $m4 );
preg_match( $pattern, $str5, $m5 );

var_dump( $m1, $m2, $m3, $m4, $m5 );

Only the first one matches anything, because:

  • In the second, the function is not called recursively
  • In the third, we have a forbidden name
  • In the fourth, we have a forbidden name and a recursive call to a forbidden name
  • In the fifth, we have a valid name, but the function is not called recursively
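
For readers who want to check this outside PHP, here is a rough JavaScript port of the five tests. It is only a sketch: it assumes the forbidden list is index, busca and edit (the names given in the question), writes the backreferences as \1 and \2 because a regex literal needs no string escaping, and the names pattern, samples and names are made up for this example.

```javascript
// JavaScript port of the PHP pattern, with index|busca|edit as the forbidden names.
const pattern = /function ((?!index|busca|edit)\w+)\((.*?)\)\{[\x00-\xFF]+\1\(\2\)[\x00-\xFF]+\}/;

// The same five inputs as the PHP tests.
const samples = [
  "function rl($a){\n    rl($a)\n}",
  "function rl($a){\n    //code\n}",
  "function index($a){\n    anotherfunction($a)\n}",
  "function edit($a){\n    edit($a)\n}",
  "function rl(){\n    edit();\n}",
];

// For each sample, keep the captured function name, or null when there is no match.
const names = samples.map((source) => {
  const match = pattern.exec(source);
  return match ? match[1] : null;
});

console.log(names);
// [ 'rl', null, null, null, null ]
```
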
