How to select all characters except some specific words using regex?

Viewed 5,383 times

7

Good morning, everyone,

I would like to search a string for some specific substrings, but in the project we are working on we have no access to a find function or anything similar, only to a regular-expression-based substitution function (something like replace(string, regex, replacement)).

The idea, then, is to match every character EXCEPT the sequences I want to find. I would remove those unwanted characters and compare what is left with what I expect to find.

Example (not a specific language):

string expReg = ??????;
string texto = "xxxxxxxxboloxxxxxxxfarinhaxxxxxxacucarxxxx";
string busca = replace(texto, expReg, "");
if(busca == "bolofarinhaacucar"){
    return("Sucesso");
}

Luckily, the words we need to find must appear in a defined order, so there is no need to cover every permutation.

We tried to find a solution using regular expressions, but we always run into the problem that positive lookbehind (?<=ABC) is not supported in JavaScript.

Any ideas?

  • Which language? JavaScript?

  • 1

    Just out of curiosity: what language is this that has no find but does have replace?

  • It’s actually a scripting language used by a system at the company where I work. We also find it strange that it does not support find but does support replace (even with regular expressions!), but what can you do, right...

2 answers

3

To remove a certain character from a string, just make the following substitution (example in JavaScript):

var texto = 'xxxxxxxxboloxxxxxxxfarinhaxxxxxxacucarxxxx';

var expReg = /([x]+)/g; // Matches one or more occurrences of "x"
var busca = texto.replace(expReg, '');

if(busca == "bolofarinhaacucar"){
    console.log("Sucesso");
}

To negate the match of certain characters, use a negated character class by putting ^ at the start of the class.

var texto = 'xxxxxxxxboloxxxxxxxfarinhaxxxxxxacucarxxxx';

var expReg = /([^x]+)/g;
var buscaArray = texto.match(expReg) || []; // match returns an array with the matches found (null if there are none)
var busca = buscaArray.join(""); // Join the matches into a single string so it can be compared

if(busca == "bolofarinhaacucar"){
    console.log("Sucesso");
}

The syntax may vary depending on the engine used. Here (in English) is a comparison of regular expression engines.

  • If there is an x in the middle of a word, this will cause a problem.

  • @Rodrigorigotti I don’t think so, because if there is an x in farxinha, far and inha are returned.

  • But that’s the problem. I used xxxx, but it could be any characters, for example: "leitebolofarinhaovosacucar". And if you strip all the characters ([a-z]), that would also affect the words we are looking for.
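
The objection in the last comment can be reproduced with a short JavaScript sketch. The variable names semLetras and semX are made up for illustration, and the input is the example string from that comment. It only shows why a character class cannot separate the filler from the wanted words once the filler is made of ordinary letters.

```javascript
// The filler is no longer a single known character, so no character class
// can match the filler without also matching the words we want to keep.
var texto = 'leitebolofarinhaovosacucar';

// Stripping every lowercase letter also strips the target words.
var semLetras = texto.replace(/[a-z]+/g, '');
console.log(semLetras === ''); // true

// Stripping only "x" leaves the other filler ("leite", "ovos") in place.
var semX = texto.replace(/x+/g, '');
console.log(semX === 'bolofarinhaacucar'); // false
```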

3


It would be good to know which language you are working in so we can find a more efficient way to help you, since negating a word in regex is not easy:

http://aurelio.net/regex/guia/negar-palavra.html#5_3

Since, as you said, it will always be those specific words, you can do something like this:

string expReg = "\w+(bolo)\w+(farinha)\w+(acucar)\w+";

string texto = "xxxxxxxxboloxxxxxxxfarinhaxxxxxxacucarxxxx";
string busca = replace(texto, expReg, "$1$2$3"); // replaces the whole match with groups 1, 2 and 3

You can test it here: https://regex101.com/r/oP3kU3/1
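
For reference, a JavaScript rendering of the same idea might look like the sketch below. It is an adaptation rather than the exact pattern above: it assumes the target engine behaves like JavaScript's String.prototype.replace, anchors the pattern with ^ and $, and uses \w* instead of \w+ so that the filler around the words may be empty. The variables outro and errado are illustrative inputs.

```javascript
// Anchored pattern; \w* (instead of \w+) lets the filler be empty.
var expReg = /^\w*(bolo)\w*(farinha)\w*(acucar)\w*$/;

var texto = 'xxxxxxxxboloxxxxxxxfarinhaxxxxxxacucarxxxx';
var busca = texto.replace(expReg, '$1$2$3');
console.log(busca === 'bolofarinhaacucar'); // true

// Words adjacent to each other, or at the very end, still match.
var outro = 'leitebolofarinhaovosacucar';
console.log(outro.replace(expReg, '$1$2$3') === 'bolofarinhaacucar'); // true

// Wrong order: no match, so replace returns the input unchanged.
var errado = 'xxfarinhaxxboloxxacucarxx';
console.log(errado.replace(expReg, '$1$2$3') === errado); // true
```

When the pattern does not match, replace leaves the string as it was, so the comparison with "bolofarinhaacucar" fails, which is the behavior the question relies on.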

  • I really didn’t know it was this hard to negate words in regex. But this solution is the closest to what we were looking for. There was just one detail I forgot to mention: there may be spaces between the words, so \w+ would not work. But in that case it is enough to replace it with \D+ or .* and it works properly. Thank you!

  • 1

    If it worked, all good =D
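
The variant discussed in the comments, where spaces may appear between the words, could be sketched in JavaScript as follows. The helper name contemEmOrdem is hypothetical, and the sketch assumes JavaScript regex semantics: [\s\S]* is used instead of .* because in JavaScript . does not match line breaks by default, and instead of \D+ because \D would fail on filler that contains digits.

```javascript
// Hypothetical helper: returns true when the three words occur in this
// order, with any filler (spaces, digits, line breaks) around them.
function contemEmOrdem(texto) {
    // [\s\S]* matches any character, including newlines, zero or more times.
    var expReg = /^[\s\S]*(bolo)[\s\S]*(farinha)[\s\S]*(acucar)[\s\S]*$/;
    // On a match the whole string is replaced by the three captured words;
    // otherwise replace returns the input unchanged.
    var busca = texto.replace(expReg, '$1$2$3');
    return busca === 'bolofarinhaacucar';
}

console.log(contemEmOrdem('1 bolo, 2 farinha\n3 acucar')); // true
console.log(contemEmOrdem('farinha bolo acucar'));         // false
```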
