Ignore any white space in the middle of a string


0

I am trying to write a regex that can find a string in a text even when there is stray whitespace in the middle of the words. For example, I search the text for the following excerpts:

"conciliação prejudicada" or "inconciliados"

But since things are rarely that tidy, stray spaces may appear in the middle of the words, for example:

"con ciliação prejud icada" or "i n c on ciliados"

I did it this way:

padrao = re.search(r'i\s*n\s*c\s*o\s*n\s*c\s*i\s*l\s*i\s*a\s*d\s*o\s*s|'
                     r'c\s*o\s*n\s*c\s*i\s*l\s*i\s*a\s*ç\s*ã\s*o\s*(p\s*r\s*e\s*j\s*u\s*d\s*i\s*c\s*a\s*d\s*a|r\s*e\s*j\s*e\s*i\s*t\s*a\s*d\s*a)', text)

My question is: is there a less ugly, less gigantic way to ignore these spaces?

2 answers

2


Alternatively, you can remove all whitespace first and run the search afterwards:

texto = re.sub(r"\s+", "", texto)  # raw string avoids the invalid "\s" escape warning

You can then search for the text normally using its regular expression. Depending on your goals, you might want to put everything into capture groups:

resultado = re.search("(conciliação)(prejudicada)|(inconciliados)", texto)

And if you want all the results, you can use re.findall:

resultados = re.findall("(conciliação)(prejudicada)|(inconciliados)", texto)
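
To make the steps above concrete, here is a minimal self-contained sketch. The helper name `find_terms` and the sample sentence are made up for illustration and are not from the question. It uses a non-capturing group so that `re.findall` returns whole matches rather than tuples of groups. One caveat: match positions refer to the stripped text, not the original, and removing spaces can in principle join neighbouring words into a false match.

```python
import re

def find_terms(texto):
    """Strip all whitespace, then search for the target terms."""
    # Remove every whitespace character (spaces, tabs, newlines).
    sem_espacos = re.sub(r"\s+", "", texto)
    # Non-capturing group: findall returns the full matched strings.
    padrao = r"conciliação(?:prejudicada|rejeitada)|inconciliados"
    return re.findall(padrao, sem_espacos)

# Hypothetical input with stray spaces inside the words.
exemplo = "As partes restaram i n c on ciliados; con ciliação prejud icada."
print(find_terms(exemplo))
```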
  • Thank you very much! This solved a big problem in the simplest way.

0

A less ugly way is to let a function generate the regex for you:

def to_regex(s):
    # Raw string, so "\s" reaches the regex engine instead of being an invalid escape.
    return r'\s*'.join(s)

print(to_regex('teste'))
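
As a rough sketch of how such a generator could be applied to the terms from the question, the variant of `to_regex` below also escapes each character with `re.escape` and skips spaces inside the phrase, since `\s*` already covers them. Those two tweaks, the `termos` list, and the sample text are assumptions added for illustration. Unlike stripping the whitespace first, this approach searches the original text, so the match keeps its original position and spacing.

```python
import re

def to_regex(s):
    # Escape each character and allow optional whitespace between characters.
    # Spaces in the phrase itself are dropped because \s* already matches them.
    return r"\s*".join(re.escape(c) for c in s if not c.isspace())

# Terms taken from the question; the sample text is made up.
termos = ["inconciliados", "conciliação prejudicada", "conciliação rejeitada"]
padrao = re.compile("|".join(to_regex(t) for t in termos))

texto = "Audiência: con ciliação prejud icada."
m = padrao.search(texto)
if m:
    # The match is reported as it appears in the original text.
    print(m.group(0), m.span())
```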
