Capture groups where a specific word appears with Regex

I have the following situation:

text_1 = O cachorro correu com o gato
text_2 = O carro passou e o cachorro foi atrás
text_3 = Sempre que chego em casa meu cachorro pula em mim
text_4 = Ele foi correndo atrás do sonho
text_5 = O cachorro latiu para o carteiro
text_6 = Quando seu dono ordenou, corra cachorro

I want to capture groups made up of "cachorro", "pul\w+", "corre\w+" and "foi", but only those groups in which the word cachorro is present.

I tried:

re.search(r'(?:\s(cachorro|corre\w+|foi|pul\w+)){2,}', text_n)

Which gives these matches:

text_1 = cachorro correu
text_2 = cachorro foi
text_3 = cachorro pula 
text_4 = foi correndo
text_5 = None
text_6 = corra cachorro

My problem is the match for text_4; that result is not what I want.
What I want to know is whether there is a way to match groups with regular expressions such that a certain word, in this case cachorro, appears at least once.
Other variations of the words correr and pular may occur together with cachorro.

Thanks to all.

  • It is a little unclear what you really need. Should the regular expression return any occurrence of cachorro followed by another word, or only occurrences with corre, foi and pul*? In the text you say one thing and in the code you seem to do another.

  • The example is just an illustration. I want to pick up groups, in any order, in which the word cachorro is present. I've tried re.search('(?:cachorro|(?:\s(corre\w+|foi|pul\w+)){2})', text), which would match {O cachorro foi pular a cerca} and would also be useful to me, but it would also pick up another text, {Ele foi correndo atrás do sonho}, which is of no use to me.

  • But that does not answer my question: should the search cover ONLY the verbs correr, ir and pular, or may others that were not listed also occur? For example, should "O cachorro latiu para o carteiro" be a result?

  • Sorry, I tried to edit and ran out of time. Yes: groups with the words cachorro, pul\w+, corre\w+ and foi, in any order or quantity, but in which the word cachorro is present.

1 answer

Answer

If what you want is to identify the words that are preceded by "cachorro " (with a trailing space), you can use a positive lookbehind.

((?<=cachorro )corre\w+|(?<=cachorro )foi|(?<=cachorro )pul\w+)

You can see how this regex works here.
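The behaviour can also be checked locally. The following is a minimal Python sketch that runs the regex above against the six sentences from the question; the helper name find_after_cachorro is only illustrative.

```python
import re

# Sample sentences from the question.
texts = [
    "O cachorro correu com o gato",
    "O carro passou e o cachorro foi atrás",
    "Sempre que chego em casa meu cachorro pula em mim",
    "Ele foi correndo atrás do sonho",
    "O cachorro latiu para o carteiro",
    "Quando seu dono ordenou, corra cachorro",
]

# The regex from this answer: each alternative must be preceded by "cachorro ".
pattern = re.compile(
    r"((?<=cachorro )corre\w+|(?<=cachorro )foi|(?<=cachorro )pul\w+)"
)

def find_after_cachorro(text):
    """Return the first listed word that directly follows 'cachorro ', or None."""
    m = pattern.search(text)
    return m.group(0) if m else None

results = [find_after_cachorro(t) for t in texts]
print(results)  # ['correu', 'foi', 'pula', None, None, None]
```

Note that text_4 no longer matches, but neither does text_6: the lookbehind only finds words that come right after "cachorro ", not before it.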


Explanation:

((?<=cachorro )[...]

The regex above looks for the word "cachorro" (with a space at the end) through a positive lookbehind: it asserts that this string comes immediately before the current position and starts the match right after it, without including it in the match.

[...]corre\w+[...]

After that, it captures the following word if it starts with corre or pul, or is exactly foi. The fragment above shows the corre case.
Since the lookbehind leaves "cachorro" out of the match, you can add the word "cachorro" in front of each match to get the result you wanted.
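As a rough sketch of that last step in Python, the matched word can be prefixed with "cachorro " after the search. The function name cachorro_groups is hypothetical, and the three lookbehinds are folded into one here, which should be equivalent to the regex above.

```python
import re

# One lookbehind shared by the three alternatives.
pattern = re.compile(r"(?<=cachorro )(?:corre\w+|foi|pul\w+)")

def cachorro_groups(text):
    """Rebuild 'cachorro <word>' for every listed word that follows 'cachorro '."""
    # findall returns the full matches because the group is non-capturing.
    return ["cachorro " + word for word in pattern.findall(text)]

print(cachorro_groups("O cachorro correu com o gato"))     # ['cachorro correu']
print(cachorro_groups("Ele foi correndo atrás do sonho"))  # []
```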

What you did wrong
By wrapping the capture group in an OR (|) the way you did, you end up capturing all occurrences of the words cachorro, corre\w+, foi and pul\w+ regardless of the words that precede them.
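If the order of the words should not matter, one possible alternative (not the lookbehind approach, just a sketch) is to keep the alternation from the question and discard, in Python, any run of words that does not contain cachorro. The name runs_with_cachorro is made up for this example, and the inner group is made non-capturing.

```python
import re

# Runs of two or more listed words, each preceded by whitespace (as in the question).
RUN = re.compile(r"(?:\s(?:cachorro|corre\w+|foi|pul\w+)){2,}")

def runs_with_cachorro(text):
    """Return the runs of listed words that include the word 'cachorro'."""
    found = []
    for m in RUN.finditer(text):
        words = m.group(0).split()
        # Filter step: keep the run only if 'cachorro' is one of its words.
        if "cachorro" in words:
            found.append(" ".join(words))
    return found

print(runs_with_cachorro("O cachorro correu com o gato"))     # ['cachorro correu']
print(runs_with_cachorro("O cachorro foi pular a cerca"))     # ['cachorro foi pular']
print(runs_with_cachorro("Ele foi correndo atrás do sonho"))  # []
```

One caveat: corre\w+ does not match "corra", so with the pattern exactly as written a sentence like text_6 yields no run at all; a looser stem such as corr\w+ would be needed for that case.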

Addendum

As mentioned in the comments, if you want to use some predecessor other than cachorro, you can use OR: copy the previous expression and change the predecessor and the words you want to capture after it.
Example:

((?<=cachorro )corre\w+|(?<=cachorro )foi|(?<=cachorro )pul\w+)|((?<=gato )corre\w+|(?<=gato )foi|(?<=gato )dorm\w+)

Here is an example of the above regex in operation
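To avoid writing each branch by hand when more predecessors are added, the expression could be generated from a mapping. This is only a sketch: FOLLOWERS and build_pattern are hypothetical names, and the word lists simply mirror the example above.

```python
import re

# Hypothetical mapping: predecessor word -> patterns allowed right after it.
FOLLOWERS = {
    "cachorro": [r"corre\w+", "foi", r"pul\w+"],
    "gato": [r"corre\w+", "foi", r"dorm\w+"],
}

def build_pattern(followers):
    """Build one alternation with a fixed-width lookbehind per predecessor."""
    branches = []
    for word, options in followers.items():
        lookbehind = "(?<=" + re.escape(word) + " )"
        branches.append(lookbehind + "(?:" + "|".join(options) + ")")
    return re.compile("|".join(branches))

PATTERN = build_pattern(FOLLOWERS)

def matches(text):
    """Return every word that follows one of the configured predecessors."""
    return [m.group(0) for m in PATTERN.finditer(text)]

print(matches("O gato dormiu e o cachorro correu"))  # ['dormiu', 'correu']
print(matches("O gato pulou"))                       # []
```

Each lookbehind has a fixed width of its own, which is what Python's re module requires, so predecessors of different lengths can sit in separate branches.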

  • Hi Paz, thank you for your help. In this specific case the regex works for me; the problem is that my texts are a little more complex, and there will be some exceptions after the word cachorro. What I really want to know is how to build combinations of the words mentioned such that the word cachorro is part of each combination. I don't want combinations of the word cachorro with anything other than pul\w+, corre\w+ or foi. The lookbehind is very useful but limited, because if I want to add gato I'll have to write another one.

  • @Mueladavc edited
