How to make a regular expression that finds a name and then looks for a character?

Question

Asked 7 years, 11 months ago

Viewed 6,439 times

I was analyzing a long HTML document that basically follows this format:

<span id="mensagem" class="topo">Classes e comandos</span>

The problem is that the attributes inside the span tag vary in number and position.

The goal is to get the text "Classes e comandos".

To do that, when the search finds the sequence "mensagem", it should look for the next ">" character and, once found, take the run of characters that follows, up to but not including the next "<".

Like this:

           (found)--------------v(found)
<span id="mensagem" class="topo">Classes e comandos</span>
                                 ||||||||||||||||||x(reaches this one and stops)
                                   (takes these)

I just need to express this as a regular expression. I am using Notepad++; does anyone know how to write a regular expression for this?

Which programming language are you using?

– NoobSaibot

2017/08/23 at 01:46
I am using Notepad++; more specifically, Find with the "Regular expression" option selected.

– PTC man

2017/08/23 at 02:00
Try the following: <span[^>]+>(.*?)<\/span>

– NoobSaibot

2017/08/23 at 02:07
See working here

– NoobSaibot

2017/08/23 at 02:09
That is very close, Wéllington. On regex101 it separates the text into Group 1, but in Notepad++ it...

– PTC man

2017/08/23 at 02:14
Recommended reading: https://stackoverflow.com/a/1732454/4438007

– Jefferson Quesado

2017/08/24 at 14:01

2 answers

by Paz • **3,062** points · Answer 1 · 2017-08-23T23:49:15+00:00

Answer
As the user Wellington mentioned, follow these steps:

Go to Search -> Replace.
Set the "Find what" field to: (<.*?(?=mensagem).*?>)(.*?)(<.*?>)|(.*)
Set the "Replace with" field to: \2 or $2.
Set the search mode to: Regular expression.
Click the button: Replace All.

This replaces the whole text with only the content of the tags that contain the keyword mensagem.
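For readers who want to try the same replacement outside Notepad++, here is a minimal sketch in Python. It assumes Python's re module behaves like the Notepad++ engine for this particular pattern (in both, "." does not match line breaks by default), and the name keep_mensagem_text is made up for illustration.

```python
import re

# Same pattern as in the steps above; group 2 holds the text between the tags.
PATTERN = re.compile(r'(<.*?(?=mensagem).*?>)(.*?)(<.*?>)|(.*)')

def keep_mensagem_text(text: str) -> str:
    """Replace every match with group 2.

    When the fallback branch (group 4) matched, group 2 did not take part
    in the match and expands to an empty string (Python 3.5 or newer).
    """
    return PATTERN.sub(r'\2', text)

sample = '<p>intro</p>\n<span id="mensagem" class="topo">Classes e comandos</span>'
result = keep_mensagem_text(sample)
print(repr(result))
```

In this sketch the first line is emptied but its line break stays, so the result starts with an empty line followed by the captured text.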

You can test this regex here.

If this did not solve your problem, comment here with what you expected and what went wrong, and I will try to help. I hope this helped :D

Explanation of the regex
This regex has 4 capture groups. I will explain what each one does so that it is easier to understand.

(<.*?(?=mensagem).*?>)

Group 1 captures the whole opening tag, provided the word mensagem appears somewhere before the ">" character. For that I used a positive lookahead: the pattern between (?= and ) must match at the current position for the match to continue, but it consumes no characters.
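To see that the lookahead only checks and does not consume, here is a small Python sketch. The pattern is split into two extra groups purely for illustration; that split is not part of the regex in this answer.

```python
import re

tag = '<span id="mensagem" class="topo">'

# Group 1 is everything before the lookahead, group 2 everything after it.
m = re.match(r'(<.*?)(?=mensagem)(.*?>)', tag)
before, after = m.group(1), m.group(2)
print(before)  # text matched before the lookahead position
print(after)   # starts with "mensagem", because the lookahead consumed nothing

# A tag without the keyword never satisfies the lookahead, so there is no match.
no_keyword = re.match(r'<.*?(?=mensagem).*?>', '<span class="topo">')
```

Because the lookahead matches zero characters, the word mensagem is still available to the part of the pattern that follows it.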

(.*?)

Group 2 is only tried if group 1 captures something, since it belongs to the same alternative and does not come after the OR operator. It captures any characters except line breaks, as few as possible, and stops as soon as the next part of the expression can match.
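The difference between the lazy .*? and a greedy .* is easiest to see on a line with two tags. This is a minimal Python sketch with a made-up input line and simplified patterns, not the full regex from this answer.

```python
import re

line = '<span>A</span><span>B</span>'

# Lazy: stops at the first "<" that lets the rest of the pattern match.
lazy = re.search(r'>(.*?)<', line).group(1)

# Greedy: runs to the end of the line and backs up to the last "<".
greedy = re.search(r'>(.*)<', line).group(1)

print(lazy)
print(greedy)
```

With the lazy form only the text of the first span is captured; the greedy form swallows the closing tag and the whole second span opening as well.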

(<.*?>)

Group 3 captures the tag that follows group 2. Its "<" character also acts as the delimiter that makes group 2 stop capturing.

|(.*)

Group 4 comes after the OR operator (|), which means that if the regex cannot match with the previous alternative, it tries this one. It simply uses "." to match any character other than a line break (\n), so everything that does not match your search is deleted when each match is replaced with the content of group 2.
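Which branch of the alternation matched can be checked by looking at groups 2 and 4. The sketch below uses Python's re module as a stand-in for Notepad++, and the helper name groups_2_and_4 is hypothetical.

```python
import re

PATTERN = re.compile(r'(<.*?(?=mensagem).*?>)(.*?)(<.*?>)|(.*)')

def groups_2_and_4(line: str):
    """Return (group 2, group 4) for a match anchored at the start of the line."""
    m = PATTERN.match(line)
    return m.group(2), m.group(4)

hit = groups_2_and_4('<span id="mensagem">Texto</span>')
miss = groups_2_and_4('<div>')
print(hit)
print(miss)
```

A group that did not take part in the match is None in Python. That is why replacing with group 2 leaves nothing behind for lines matched by the fallback branch.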

by NoobSaibot • **9,554** points · Answer 2 · 2017-08-23T02:54:42+00:00

Follow these steps:

Go to Search -> Replace.
Set the "Find what" field to: <span[^>]+>(.*?)</span>
Set the "Replace with" field to: \1 or $1.
Set the search mode to: Regular expression.
Click the button: Replace All.

Keep in mind that it leaves only the captured text, for example:

<div>
   <span id="mensagem" class="topo">Texto 01</span>
   <span id="mensagem" class="topo">Texto 02</span>
   <span id="mensagem" class="topo">Texto 03</span>
   <span id="mensagem" class="topo">Texto 04</span>
   <span id="mensagem" class="topo">Texto 05</span>
</div>

Only this remains:

Texto 01
Texto 02
Texto 03
Texto 04
Texto 05
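The same pattern can be tried in Python as a rough sketch, assuming the character class is [^>] as in the comment under the question. Here re.findall is used instead of a replacement, which is a swapped-in technique and not what the Notepad++ steps do.

```python
import re

html = """<div>
   <span id="mensagem" class="topo">Texto 01</span>
   <span id="mensagem" class="topo">Texto 02</span>
   <span id="mensagem" class="topo">Texto 03</span>
   <span id="mensagem" class="topo">Texto 04</span>
   <span id="mensagem" class="topo">Texto 05</span>
</div>"""

# [^>]+ skips over the attributes up to the closing ">" of the opening tag;
# the single capture group is the text between the tags.
texts = re.findall(r'<span[^>]+>(.*?)</span>', html)
print("\n".join(texts))
```

With a single capture group, findall returns just the captured texts. A plain substitution with \1 would only rewrite the matched spans, so text outside them, such as the div lines and the indentation, would be left in place.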