Pick up only the final stretch after the regular expression bar

Asked

Viewed 933 times

2

I did a general search here and on Google and I came to find something that worked partially.

I have the following text pattern:

030.000.208   OUTRA COISA
001.005.001   ALGUMA / COISA AQUI COM - TAMBÉM / GERAL.ASSINADA-COISA
040.001.218   OUTRA COISA / AQUI OUTRA/COISA GERAL
000.000.003   ALGUMA / COISA AQUI COM - TAMBÉM

In summary, the text has a pattern where it starts with a sequence of numbers, three whitespaces, a text (which may have scores), a blank, a bar and a blank, another text (which may have scores), etc.

What I need is for the text that comes at the end to be selected ONLY, after the last occurrence of space-bar-space.

Following the example above, the return for each line would be:

OUTRA COISA
GERAL.ASSINADA-COISA
AQUI OUTRA/COISA GERAL
COISA AQUI COM - TAMBÉM

I am using the following regular expression: \/ ([^\/]*)$

She’s returning this:

GERAL.ASSINADA-COISA
COISA AQUI COM - TAMBÉM

I’m getting a lot and I can’t find a solution, someone can give a light?

I will use regular expression in Google Sheets.

  • 1

    Well, your first line does not have the character / therefore does not meet the regular expression, and I did not understand why in your third line you expect "OTHER/GENERAL THING" and not "HERE OTHER/GENERAL THING".

  • I corrected the expected result. was typo error.

2 answers

1


You said you’re using Google Sheets, but you didn’t say if all the text is in the same cell. In the solution below, I am assuming that each row is in a different cell (this makes the solution simpler, including).


An alternative is to use this regex:

 ^\d{3}\.\d{3}\.\d{3}\s{3}(?:.*/ )?(.+)$

The markers ^ and $ are, respectively, the beginning and end of the string. So I guarantee that I will only treat the strings that follow rigidly this format.

Then we have \d{3}, meaning "3 digits" (the shortcut \d corresponds to a digit from 0 to 9, and quantifier {3} means "3 occurrences"), followed by \. (the dot character). Since the dot by default has special meaning in regex (corresponds to any character - except line breaks), he needs to be escaped with \ to be interpreted as a common character.

Then we have 3 more digits, another dot and 3 more digits, followed by \s{3} (3 spaces).

After that we have (?:.*/ )?. The parentheses with (?: form a catch group (and later we will understand why to use it). The whole group is (?:.*/ ), that is .* (zero or more characters), followed by bar, followed by space (note that there is a space before the )). And the ? soon after makes this whole stretch optional.

That is, I may have several characters followed by bar + space, or not. This serves both for the first case (which has no bar), as for the others. And I also take advantage of the fact of the quantifier * was greedy and try to grab as many characters as possible. This means that the expression ends up picking up the last bar that is followed by space (which is exactly what we want).

Next we have (.+), that is .+ (one or more characters), within parentheses, which forms a catch group.

The full expression I used on Google Sheets is:

=REGEXREPLACE(A1, "^\d{3}\.\d{3}\.\d{3}\s{3}(?:.*/ )?(.+)$", "$1")

The function REGEXREPLACE takes the value of a cell (in the example above it is "A1"), applies the regex and replaces it. In the third parameter, which indicates the substitution to be made, I used the special variable $1, which corresponds to the first catch group (in this case, the (.+)).

That’s why I used a no-catch group in the previous section, because then regex does not create a random group (since this is not used in the substitution).

The result is:

String original                                                       | Após REGEXREPLACE
----------------------------------------------------------------------|-------------------------
030.000.208   OUTRA COISA                                             | OUTRA COISA
001.005.001   ALGUMA / COISA AQUI COM - TAMBÉM / GERAL.ASSINADA-COISA | GERAL.ASSINADA-COISA
040.001.218   OUTRA COISA / AQUI OUTRA/COISA GERAL                    | AQUI OUTRA/COISA GERAL
000.000.003   ALGUMA / COISA AQUI COM - TAMBÉM                        | COISA AQUI COM - TAMBÉM

You can see the spreadsheet here, and the regex in operation here.


One detail is that the \s corresponds to several characters, such as space, TAB and line breaks (the exact list may vary according to the language/tool - in the case of Google Docs, he uses the engine RE2, whose syntax can be consulted here). If you want regex to consider only spaces, you can switch to:

=REGEXREPLACE(A1, "^\d{3}\.\d{3}\.\d{3} {3}(?:.*/ )?(.+)$", "$1")
                    tem um espaço aqui ^

Notice that now instead of \s, there is a blank before the {3}.

  • 1

    Perfect... worked exactly as needed... and the explanations of how the expression is being treated became very clear. It helped me a lot and I learned more about this world of regular expressions that I get a lot to understand... Valeu!!!!

0

Good afternoon @Udo Man, do you need to follow this rule exactly? "What I need is for the text that comes at the end to be selected ONLY, after the last occurrence of space-bar-space" if the answer is yes this regexp What Voce showed works " / ([^/]*)$ "

Your expected example is so:

030.000.208 SOMETHING ELSE

001.005.001 SOMETHING / SOMETHING HERE WITH - ALSO / GENERAL.SIGNED-THING

040.001.218 SOMETHING ELSE / HERE SOMETHING ELSE/GENERAL

000,000,003 SOMETHING / SOMETHING HERE WITH - ALSO

But for the expression to function as it wants the example should look like this:

030.000.208 / SOMETHING ELSE

001.005.001 SOMETHING / SOMETHING HERE WITH - ALSO / GENERAL.SIGNED-THING

040.001.218 SOMETHING ELSE / HERE SOMETHING ELSE / GENERAL

000,000,003 SOMETHING / SOMETHING HERE WITH - ALSO

You can check on this link : https://regex101.com/r/7dtEZK/1

  • 1

    Thank you for the answer.... but this is exactly the problem. I can’t modify the font, but I need the output displayed to be what I’ve exemplified.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.