You said you’re using Google Sheets, but you didn’t say if all the text is in the same cell. In the solution below, I am assuming that each row is in a different cell (this makes the solution simpler, including).
An alternative is to use this regex:
^\d{3}\.\d{3}\.\d{3}\s{3}(?:.*/ )?(.+)$
The markers ^
and $
are, respectively, the beginning and end of the string. So I guarantee that I will only treat the strings that follow rigidly this format.
Then we have \d{3}
, meaning "3 digits" (the shortcut \d
corresponds to a digit from 0 to 9, and quantifier {3}
means "3 occurrences"), followed by \.
(the dot character). Since the dot by default has special meaning in regex (corresponds to any character - except line breaks), he needs to be escaped with \
to be interpreted as a common character.
Then we have 3 more digits, another dot and 3 more digits, followed by \s{3}
(3 spaces).
After that we have (?:.*/ )?
. The parentheses with (?:
form a catch group (and later we will understand why to use it). The whole group is (?:.*/ )
, that is .*
(zero or more characters), followed by bar, followed by space (note that there is a space before the )
). And the ?
soon after makes this whole stretch optional.
That is, I may have several characters followed by bar + space, or not. This serves both for the first case (which has no bar), as for the others. And I also take advantage of the fact of the quantifier *
was greedy and try to grab as many characters as possible. This means that the expression ends up picking up the last bar that is followed by space (which is exactly what we want).
Next we have (.+)
, that is .+
(one or more characters), within parentheses, which forms a catch group.
The full expression I used on Google Sheets is:
=REGEXREPLACE(A1, "^\d{3}\.\d{3}\.\d{3}\s{3}(?:.*/ )?(.+)$", "$1")
The function REGEXREPLACE
takes the value of a cell (in the example above it is "A1"), applies the regex and replaces it. In the third parameter, which indicates the substitution to be made, I used the special variable $1
, which corresponds to the first catch group (in this case, the (.+)
).
That’s why I used a no-catch group in the previous section, because then regex does not create a random group (since this is not used in the substitution).
The result is:
String original | Após REGEXREPLACE
----------------------------------------------------------------------|-------------------------
030.000.208 OUTRA COISA | OUTRA COISA
001.005.001 ALGUMA / COISA AQUI COM - TAMBÉM / GERAL.ASSINADA-COISA | GERAL.ASSINADA-COISA
040.001.218 OUTRA COISA / AQUI OUTRA/COISA GERAL | AQUI OUTRA/COISA GERAL
000.000.003 ALGUMA / COISA AQUI COM - TAMBÉM | COISA AQUI COM - TAMBÉM
You can see the spreadsheet here, and the regex in operation here.
One detail is that the \s
corresponds to several characters, such as space, TAB and line breaks (the exact list may vary according to the language/tool - in the case of Google Docs, he uses the engine RE2, whose syntax can be consulted here). If you want regex to consider only spaces, you can switch to:
=REGEXREPLACE(A1, "^\d{3}\.\d{3}\.\d{3} {3}(?:.*/ )?(.+)$", "$1")
tem um espaço aqui ^
Notice that now instead of \s
, there is a blank before the {3}
.
Well, your first line does not have the character / therefore does not meet the regular expression, and I did not understand why in your third line you expect "OTHER/GENERAL THING" and not "HERE OTHER/GENERAL THING".
– anonimo
I corrected the expected result. was typo error.
– Ludo Man