Get all the content I don’t want via Regex


Viewed 319 times

-3

Folks, I need to use a regex to pull out of a string everything that is not the CNPJ.

Ex:

Line1 -> 123 - EMPRESA CICLANO101 30.589.587/0001-87

Line2 -> 4567 - FULANO LTDA28.819.917/0001-31

Line3 -> 90 - ComPANHIA DEDE 77.282.198/0001-78

The CNPJ is always at the end of the string, and cases like the examples above do occur (in Line2, for instance, the CNPJ is glued to the preceding text with no space). I’m using Pentaho's "Replace in String" step.

  • The regex for the CNPJ would be: \d{2}\.\d{3}\.\d{3}\/0001-\d{2}

  • 1

    @Sam I think it would be better to use \d{4} instead of "0001". Granted, the vast majority of CNPJs have 0001, but that’s not always the case. That number identifies the branch, so a company with several branches could have 0002, 0003, etc. I have seen up to 0010, for example (the tenth branch of the company).

  • Luiz, a regex only checks that the right kinds of characters (digits, dots, hyphen, etc.) appear in the right positions and quantities. I would also validate the check digits outside the regex, just to make sure the CNPJ is valid (this is useful for catching typos, for example).

  • 1

    @hkotsubo has a point.
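
To illustrate the check-digit validation suggested in the comments, here is a minimal Python sketch of the usual modulo-11 calculation for CNPJs. The function names are hypothetical, and it assumes the format has already been checked by the regex, so it simply discards every non-digit character before doing the arithmetic.

```python
import re

# Weights for the first check digit; the second uses the same list with a leading 6.
_WEIGHTS_1 = [5, 4, 3, 2, 9, 8, 7, 6, 5, 4, 3, 2]
_WEIGHTS_2 = [6] + _WEIGHTS_1


def _check_digit(digits, weights):
    """Modulo-11 check digit: 0 when the remainder is below 2, else 11 - remainder."""
    remainder = sum(d * w for d, w in zip(digits, weights)) % 11
    return 0 if remainder < 2 else 11 - remainder


def is_valid_cnpj(text):
    """Return True if `text` holds 14 digits whose last two are correct check digits."""
    digits = [int(c) for c in re.sub(r"\D", "", text)]
    # Reject wrong lengths and sequences of one repeated digit (e.g. all zeros),
    # which satisfy the arithmetic but are not real CNPJs.
    if len(digits) != 14 or len(set(digits)) == 1:
        return False
    first = _check_digit(digits[:12], _WEIGHTS_1)
    second = _check_digit(digits[:12] + [first], _WEIGHTS_2)
    return digits[12:] == [first, second]
```

For example, `is_valid_cnpj("30.589.587/0001-87")` (the CNPJ from Line1) passes the check, while changing the last digit to 8 makes it fail.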

2 answers

1


Since you are using Pentaho and the CNPJ is always at the end of the string, you do not need a complicated regex that may stop working because of a single wrong character.

Use a Formula step with the following code:

RIGHT(TRIM([field]);18)

It is always good to apply TRIM() to strings: if there is stray whitespace after the CNPJ, it is removed before the last 18 characters are taken.

Pentaho has steps for almost everything; resort to Regex, JavaScript, and Java only after exhausting the native PDI steps.
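
For readers outside Pentaho, the same fixed-width idea can be sketched in Python. This is only a rough equivalent of the formula above, not Pentaho code: `split_fixed_width` and `CNPJ_LENGTH` are hypothetical names, and it assumes every line ends with a fully formatted, 18-character CNPJ. Since the question asks for the content that is not the CNPJ, the sketch returns both parts.

```python
CNPJ_LENGTH = 18  # "NN.NNN.NNN/NNNN-NN" is 18 characters long


def split_fixed_width(line):
    """Split a line into (other content, CNPJ) by cutting off the last 18 characters."""
    trimmed = line.strip()  # drop stray whitespace, like TRIM() in the formula
    other = trimmed[:-CNPJ_LENGTH].rstrip()  # everything before the CNPJ
    cnpj = trimmed[-CNPJ_LENGTH:]  # same slice as RIGHT(...; 18)
    return other, cnpj
```

Because the cut is purely positional, it also handles Line2, where the CNPJ is glued to the company name.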

0

You can use the following regex; I tested it here and it worked with your sample data:

.*(([A-Z])\w+)

+: one or more repetitions;

[A-Z]: uppercase characters from A to Z;

.: any character, which here also covers the spaces;

*: zero or more repetitions;

\w: any alphanumeric character (or underscore).

I ran the test on this link: https://regexr.com/45kp3

  • Cool, but it also matches the 28 in Line2 and ends up replacing that too
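
One way around the problem raised in this comment is to match the CNPJ itself, anchored at the end of the line, and replace it with nothing. The Python sketch below is an illustration under that assumption: `strip_cnpj` and `CNPJ_AT_END` are hypothetical names, and the pattern uses \d{4} for the branch number as suggested in the comments on the question.

```python
import re

# Optional whitespace, then a CNPJ with any 4-digit branch number, at the end of the line.
CNPJ_AT_END = re.compile(r"\s*\d{2}\.\d{3}\.\d{3}/\d{4}-\d{2}\s*$")


def strip_cnpj(line):
    """Return the line without its trailing CNPJ (unchanged if none is found)."""
    return CNPJ_AT_END.sub("", line)
```

The pattern sticks to basic regex syntax, so it should presumably also work in Pentaho's "Replace in String" step with regex enabled and an empty replacement, though that has not been tested here.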
