I'm trying to write a regular expression that removes everything except the company name (razão social) from a string, but I'm having a hard time keeping the symbols that appear in the middle of the name.
Input:
201700000000111 01/02/2017 11.111.111/0001-74 ADAMA BRASIL S/A ATIVA 0,00 160,00 160,00 0,00 0,00 0,00 0,00 0,00
201700000000122 01/02/2017 22.222.222/0002-75 AGRITEX COMERCIAL AGRÍCOLA LTDA (QUERÊNCIA) ATIVA 2,79 170,00 170,00 0,00 0,00 0,00 4,74 0,00
201700000000133 07/02/2017 33.333.333/0001-76 CREMONESE WANDSCHEER & CIA LTDA - ME ATIVA 0,00 50,00 50,00 0,00 0,00 0,00 0,00 0,00
201700000000144 23/02/2017 44.444.444/0001-77 G3 SEMENTES LTDA ATIVA 0,00 230,00 230,00 0,00 0,00 0,00 0,00 0,00
Expected output:
ADAMA BRASIL S/A ATIVA
AGRITEX COMERCIAL AGRÍCOLA LTDA (QUERÊNCIA) ATIVA
CREMONESE WANDSCHEER & CIA LTDA - ME ATIVA
So far I have the expression below, but it does not produce the result I need. I'm using it in Java, but answers in other languages are welcome.
s.replaceAll("[^A-zÀ-ú\\s]", "").trim();
Does the text always start at the same fixed position? Or at the 4th token? That would make the job much easier.
– Murillo Goulart
You can invert the logic of your regular expression: instead of removing what you don't want, match only what you do want, something like:
\b[A-zÀ-ú\s\\\/&\-\(|)]{2,}\b
See this example: http://rubular.com/r/4LdX3PR6s1
– brow-joe
I edited the answer, thus arriving at this expression:
\b(\d{2}\.\d{3}\.\d{3}\/\d{4}\-\d{2})\b([A-zÀ-ú-1-9\s\\\/&\-\(|)]{5,}.*[a-zA-Z])\b
– brow-joe
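As a rough illustration of the "match what you want" idea in Java, here is a minimal sketch that anchors on the CNPJ and on the trailing numeric columns rather than listing the characters allowed in the name. It is not the expression from the comment above. The class and method names (`CnpjLineParser`, `extractName`) are hypothetical, and the sketch assumes every line ends with exactly eight amounts written with a decimal comma, as in the sample input.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original thread.
class CnpjLineParser {
    // CNPJ, then the name (captured lazily), then exactly 8 amounts such as "160,00".
    // Assumption: amounts use a decimal comma and may contain "." as a thousands separator.
    private static final Pattern LINE = Pattern.compile(
        "\\d{2}\\.\\d{3}\\.\\d{3}/\\d{4}-\\d{2}\\s+(.+?)(?:\\s+\\d[\\d.]*,\\d{2}){8}\\s*$");

    static Optional<String> extractName(String line) {
        Matcher m = LINE.matcher(line);
        // group(1) is everything between the CNPJ and the eight trailing amounts
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String line = "201700000000111 01/02/2017 11.111.111/0001-74 ADAMA BRASIL S/A ATIVA "
            + "0,00 160,00 160,00 0,00 0,00 0,00 0,00 0,00";
        System.out.println(extractName(line).orElse("(no match)"));
        // prints: ADAMA BRASIL S/A ATIVA
    }
}
```

Because the name is captured with `.+?` instead of a character class, symbols such as `/`, `&`, `-` and parentheses inside the name survive untouched. The status word (`ATIVA`) is kept as well, which matches the expected output in the question.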
Regex seems entirely unnecessary here. Apparently it would be enough to split on spaces and discard the first 3 items on the left and the last 8 on the right.
– Bacco
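The split-based suggestion could look roughly like the sketch below. The class and method names (`CompanyNameExtractor`, `extract`) are hypothetical, and the sketch assumes the layout of the sample input: three leading fields (id, date, CNPJ) and eight trailing numeric fields on every line.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the original thread.
class CompanyNameExtractor {
    private static final int LEADING_FIELDS = 3;  // id, date, CNPJ (assumed fixed)
    private static final int TRAILING_FIELDS = 8; // numeric columns (assumed fixed)

    static String extract(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length <= LEADING_FIELDS + TRAILING_FIELDS) {
            throw new IllegalArgumentException("Unexpected line layout: " + line);
        }
        // Keep only the tokens between the leading and trailing fields.
        String[] name = Arrays.copyOfRange(tokens, LEADING_FIELDS, tokens.length - TRAILING_FIELDS);
        return String.join(" ", name);
    }

    public static void main(String[] args) {
        String line = "201700000000133 07/02/2017 33.333.333/0001-76 "
            + "CREMONESE WANDSCHEER & CIA LTDA - ME ATIVA 0,00 50,00 50,00 0,00 0,00 0,00 0,00 0,00";
        System.out.println(extract(line));
        // prints: CREMONESE WANDSCHEER & CIA LTDA - ME ATIVA
    }
}
```

Since nothing inside the name is inspected, symbols and accented letters pass through unchanged. The trade-off is that the approach breaks if a line ever has a different number of leading or trailing columns.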