Regex: capture a value between two words

1

I need to extract the amount 62.715,90 that appears between the words TOTAL and DEDUÇÕES. In the OCR output the value is separated from those words by line breaks, as in the text below.

I’m using the Regular Expression Designer software.

NOME: TESTE DE SILVA SAURO  
CPF: 785.981.970-84  
IMPOSTO SOBRE A RENDA - PESSOA FÍSICA  
DECLARAÇÃO DE AJUSTE ANUAL  
EXERCICIO 2018 ANO-CALENDÁRIO 2017  
TOTAL  
>62.715,90  

DEDUÇÕES

My expression: TOTAL\n\d{1,3}(?:\.\d{3})*,\d{2}\nDEDU.*?ES

2 answers

1


I don’t know if it was a copy-and-paste problem, but in the question there are some spaces after "TOTAL" and a > before the number. If the real text looks like that, \n alone will not match. An alternative is:

TOTAL\s+>\d{1,3}(?:\.\d{3})*,\d{2}\s+DEDU.*?ES

Instead of \n, I use the shortcut \s, which matches spaces and line breaks (among other whitespace characters). I also use the quantifier +, meaning "one or more occurrences", so there can be several line breaks and spaces after "TOTAL".

Next comes the literal > character, followed by the part that matches the numeric value.
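The difference between \n and \s+ can be checked with a minimal JavaScript sketch. It assumes the OCR text has trailing spaces after TOTAL and a > before the value, as shown in the question; the names ocrText, withNewline and withWhitespace are illustrative only, and behavior in Regular Expression Designer may differ from a JavaScript engine.

```javascript
// Sample text assumed to mirror the question: two trailing spaces after
// TOTAL, a ">" before the value, and a blank line before DEDUÇÕES.
const ocrText = "TOTAL  \n>62.715,90  \n\nDEDUÇÕES";

// Original attempt: exactly one line break, no spaces, no ">".
const withNewline = /TOTAL\n\d{1,3}(?:\.\d{3})*,\d{2}\nDEDU.*?ES/;

// \s+ accepts any run of spaces and line breaks.
const withWhitespace = /TOTAL\s+>\d{1,3}(?:\.\d{3})*,\d{2}\s+DEDU.*?ES/;

console.log(withNewline.test(ocrText));    // false
console.log(withWhitespace.test(ocrText)); // true
```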

I don’t know if the software you’re using supports capture groups. If it does, just put the part of the expression that matches the number in parentheses:

TOTAL\s+>(\d{1,3}(?:\.\d{3})*,\d{2})\s+DEDU.*?ES

That way the value is available in the first capture group (on regex101.com, for example, "Group 1" on the right side shows only the value you need).
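As a rough illustration of reading the first capture group in code, here is a small JavaScript sketch. The helper name extractTotal is hypothetical, and the sample input assumes the > is present as in the question.

```javascript
// Hypothetical helper: returns the amount between TOTAL and DEDUÇÕES,
// or null when the pattern does not match.
function extractTotal(text) {
  const pattern = /TOTAL\s+>(\d{1,3}(?:\.\d{3})*,\d{2})\s+DEDU.*?ES/;
  const match = pattern.exec(text);
  // match[0] is the whole match; match[1] is the first capture group.
  return match ? match[1] : null;
}

console.log(extractTotal("TOTAL  \n>62.715,90  \n\nDEDUÇÕES")); // 62.715,90
console.log(extractTotal("TOTAL 62,9 DEDUÇÕES"));                // null
```

Checking for null before indexing avoids an exception when the text does not contain the expected layout.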


If there is no > before the value, just remove it from the regex:

TOTAL\s+(\d{1,3}(?:\.\d{3})*,\d{2})\s+DEDU.*?ES
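A possible variation, which is not part of the answer itself: when it is unclear whether the OCR output contains the >, it can be made optional with >?. The sketch below is JavaScript and has not been checked in Regular Expression Designer; the names flexible, withMarker and withoutMarker are illustrative.

```javascript
// ">?" makes the ">" optional, so one pattern covers both inputs.
const flexible = /TOTAL\s+>?(\d{1,3}(?:\.\d{3})*,\d{2})\s+DEDU.*?ES/;

const withMarker = "TOTAL  \n>62.715,90  \n\nDEDUÇÕES";
const withoutMarker = "TOTAL\n62.715,90\nDEDUÇÕES";

console.log(flexible.exec(withMarker)[1]);    // 62.715,90
console.log(flexible.exec(withoutMarker)[1]); // 62.715,90
```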
  • I tried it in the regex software and it didn’t work :( ... Software: Regular Expression Designer.

  • @user2254936 Is the text exactly as shown in the question (with spaces after "TOTAL" and a > before the value)? If there is no >, simply remove it from the regex, for example. I tested it here with the free version of Regular Expression Designer and it worked.

  • There is no space after TOTAL and no > before the value.

  • @user2254936 Then just remove the > from the regex: TOTAL\s+(\d{1,3}(?:\.\d{3})*,\d{2})\s+DEDU.*?ES (I updated the answer with this option)

  • Indeed, removing the > worked. Thanks a lot!

0

There may be another way, but I came up with the expression TOTAL[\n]+([0-9.,]+)[\n]+DEDUÇÕES:

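// [\n]+ allows one or more line breaks around the value;
// the m flag lets ^ and $ match at the start and end of each line.
const reg = /^TOTAL[\n]+([0-9.,]+)[\n]+DEDUÇÕES$/m;
const input = `TOTAL

62.715,90

DEDUÇÕES`;

const match = reg.exec(input);
// Group 1 holds only the amount; exec returns null when nothing matches.
console.log(match ? match[1] : null);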

  • The problem with [0-9.,]+ is that it also accepts strings such as ... and ,,,, as valid.

  • @hkotsubo is right, I had not thought about that. Thank you very much for the warning!
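The point raised in the comments can be seen by testing a few strings against both patterns. This is only a sketch; the names looseAmount and strictAmount and the sample strings are illustrative.

```javascript
// Loose character class from this answer vs. the stricter number pattern
// from the other answer, both anchored to the whole string.
const looseAmount = /^[0-9.,]+$/;
const strictAmount = /^\d{1,3}(?:\.\d{3})*,\d{2}$/;

for (const candidate of ["62.715,90", "...", ",,,,", "1.2.3"]) {
  console.log(candidate, looseAmount.test(candidate), strictAmount.test(candidate));
}
// 62.715,90 true true
// ... true false
// ,,,, true false
// 1.2.3 true false
```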
