Regex to get the line above a given line

I have the OCR output below and would like to extract a value from it with a regular expression.

The value is: 1505 - ADMINISTRACAO DOS PENSIONISTAS DO IPREV, the line just above the word CPF.

I thought of using CPF as the anchor, because this value always appears directly above the CPF line.

ESTADO DE SANTA CATARINA
Contra-cheque individual
Administracao dos Pensionistas do IPREV
1505 - ADMINISTRACAO DOS PENSIONISTAS DO IPREV
CPF: 000.000.000-00
Matrícula: 0000000000
Inscrição: 00000000

I wrote this regex:

\r*?([A-Za-z0-9\s]{1,}\s)CPF

But it only captures the name; it does not capture the "1505 -" part.

1 answer

The problem with your regular expression is that the character class [A-Za-z0-9\s] does not contain the - character. The match can therefore only start after the hyphen that follows 1505.

However, if you add the hyphen to that class, the regular expression matches all the way back to the beginning of the string.
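Both behaviours can be reproduced with a short script. This is only a sketch, assuming Python's `re` module and "\n" line endings in the OCR text; the question does not say which regex engine is used, and the variable names are made up for illustration.

```python
import re

# Sample OCR text from the question, assuming "\n" line endings.
ocr_text = (
    "ESTADO DE SANTA CATARINA\n"
    "Contra-cheque individual\n"
    "Administracao dos Pensionistas do IPREV\n"
    "1505 - ADMINISTRACAO DOS PENSIONISTAS DO IPREV\n"
    "CPF: 000.000.000-00\n"
    "Matrícula: 0000000000\n"
    "Inscrição: 00000000\n"
)

# Original pattern: the class has no "-", so the match can only
# begin after the hyphen that follows 1505.
original = re.search(r"\r*?([A-Za-z0-9\s]{1,}\s)CPF", ocr_text)

# Same pattern with "-" added to the class: now every character
# before "CPF" is allowed, so the match begins at the start of the text.
with_hyphen = re.search(r"\r*?([A-Za-z0-9\s-]{1,}\s)CPF", ocr_text)

print(repr(original.group(1)))
print(with_hyphen.start())
```

With this sample, the first group contains only the name (with surrounding whitespace), and the second match starts at offset 0, so it swallows the three header lines as well.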

A slightly simpler solution is to match everything that is not a line break character, like this:

([^\n]+)\nCPF

This matches the entire line above the line that starts with CPF. For example, in the text below it captures the line 1505 - ADMINISTRACAO DOS PENSIONISTAS DO IPREV:

ESTADO DE SANTA CATARINA
Contra-cheque individual
Administracao dos Pensionistas do IPREV
1505 - ADMINISTRACAO DOS PENSIONISTAS DO IPREV
CPF: 000.000.000-00
Matrícula: 0000000000
Inscrição: 00000000
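As a rough sketch of how this pattern might be used, assuming Python's `re` module: the helper name `line_above_cpf` is hypothetical, and the `rstrip("\r")` call is an extra precaution for text with Windows line endings, not part of the pattern itself.

```python
import re

def line_above_cpf(text):
    """Return the line directly above the first line starting with CPF, or None."""
    match = re.search(r"([^\n]+)\nCPF", text)
    if match is None:
        return None
    # Drop a trailing "\r" in case the text uses "\r\n" line endings.
    return match.group(1).rstrip("\r")

ocr_text = (
    "ESTADO DE SANTA CATARINA\n"
    "Contra-cheque individual\n"
    "Administracao dos Pensionistas do IPREV\n"
    "1505 - ADMINISTRACAO DOS PENSIONISTAS DO IPREV\n"
    "CPF: 000.000.000-00\n"
    "Matrícula: 0000000000\n"
    "Inscrição: 00000000\n"
)

result = line_above_cpf(ocr_text)
print(result)
```

The capture group holds the line itself, so the value is read from `group(1)` rather than from the whole match, which also includes the line break and the word CPF.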
