Regular expression to find numbers in between words

Viewed 2,215 times

2

I am currently developing a project in which I use regular expressions to find certain patterns. However, there is one specific String from which I need to extract two numbers. It looks like this:

Agência: 0000 Conta: 00000-0

I need to extract the numbers that sit between the words. Can someone help me?

  • You can replace everything that is not a digit with nothing: .replaceAll("[^0-9]", "");

  • @arllondias That would merge the agency number and the account number into one. Ideally, the two numbers should be captured as groups and the rest ignored.

  • The only problem is that the rest of the document is stored in a list of Strings, one String per line, and those lines contain other values that are not digits either, so replacing everything with nothing won't work.

  • Is the order always agency then account, or can it vary?

  • It varies, my friend. I just want to extract the numbers, regardless of the order in which I will store them.

  • @Matheusgrossi When you say it varies, how exactly does it vary? Can the account come before the agency? Can only one of them appear? Can more than one account and one agency appear? Can they be separated into distant parts of the document?

  • Only the order of the data: one can come before the other and vice versa.

  • Why not just use (\d+) and then match on the length of each result to know which is which? Like here

2 answers

10


The regular expression is:

(?:Agência: [0-9]{4} Conta: [0-9]{5}-[0-9X])|(?:Conta: [0-9]{5}-[0-9X] Agência: [0-9]{4})

Building on that other answer of mine:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TesteRegex {

    private static final Pattern AGENCIA_CONTA = Pattern.compile(
            "(?:Agência: [0-9]{4} Conta: [0-9]{5}-[0-9X])|" +
            "(?:Conta: [0-9]{5}-[0-9X] Agência: [0-9]{4})");

    public static void main(String[] args) {
        String texto = ""
                + "Banana abacaxi pêra Agência: 5720 Conta: 43821-X abacate "
                + "melancia Agência: 3481 Conta: 53895-0. verde azul "
                + "amarelo Agência: 6666 Conta: 66667-NÃO É ESSA "
                + "Agência: 9123 Conta: 44578-2 "
                + "laranja Conta: 43210-7 Agência: 6589 verde "
                + "rosa lilás Conta: 77777-7 Não vai dar Agência: 4444";

        Matcher m = AGENCIA_CONTA.matcher(texto);
        while (m.find()) {
            String achou = texto.substring(m.start(), m.end());
            System.out.println("Achou nas posições " + m.start() + "-" + m.end() + ": "
                    + achou);
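            String agencia, conta;
            // The match has a fixed layout, so the numbers sit at fixed offsets.
            if (achou.startsWith("Agência:")) {
                // Layout: "Agência: NNNN Conta: NNNNN-D"
                agencia = achou.substring(9, 13);
                conta = achou.substring(21, 28);
            } else {
                // Layout: "Conta: NNNNN-D Agência: NNNN"
                agencia = achou.substring(24, 28);
                conta = achou.substring(7, 14);
            }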
            System.out.println("Os valores encontrados são: " + agencia + " e " + conta + ".");
        }
    }
}

Here’s the output:

Achou nas posições 20-48: Agência: 5720 Conta: 43821-X
Os valores encontrados são: 5720 e 43821-X.
Achou nas posições 66-94: Agência: 3481 Conta: 53895-0
Os valores encontrados são: 3481 e 53895-0.
Achou nas posições 153-181: Agência: 9123 Conta: 44578-2
Os valores encontrados são: 9123 e 44578-2.
Achou nas posições 190-218: Conta: 43210-7 Agência: 6589
Os valores encontrados são: 6589 e 43210-7.

See it working on ideone.

Explanation of regex, starting with the general structure:

  • (?: ... ) - Non-capturing group.
  • aaa|bbb - Choice between aaa and bbb. The match goes to whichever of them is found first.
  • (?: ... )|(?: ... ) - Choice between two non-capturing groups.
  • Agência: [0-9]{4} Conta: [0-9]{5}-[0-9X] - First group.
  • Conta: [0-9]{5}-[0-9X] Agência: [0-9]{4} - Second group.

Explanation of the codes inside the groups:

  • [0-9]{4} - Four occurrences of a digit from 0 to 9. This is the agency number.
  • [0-9]{5} - Five occurrences of a digit from 0 to 9. This is part of the account number.
  • - - The hyphen. This is part of the account number.
  • [0-9X] - A digit from 0 to 9 or an X. This is part of the account number.

The rest (including spaces) is literal text that is matched only exactly as written.

The regex therefore searches for agency before account or account before agency, accepting both forms. With the if, I identify which form was found and use substring to pull out the agency and account digits.

When there is other text between the agency and the account, or when the number that follows is incomplete, it will not be recognized.
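
As a possible variation on the substring step, here is a minimal sketch that lets the regex itself hand back the numbers through named capturing groups, so the offsets do not have to be counted by hand. The class name, the extract helper and the group names ag1, ct1, ag2 and ct2 are made up for this sketch and are not part of the code above. Java does not accept the same group name twice in one pattern, which is why each alternative gets its own pair of names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AgenciaContaGroups {

    // Same two alternatives as in the answer, with capturing groups around the numbers.
    // Group names are hypothetical and must be unique within the pattern.
    private static final Pattern AGENCIA_CONTA = Pattern.compile(
            "Agência: (?<ag1>[0-9]{4}) Conta: (?<ct1>[0-9]{5}-[0-9X])|"
            + "Conta: (?<ct2>[0-9]{5}-[0-9X]) Agência: (?<ag2>[0-9]{4})");

    // Returns one {agencia, conta} pair for each match found in the text.
    static List<String[]> extract(String texto) {
        List<String[]> pairs = new ArrayList<>();
        Matcher m = AGENCIA_CONTA.matcher(texto);
        while (m.find()) {
            // Groups of the alternative that did not take part in the match are null.
            boolean agenciaFirst = m.group("ag1") != null;
            String agencia = agenciaFirst ? m.group("ag1") : m.group("ag2");
            String conta = agenciaFirst ? m.group("ct1") : m.group("ct2");
            pairs.add(new String[] { agencia, conta });
        }
        return pairs;
    }

    public static void main(String[] args) {
        String texto = "pêra Agência: 5720 Conta: 43821-X abacate "
                + "laranja Conta: 43210-7 Agência: 6589 verde";
        for (String[] p : extract(texto)) {
            System.out.println("Os valores encontrados são: " + p[0] + " e " + p[1] + ".");
        }
    }
}
```

The matching rules are unchanged; only the way the two values are read out of each match differs.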

  • The OP just confirmed that the position of the agency and the account varies. I find it strange, but that is exactly what he said.

  • I found it strange to escape the hyphen; it has no special meaning outside character classes and negated character classes.

  • 1

    @Jeffersonquesado Escaping the hyphen really was unnecessary; I have already removed it. As for the position that varies, I will think of something.

  • @Jeffersonquesado And now, what do you think?

  • 1

    It turned out very good!

  • 1

    Great answer!

  • 1

    Quite a complete answer.

  • 1

    Tested with https://regex101.com/ and it worked perfectly; it also worked perfectly in Sublime Text 3. Congratulations!

-2

With this regex, you can recover these values through the Groups property.

\p{L}+:\s*(?<Agencia>\d{4})\s*\p{L}+\:\s*(?<Conta>\d{5}\-\d+)
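
Since the question is tagged java, here is a rough sketch of how this regex could be applied there. Java has no Groups property; the named groups are read with Matcher.group(name) instead. The class name and the extract helper are invented for this sketch, and the backslashes are doubled only because the pattern sits inside a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupsDemo {

    // The regex from this answer, escaped for a Java string literal.
    private static final Pattern P = Pattern.compile(
            "\\p{L}+:\\s*(?<Agencia>\\d{4})\\s*\\p{L}+\\:\\s*(?<Conta>\\d{5}\\-\\d+)");

    // Returns {agencia, conta} for the first match in the line, or null when nothing matches.
    static String[] extract(String linha) {
        Matcher m = P.matcher(linha);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group("Agencia"), m.group("Conta") };
    }

    public static void main(String[] args) {
        String[] r = extract("Agência: 0000 Conta: 00000-0");
        System.out.println(r[0] + " e " + r[1]);
    }
}
```

Note that this pattern does not check the labels themselves, only their shape (letters followed by a colon), and as written it expects the four-digit number first, so a line in the order Conta then Agência is not matched. An account whose check digit is X is not matched either, because \d+ accepts digits only.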

  • 5

    Since the question has the java tag, it would be worthwhile, and recommended, to show how to apply this in Java.

  • 4

    Don’t take this the wrong way, but someone comes here for help and you tell him to go read another link for help? That is not cool, and it does not contribute to the independence and quality of the site. Think about it...

  • I don’t know anything about Java, but I know regex, and this one does exactly what he wants. I was only trying to help.

  • But Java, like other languages, has its own particularities in how regex is used, and the question has the java tag. So your answer is incomplete; just look at the other answers.
