The regular expression is:
(?:Agência: [0-9]{4} Conta: [0-9]{5}-[0-9X])|(?:Conta: [0-9]{5}-[0-9X] Agência: [0-9]{4})
Based on that other answer of mine:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class TesteRegex {
    // Agency followed by account, or account followed by agency
    private static final Pattern AGENCIA_CONTA = Pattern.compile(
            "(?:Agência: [0-9]{4} Conta: [0-9]{5}-[0-9X])|" +
            "(?:Conta: [0-9]{5}-[0-9X] Agência: [0-9]{4})");
    public static void main(String[] args) {
        String texto = ""
                + "Banana abacaxi pêra Agência: 5720 Conta: 43821-X abacate "
                + "melancia Agência: 3481 Conta: 53895-0. verde azul "
                + "amarelo Agência: 6666 Conta: 66667-NÃO É ESSA "
                + "Agência: 9123 Conta: 44578-2 "
                + "laranja Conta: 43210-7 Agência: 6589 verde "
                + "rosa lilás Conta: 77777-7 Não vai dar Agência: 4444";
        Matcher m = AGENCIA_CONTA.matcher(texto);
        while (m.find()) {
            String achou = texto.substring(m.start(), m.end());
            System.out.println("Achou nas posições " + m.start() + "-" + m.end() + ": "
                    + achou);
            String agencia, conta;
            if (achou.startsWith("Agência:")) {
                // "Agência: NNNN Conta: NNNNN-D": agency at [9, 13), account at [21, 28)
                agencia = achou.substring(9, 13);
                conta = achou.substring(21, 28);
            } else {
                // "Conta: NNNNN-D Agência: NNNN": account at [7, 14), agency at [24, 28)
                agencia = achou.substring(24, 28);
                conta = achou.substring(7, 14);
            }
            System.out.println("Os valores encontrados são: " + agencia + " e " + conta + ".");
        }
    }
}
Here is the output:
Achou nas posições 20-48: Agência: 5720 Conta: 43821-X
Os valores encontrados são: 5720 e 43821-X.
Achou nas posições 66-94: Agência: 3481 Conta: 53895-0
Os valores encontrados são: 3481 e 53895-0.
Achou nas posições 153-181: Agência: 9123 Conta: 44578-2
Os valores encontrados são: 9123 e 44578-2.
Achou nas posições 190-218: Conta: 43210-7 Agência: 6589
Os valores encontrados são: 6589 e 43210-7.
See it working on ideone.
Explanation of the regex, starting with its general structure:
(?: ... )
- Non-capturing group.
aaa|bbb
- Alternation between aaa and bbb. It matches the first of them that it finds.
(?: ... )|(?: ... )
- Alternation between two non-capturing groups.
Agência: [0-9]{4} Conta: [0-9]{5}-[0-9X]
- First group.
Conta: [0-9]{5}-[0-9X] Agência: [0-9]{4}
- Second group.
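To illustrate what "non-capturing" means in practice, here is a small sketch (the class `NonCapturingDemo` and its helpers are hypothetical names, not part of the answer). It compares the answer's pattern with a variant that uses plain `( ... )` groups: `Matcher.groupCount()` counts only capturing groups, and `Pattern.matches` shows that the alternation accepts both orders.

```java
import java.util.regex.Pattern;

class NonCapturingDemo {
    // The pattern from the answer: two alternatives, each in a non-capturing group (?: ... )
    static final String NON_CAPTURING =
            "(?:Agência: [0-9]{4} Conta: [0-9]{5}-[0-9X])|"
            + "(?:Conta: [0-9]{5}-[0-9X] Agência: [0-9]{4})";

    // Same alternatives with plain ( ... ) groups, which do capture
    static final String CAPTURING =
            "(Agência: [0-9]{4} Conta: [0-9]{5}-[0-9X])|"
            + "(Conta: [0-9]{5}-[0-9X] Agência: [0-9]{4})";

    // groupCount() reports only capturing groups; (?: ... ) is not counted
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // Pattern.matches requires the whole input to fit the regex
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(groupCount(NON_CAPTURING)); // 0
        System.out.println(groupCount(CAPTURING));     // 2
        // Either order is accepted by the alternation
        System.out.println(matchesWhole(NON_CAPTURING, "Agência: 5720 Conta: 43821-X")); // true
        System.out.println(matchesWhole(NON_CAPTURING, "Conta: 43210-7 Agência: 6589")); // true
    }
}
```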
Explanation of the pieces inside the groups:
[0-9]{4}
- Four occurrences of a digit from 0 to 9. This is the agency number.
[0-9]{5}
- Five occurrences of a digit from 0 to 9. This is part of the account number.
-
- The hyphen. This is part of the account number.
[0-9X]
- A digit from 0 to 9 or an X. This is part of the account number.
The rest (including spaces) is literal text, matched only exactly as written.
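The account and agency pieces can be checked in isolation. This is a minimal sketch (the class `AccountPieceDemo` and its methods are hypothetical) that applies only those sub-patterns to a few sample strings. Note that `[0-9X]` is case-sensitive, so a lowercase `x` check digit is rejected.

```java
import java.util.regex.Pattern;

class AccountPieceDemo {
    // Sub-pattern of the answer's regex describing the account number
    private static final Pattern ACCOUNT = Pattern.compile("[0-9]{5}-[0-9X]");
    // Sub-pattern describing the agency number
    private static final Pattern AGENCY = Pattern.compile("[0-9]{4}");

    // matcher(...).matches() requires the whole string to fit the pattern
    static boolean isAccount(String s) {
        return ACCOUNT.matcher(s).matches();
    }

    static boolean isAgency(String s) {
        return AGENCY.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"43821-X", "53895-0", "43821-x", "4382-1", "66667-N"};
        for (String s : samples) {
            // prints true for the first two samples, false for the rest
            System.out.println(s + " -> " + isAccount(s));
        }
        System.out.println("5720 -> " + isAgency("5720")); // true
        System.out.println("572 -> " + isAgency("572"));   // false
    }
}
```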
The regex therefore searches for the agency before the account or the account before the agency, accepting both forms. With the if, I identify which form was found and extract the agency and account digits using substring.
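An alternative to the fixed substring offsets is to capture the two numbers directly, which is also what a later comment suggests. This is a sketch of that swapped-in technique, not the approach used above; the class `AgencyAccountGroups` and the group names `ag1`, `ct1`, `ag2`, `ct2` are hypothetical. Java does not allow the same group name twice in one pattern, so each order gets its own pair of names, and only the groups of the alternative that matched are non-null.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AgencyAccountGroups {
    // Same two alternatives as the answer, but with named capturing groups
    private static final Pattern AGENCIA_CONTA = Pattern.compile(
            "Agência: (?<ag1>[0-9]{4}) Conta: (?<ct1>[0-9]{5}-[0-9X])|"
            + "Conta: (?<ct2>[0-9]{5}-[0-9X]) Agência: (?<ag2>[0-9]{4})");

    // Returns one "agency/account" string per match, in the order found in the text
    static List<String> extract(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = AGENCIA_CONTA.matcher(text);
        while (m.find()) {
            // Groups of the alternative that did not match return null
            boolean agencyFirst = m.group("ag1") != null;
            String agencia = agencyFirst ? m.group("ag1") : m.group("ag2");
            String conta = agencyFirst ? m.group("ct1") : m.group("ct2");
            result.add(agencia + "/" + conta);
        }
        return result;
    }

    public static void main(String[] args) {
        String texto = "pêra Agência: 5720 Conta: 43821-X abacate "
                + "laranja Conta: 43210-7 Agência: 6589 verde";
        System.out.println(extract(texto)); // [5720/43821-X, 6589/43210-7]
    }
}
```

With groups, the code no longer depends on the exact length of the literal text, so a change such as "Agencia" without the accent only requires editing the pattern.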
When there is other text between the agency and the account, or when one of the numbers is incomplete, the pair is not recognized.
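The cases that are not recognized can be reproduced with a short sketch (the class `UnrecognizedCases` and its method are hypothetical). The last sample shows a related detail: the pattern has no boundary after the check digit, so extra characters after it are simply left out of the match instead of causing a rejection.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnrecognizedCases {
    private static final Pattern AGENCIA_CONTA = Pattern.compile(
            "(?:Agência: [0-9]{4} Conta: [0-9]{5}-[0-9X])|"
            + "(?:Conta: [0-9]{5}-[0-9X] Agência: [0-9]{4})");

    // Returns the first match in the text, or null when nothing is recognized
    static String firstMatch(String text) {
        Matcher m = AGENCIA_CONTA.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Other text between account and agency: not recognized (null)
        System.out.println(firstMatch("Conta: 77777-7 Não vai dar Agência: 4444"));
        // Check digit is neither 0-9 nor X: not recognized (null)
        System.out.println(firstMatch("Agência: 6666 Conta: 66667-NÃO É ESSA"));
        // Agency with only three digits: not recognized (null)
        System.out.println(firstMatch("Agência: 123 Conta: 12345-6"));
        // Extra digit after the check digit is left out: "Agência: 1234 Conta: 12345-6"
        System.out.println(firstMatch("Agência: 1234 Conta: 12345-67"));
    }
}
```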
You can replace everything that is not a digit with nothing: .replaceAll("[^0-9]", "");
– arllondias
@arllondias Then the agency number and the account number get merged into one. Ideally you would capture the two numbers separately and ignore the rest.
– user28595
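The objection above is easy to see in a sketch. It assumes the character class intended in the suggestion is `[^0-9]` ("not a digit"); the class `ReplaceAllDemo` is a hypothetical name. After the replacement the two numbers are glued together, and an `X` check digit disappears.

```java
class ReplaceAllDemo {
    // Removes every character that is not an ASCII digit
    static String digitsOnly(String line) {
        return line.replaceAll("[^0-9]", "");
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("Agência: 5720 Conta: 43821-X")); // 572043821
        System.out.println(digitsOnly("Conta: 43210-7 Agência: 6589")); // 4321076589
    }
}
```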
The only problem is that the rest of the document is stored in a list of Strings, one String per line, and among those lines there are other values that are not digits either, so simply removing the non-digits does not work.
– Matheus Grossi
Is the order always agency then account, or can it vary?
– Jefferson Quesado
It varies, friend. I just want to extract the numbers; regardless of the order, I will store them.
– Matheus Grossi
@Matheusgrossi When you say it varies, how exactly does it vary? Can the account come before the agency? Can only one of them appear? Can more than one account and agency appear? Can they be in distant parts of the document?
– Victor Stafusa
Only the order of the data varies: one can come before the other and vice versa.
– Matheus Grossi
Why not just use (\d+) and then check the length of the results to know which result is which? Like here
– guijob
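One reading of the length-based suggestion, sketched under the assumption that a 4-digit run is the agency and a 5-digit run is the account base (the class `DigitRunsByLength` and the labels are hypothetical). It does not depend on order, but the check digit after the hyphen is lost (a separate 1-digit run, or nothing at all for `X`), and any other 4- or 5-digit number on the line would be picked up as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitRunsByLength {
    private static final Pattern DIGITS = Pattern.compile("(\\d+)");

    // Labels each run of digits by its length: 4 -> agency, 5 -> account without check digit.
    // Runs of any other length (e.g. the single check digit) are ignored.
    static List<String> classify(String line) {
        List<String> labels = new ArrayList<>();
        Matcher m = DIGITS.matcher(line);
        while (m.find()) {
            String run = m.group(1);
            if (run.length() == 4) {
                labels.add("agencia=" + run);
            } else if (run.length() == 5) {
                labels.add("conta=" + run);
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        System.out.println(classify("Conta: 43210-7 Agência: 6589")); // [conta=43210, agencia=6589]
    }
}
```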