Ignore punctuation in a sequence of numbers using regex in Java


Viewed 648 times

3

I have the following input:

Fatura Cliente:  1.7852964.34 
CPF/CNPJ:  09022317000222

I need to extract only the "Fatura Cliente" (customer invoice) number, ignoring the punctuation, so that the result is just 1785296434. For this I am using the following regex:

Fatura Cliente[\D]+(\S+)

But afterwards I still have to post-process the capture and strip the punctuation to get a plain sequence of digits.

How can I get the regex to return the digits through a capture group, ignoring the punctuation, without having to run a replace afterwards?

Is it possible for the first regex to capture the number already stripped of punctuation, or is a String.replace or String.replaceAll(regex) required after the first capture?

2 answers

3


Regex

The Regex: \b(\d+)(\.|,|\b)

Result

With these test strings:

Fatura Cliente: 6.823935.10
Fatura Cliente: 6,823935,10
Fatura, Cliente: 6,823935.10
Fatura. Cl1ente: 6.823935,10

And replacing each match with $1 (a reference to the first capture group), the following results are obtained:

Fatura Cliente: 682393510
Fatura Cliente: 682393510
Fatura, Cliente: 682393510
Fatura. Cl1ente: 682393510
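In Java, the replacement above could be sketched as follows. This is only an illustration: the class name SeparatorStripDemo and the method stripSeparators are made up for the example, while the pattern and the test strings are the ones from this answer.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only the regex itself comes from the answer.
class SeparatorStripDemo {
    // \b(\d+)(\.|,|\b) written as a Java string literal (backslashes doubled).
    private static final Pattern NUMBER_CHUNK =
            Pattern.compile("\\b(\\d+)(\\.|,|\\b)");

    // Replaces every "digits + optional separator" match with the digits only.
    static String stripSeparators(String line) {
        return NUMBER_CHUNK.matcher(line).replaceAll("$1");
    }

    public static void main(String[] args) {
        String[] samples = {
            "Fatura Cliente: 6.823935.10",
            "Fatura Cliente: 6,823935,10",
            "Fatura, Cliente: 6,823935.10",
            "Fatura. Cl1ente: 6.823935,10"
        };
        for (String sample : samples) {
            System.out.println(stripSeparators(sample));
        }
    }
}
```

Note that the label survives untouched: the comma in "Fatura," is not preceded by digits, and the 1 in "Cl1ente" does not start at a word boundary, so neither is matched.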

You can test it on Regexplanet or Freeformatter.

Explanation \b(\d+)(\.|,|\b)

  • \b - A word boundary: the position between a word character (letter, digit or underscore) and a non-word character, or the start or end of the string.
  • First capture group (\d+)
    • \d - Matches a digit, equivalent to [0-9]
    • + - Quantifier that matches one or more times, as many times as possible (greedy)
  • Second capture group (\.|,|\b)
    • | - Or (alternation)
    • \. - Matches a literal period
    • , - Matches a literal comma
    • \b - A word boundary, as above; here it lets the last run of digits match even when no separator follows it.
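To see what each group holds, a small sketch can iterate over the matches and record both groups. The class GroupInspection and the method describeMatches are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for inspecting the two capture groups of each match.
class GroupInspection {
    private static final Pattern NUMBER_CHUNK =
            Pattern.compile("\\b(\\d+)(\\.|,|\\b)");

    // Returns one "digits|separator" entry per match, in order of appearance.
    static List<String> describeMatches(String input) {
        List<String> entries = new ArrayList<>();
        Matcher m = NUMBER_CHUNK.matcher(input);
        while (m.find()) {
            // group(1) is the run of digits; group(2) is ".", "," or the
            // empty string when the zero-width \b alternative matched.
            entries.add(m.group(1) + "|" + m.group(2));
        }
        return entries;
    }

    public static void main(String[] args) {
        System.out.println(describeMatches("Fatura. Cl1ente: 6.823935,10"));
    }
}
```

For the last run of digits the second group is empty, because \b matches a position and consumes no characters.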

EDIT:

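You wouldn't need another regex: since you already have the captured substring, simply removing the periods and commas with replace would solve your problem.

It is not possible to do this with regex alone. A capture group always returns a contiguous slice of the input, so it cannot skip the separators inside the number. You would need an extra step to handle this, either with replace or with another method. There are some ways to do it in @Douglas's answer.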
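The two-step approach described above (capture first, then clean the captured text) might look like the sketch below. The class InvoiceNumberExtractor and the method extractInvoiceDigits are hypothetical names; the pattern follows the one from the question, using a named group called clientNumber as in the comments, and returning Optional is just one possible way to signal "no match".

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: capture with regex, then strip the separators.
class InvoiceNumberExtractor {
    // "Fatura Cliente", any non-digits, then the run of non-space characters.
    private static final Pattern INVOICE =
            Pattern.compile("Fatura Cliente\\D+(?<clientNumber>\\S+)");

    static Optional<String> extractInvoiceDigits(String text) {
        Matcher m = INVOICE.matcher(text);
        if (!m.find()) {
            return Optional.empty();
        }
        // Step 1: the group still contains the separators, e.g. "1.7852964.34".
        String raw = m.group("clientNumber");
        // Step 2: drop every character that is not a digit.
        return Optional.of(raw.replaceAll("\\D", ""));
    }

    public static void main(String[] args) {
        String input = "Fatura Cliente:  1.7852964.34 \nCPF/CNPJ:  09022317000222";
        System.out.println(extractInvoiceDigits(input).orElse("not found"));
    }
}
```

Wrapping both steps in one method keeps the calling code as clean as a single capture would, even though the regex itself cannot return the digits already joined.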

  • I am capturing a substring of a larger string with regex, and I wanted that capture to come back already as a plain sequence of digits, without periods and commas. I capture the substring using the regex "Fatura Cliente[\D]+(?<clientNumber>\S+)" and matcher.group("clientNumber") is "6.823935.10", but I need it already cleaned, without periods and commas: "682393510". Do you see what I mean?

  • Have you tried using replace on each submatch? I have no experience with Java, but it would be something like: String TirarPonto = SubMatchDoRegex.replace(".", "");

  • At first I wanted the capture group matcher.group("clientNumber") to come back already formatted by the regex itself. And yes, replace is what I'm using today and it works for now. The question is more out of curiosity than need, and also to keep the code cleaner in the future, because I use regex a lot.

  • 1

    You would need an extra step to handle this. It is not possible to do only with Regex.

  • Perfect. Basically what I needed to know was that it is not possible, so the only way out is to clean it up with String.replace or String.replaceAll(regex) after capturing the number together with its punctuation. I will edit the original question; then you can edit your answer to state that it is not possible to capture the number already formatted. That way I can mark it as accepted to help future users who have the same question.

2

Java implementation

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtraiNumeracao {

    public static void main(String[] args) {
        // With periods
        String numeroSemPonto = extraiNumeracao("Fatura Cliente: 6.823935.10");
        System.out.println(numeroSemPonto);

        // With commas
        String numeroSemVirgula = extraiNumeracao("Fatura Cliente: 6,823935,10");
        System.out.println(numeroSemVirgula);

        //** Other option **//

        // With periods
        String numeroSemPonto2 = extraiNumeracao2("Fatura Cliente: 6.823935.10");
        System.out.println(numeroSemPonto2);

        // With commas
        String numeroSemVirgula2 = extraiNumeracao2("Fatura Cliente: 6,823935,10");
        System.out.println(numeroSemVirgula2);
    }

    // Option 1: concatenate every run of digits that the regex finds.
    // Note that digits anywhere in the string are collected, including the label.
    public static String extraiNumeracao(String str) {
        Pattern p = Pattern.compile("\\d+");
        Matcher m = p.matcher(str);
        StringBuilder resultado = new StringBuilder();

        // find() must be called in a loop; a single call stops at the first run.
        while (m.find()) {
            resultado.append(m.group());
        }

        return resultado.toString();
    }

    // Option 2: no regex matching; take the text after ": " and drop the separators.
    public static String extraiNumeracao2(String str) {
        return str.split(": ")[1].replace(",", "").replace(".", "");
    }
}
  • I am using Java. The regex "\d+" returns only the digits before the first ".", which would be "6" in this case. I edited the question with more information.

  • 1

    I added two methods, one using regex and the other using 'replace' as suggested by @danieltakeshi
