How to find and change a pattern without replacing the other elements of a String with regex?

I have a template of the form any name:{{UDA_any number}} that appears "loose" inside a String. That String can contain any number of templates, along with words, numbers, etc.

The goal is to capture this "any number" from the template and run it through a series of operations, so that the template is replaced with the description corresponding to that number. Everything else in the String must be ignored: only the template is replaced, and all other characters stay unchanged.

Ex:

     String: "teste:{{UDA_1}} teste2:{{UDA_2}} teste3:{{UDA_3}} "

     String (processed): "teste:descricao_1 teste2:descricao_2 teste3:descricao_3 "

The problem is that the source String can come in any possible form, for example:

    Original: "teste:{{UDA_1}}, teste2:{{UDA_2}}, teste3:{{UDA_3}}..."
    Processed: "teste:descricao_1, teste2:descricao_2, teste3:descricao_3..."

    Original: "teste:{{UDA_1}} \n teste2:{{UDA_2}} \n teste3:{{UDA_3}} \n"
    Processed: "teste:descricao_1 \n teste2:descricao_2 \n teste3:descricao_3 \n"

    Original: "teste:descricao_1teste2:descricao_2teste3:descricao_3"
    Processed: "teste:{{UDA_1}}teste2:{{UDA_2}}teste3:{{UDA_3}}"

    Original: "teste:{{UDA_1}} abc teste2:{{UDA_2}} 123 teste3:{{UDA_3}} ^~{{}}"
    Processed: "teste:descricao_1 abc teste2:descricao_2 123 teste3:descricao_3 ^~{{}}"

    Original: "teste:{UDA_1}, teste2:{{UDA~~~2}}, teste3:{{3_UDA}}"
    Processed: "teste:{UDA_1}, teste2:{{UDA~~~2}}, teste3:{{3_UDA}}"
    // (The template is malformed, so nothing is replaced.)

This makes it necessary to search for the template pattern in order to replace it correctly. My current attempt uses regex, along these lines:

         // Pattern that matches only the templates
         Pattern p = Pattern.compile("(\\{\\{(.\\w+.\\w.)\\}})+",Pattern.DOTALL);
         // The String received
         Matcher m = p.matcher(ImportDescriptionValue);

         // Whenever a matching value is found
         while (m.find()) {

             // Take only the inner part (e.g. UDA_1)
             String uda = m.group(2);

             // Format the String to keep only the id
             String idUDA = uda.substring(uda.indexOf('_')+1);

             // **
                ... operations with the ID

             if (/* found a matching description */) {

                 // Replace the current match with the description
                  m.replaceFirst(description);
             }
             else {

                 // Replace with "empty" when no description is found
                  m.replaceFirst("");
             }
         }

      // Processed String
      System.out.println(m);

The code is wrong, but that is more or less the idea. I have a solution that does the substitution with split(), but given all these possible variations it is very limited. So I was trying other approaches to the problem, such as regex.

My questions:

  • Is regex a good way to deal with this problem?
  • Is there a good/optimized way to solve this problem?
  • In Java, String's replaceAll method accepts a regex; you wouldn't even need all of that.

  • Yes, but each template has a different description... if I used replaceAll(), wouldn't they all get the same description? I would need a separate replace() for each template to get its own description.

1 answer


From what I understand, you just want to replace {{UDA_x}} with descricao_x (where "x" is a number), and vice versa.

If the parts {{UDA_ and }} (or descricao_) are always fixed and only the number changes, just do:

String[] textos = {
        "teste:{{UDA_1}}, teste2:{{UDA_2}}, teste3:{{UDA_3}}...",
        "teste:{{UDA_1}} \n teste2:{{UDA_2}} \n teste3:{{UDA_3}} \n",
        "teste:descricao_1teste2:descricao_2teste3:descricao_3",
        "teste:{{UDA_1}} abc teste2:{{UDA_2}} 123 teste3:{{UDA_3}} ^~{{}}",
        "teste:{UDA_1}, teste2:{{UDA~~~2}}, teste3:{{3_UDA}}" };
for (String texto : textos) {
    String processado;
    if (texto.indexOf("{{UDA_") >= 0) { // the String contains "{{UDA_"
        processado = texto.replaceAll("\\{\\{UDA_(\\d+)\\}\\}", "descricao_$1");
    } else {
        processado = texto.replaceAll("descricao_(\\d+)", "{{UDA_$1}}");
    }
    System.out.println("Origem: " + texto);
    System.out.println("Processado: " + processado);
}

You don't need to use . (which matches any character and, with the DOTALL option, also matches line breaks), nor the shortcut \w, which matches letters, digits, and the character _.

Ideally, be as specific as possible. In this case, I wrote the characters { and } themselves (in a regex they should be written as \{ and \}, but since the regex is inside a String literal, the character \ is written as \\).
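To illustrate why being specific matters, here is a small sketch comparing the pattern from the question with the specific one. The class and method names (LooseVsSpecific, found) are made up for this example. The loose pattern accepts {{3_UDA}}, which the question says must not be replaced, while the specific pattern rejects it.

```java
import java.util.regex.Pattern;

class LooseVsSpecific {
    // Pattern from the question: "." and "\w" accept far more than "UDA_<digits>".
    static final Pattern LOOSE =
            Pattern.compile("(\\{\\{(.\\w+.\\w.)\\}})+", Pattern.DOTALL);

    // Specific pattern: literal "{{UDA_", one or more digits, literal "}}".
    static final Pattern SPECIFIC = Pattern.compile("\\{\\{UDA_(\\d+)\\}\\}");

    // Returns true if the pattern matches anywhere in the text.
    static boolean found(Pattern pattern, String text) {
        return pattern.matcher(text).find();
    }

    public static void main(String[] args) {
        String malformed = "teste3:{{3_UDA}}";
        System.out.println(found(LOOSE, malformed));    // true  (unwanted match)
        System.out.println(found(SPECIFIC, malformed)); // false

        String valid = "teste:{{UDA_1}}";
        System.out.println(found(LOOSE, valid));        // true
        System.out.println(found(SPECIFIC, valid));     // true
    }
}
```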

Then I use \\d+ (one or more digits from 0 to 9). If your cases can only have a single digit, simply remove the +.

Also, I put the digits in parentheses to form a capture group. With this, I can retrieve the captured value using the reference $1 in the second parameter of replaceAll. Since the regex has only one pair of parentheses, its content (the digits) is in the first capture group, whose value is available through $1.

That is, texto.replaceAll("\\{\\{UDA_(\\d+)\\}\\}", "descricao_$1") checks whether there is {{UDA_, followed by one or more digits, followed by }}. If found, that section is replaced by descricao_$1, where $1 is the number that was captured.

The same goes for the second replaceAll, which does the opposite: it detects descricao_ followed by one or more digits and replaces it with {{UDA_$1}}, where $1 is the number that was captured.
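As a compact illustration of the $1 reference in both directions, here is a minimal sketch. The class and method names (BackreferenceDemo, toDescription, toTemplate) are invented for the example; the two regexes are the ones described above.

```java
class BackreferenceDemo {
    // {{UDA_x}} -> descricao_x; $1 is the digits captured by (\d+).
    static String toDescription(String text) {
        return text.replaceAll("\\{\\{UDA_(\\d+)\\}\\}", "descricao_$1");
    }

    // descricao_x -> {{UDA_x}}; again $1 is the captured digits.
    static String toTemplate(String text) {
        return text.replaceAll("descricao_(\\d+)", "{{UDA_$1}}");
    }

    public static void main(String[] args) {
        String original = "a:{{UDA_10}} b:{{UDA_2}}";
        String processed = toDescription(original);
        System.out.println(processed);             // a:descricao_10 b:descricao_2
        System.out.println(toTemplate(processed)); // a:{{UDA_10}} b:{{UDA_2}}
    }
}
```

Because \\d+ is used, multi-digit ids such as 10 are captured whole.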


This solution is limited to cases where only one of the two occurs (the String has either only {{UDA_x}} or only descricao_x): note the use of indexOf to check whether {{UDA_ appears in the String.

But if you have a text with occurrences of both mixed together, a single replaceAll is no use. In that case, the way is to go through the String and replace the occurrences one by one:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String[] textos = { "teste:{{UDA_1}}, teste2:{{UDA_2}}, teste3:{{UDA_3}}...",
        "teste:{{UDA_1}} \n teste2:{{UDA_2}} \n teste3:{{UDA_3}} \n",
        "teste:descricao_1teste2:descricao_2teste3:descricao_3", 
        "teste:{{UDA_1}} abc teste2:{{UDA_2}} 123 teste3:{{UDA_3}} ^~{{}}",
        "teste:{UDA_1}, teste2:{{UDA~~~2}}, teste3:{{3_UDA}}",
        "teste:{{UDA_1}}, teste2:descricao_2" }; // string that mixes both cases
Matcher matcher = Pattern.compile("\\{\\{UDA_(\\d+)\\}\\}|descricao_(\\d+)").matcher("");
for (String texto : textos) {
    matcher.reset(texto); // set the text to be scanned by the Matcher
    StringBuffer sb = new StringBuffer();
    while (matcher.find()) {
        if (matcher.group(1) != null) { // found "{{UDA_x}}"
            matcher.appendReplacement(sb, "descricao_$1");
        } else if (matcher.group(2) != null) { // found "descricao_x"
            matcher.appendReplacement(sb, "{{UDA_$2}}");
        }
    }
    matcher.appendTail(sb);
    System.out.println("Origem: " + texto);
    System.out.println("Processado: " + sb.toString());
}

Now the regex uses alternation (the character |), which means or. That means the regex tests for {{UDA_x}} or descricao_x. In each, the digits are in parentheses, so each one forms a capture group.

Then just test which of the groups was captured. If it was group 1, the regex found an occurrence of {{UDA_x}}, which is replaced with descricao_$1. If it was group 2, it found descricao_x, which is replaced with {{UDA_$2}} (notice that I used $2, because now the digits are in the second capture group, since it is the second pair of parentheses of the regex).
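The same find/appendReplacement loop also covers the case raised in the question's comments, where each template gets its own description rather than a fixed descricao_x. The following is only a sketch under assumptions: the Map of descriptions, the class name TemplateLookup and the method fill are hypothetical stand-ins for whatever lookup the real code performs. A missing id is replaced by an empty string, as in the question's attempt.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemplateLookup {
    private static final Pattern TEMPLATE = Pattern.compile("\\{\\{UDA_(\\d+)\\}\\}");

    // Replaces each {{UDA_x}} with descriptions.get("x"), or "" if there is none.
    static String fill(String text, Map<String, String> descriptions) {
        Matcher m = TEMPLATE.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String id = m.group(1); // only the digits, e.g. "1"
            String description = descriptions.getOrDefault(id, "");
            // quoteReplacement keeps "$" and "\" in the description literal
            m.appendReplacement(sb, Matcher.quoteReplacement(description));
        }
        m.appendTail(sb); // copy whatever follows the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> descriptions = new HashMap<>(); // hypothetical data
        descriptions.put("1", "first item");
        descriptions.put("2", "costs $5");
        String text = "teste:{{UDA_1}} teste2:{{UDA_2}} teste3:{{UDA_9}} {{3_UDA}}";
        System.out.println(fill(text, descriptions));
        // teste:first item teste2:costs $5 teste3: {{3_UDA}}
    }
}
```

Matcher.quoteReplacement is used because a description taken from data may contain $ or \, which appendReplacement would otherwise interpret as group references or escapes.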


If you want, you can also check whether what comes before the template is "a name followed by a colon":

Matcher matcher = Pattern.compile("(?<=\\w+:)(?:\\{\\{UDA_(\\d+)\\}\\}|descricao_(\\d+))").matcher("");

I use a lookbehind, which only checks whether something exists before the current position; that something is not part of the match and is therefore not replaced. In this case, the lookbehind is (?<=\\w+:) (one or more letters/digits/_, followed by a colon).

Then I group the rest of the regex in parentheses, but so that it does not become a capture group (and interfere with the existing groups 1 and 2), I use (?:, which turns the parentheses into a non-capturing group (that is, this pair of parentheses does not create the special variables, such as $1 and $2). So the rest of the code can remain the same, without having to renumber the groups.

That is, if you test this regex with the string "teste:{{UDA_1}}, teste2:descricao_2, {{UDA_3}}", the section {{UDA_3}} will not be replaced, as it does not have a name (one or more letters, digits or _) followed by : just before it.
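The behavior described for that test string can be sketched as below. The class and method names (LookbehindDemo, swap) are made up for the example. One deliberate difference: the sketch uses (?<=\\w:) instead of (?<=\\w+:). A single word character before the colon accepts exactly the same positions, and it keeps the lookbehind fixed-width, which matters because some Java versions reject unbounded quantifiers inside a lookbehind.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindDemo {
    // Lookbehind: a word character and ":" must come right before the match.
    // (?: ... ) groups the alternation without creating a capture group,
    // so the digits are still in groups 1 and 2.
    private static final Pattern PATTERN = Pattern.compile(
            "(?<=\\w:)(?:\\{\\{UDA_(\\d+)\\}\\}|descricao_(\\d+))");

    static String swap(String text) {
        Matcher m = PATTERN.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            if (m.group(1) != null) {        // found "{{UDA_x}}"
                m.appendReplacement(sb, "descricao_$1");
            } else {                         // found "descricao_x"
                m.appendReplacement(sb, "{{UDA_$2}}");
            }
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(swap("teste:{{UDA_1}}, teste2:descricao_2, {{UDA_3}}"));
        // teste:descricao_1, teste2:{{UDA_2}}, {{UDA_3}}
    }
}
```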

If you want to be more specific, you can replace the \\w+ with something like [a-zA-Z]+ (one or more letters from a to z, uppercase or lowercase). Adapt it to what you need.
