Remove String Connectors with Regular Expression

6

How can I remove the connectors "e", "do", "da", "do", "das", "de", "di", "du" from a sentence without changing the rest of the name?

For example, take the name Daniela de Andrade. I want to remove only the "de", without removing the "DA" in "DAniela" or the "DE" in "AndraDE". I'm using the replaceAll function in Java.

    String retiraConector = "^\\s e $\\s";

    nome = nome.replaceAll(retiraConector, " ");
  • As you saw on the [tour], if you think the main goal of this question has been achieved, you can accept an answer. You can also vote for everything on the site you find useful, not just things related to your posts.

2 answers

9

You can use the following pattern:

String padrao = "(\\w)(\\s+)(e|do|da|do|das|de|di|du)(\\s+)(\\w)";

This pattern is divided into five groups, in this order:

any word character (letter, digit, or underscore) + one or more whitespace characters + connector + one or more whitespace characters + any word character

Note: the groups are delimited by parentheses.

To make the replacement, use:

public class Main {
    public static void main(String[] args) {
        String padrao = "(\\w)(\\s+)(e|do|da|do|das|de|di|du)(\\s+)(\\w)";
        String nome = "Daniela de Andrade";
        // Keep groups 1 and 5 (the characters around the connector), separated by one space.
        System.out.println(nome.replaceAll(padrao, "$1 $5"));
    }
}

The result is as follows:

Daniela Andrade

When you use replaceAll, the pattern is found in Daniel[a de A]ndrade and is replaced by groups 1 and 5 separated by a single space. These groups are the "a" of "Daniela" and the "A" of "Andrade".
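To make the group numbering concrete, here is a minimal sketch that prints what the first match captures. The class name GroupDemo and the helper describeFirstMatch are hypothetical names introduced only for this illustration; the pattern is the one from the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Same pattern as in the answer; compiled once so it can be reused.
    private static final Pattern PADRAO =
            Pattern.compile("(\\w)(\\s+)(e|do|da|do|das|de|di|du)(\\s+)(\\w)");

    // Hypothetical helper: describes the first match, or returns null if there is none.
    public static String describeFirstMatch(String nome) {
        Matcher m = PADRAO.matcher(nome);
        if (!m.find()) {
            return null;
        }
        return "match=[" + m.group() + "] g1=" + m.group(1)
                + " g3=" + m.group(3) + " g5=" + m.group(5);
    }

    public static void main(String[] args) {
        // prints: match=[a de A] g1=a g3=de g5=A
        System.out.println(describeFirstMatch("Daniela de Andrade"));
    }
}
```

Because the match also consumes the letters on both sides of the connector, the replacement string has to put them back, which is why "$1 $5" is used rather than a plain space.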


Update

To ignore the difference between upper and lower case letters, you can add (?i) to the expression, for example:

String padrao = "(?i)(\\w)(\\s+)(e|do|da|do|das|de|di|du)(\\s+)(\\w)";

The way to perform the substitution is the same as indicated above.
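As a sketch of an equivalent form, the same effect as (?i) can be had by passing Pattern.CASE_INSENSITIVE to Pattern.compile. The example below also adds Pattern.UNICODE_CHARACTER_CLASS, which is my own assumption about what Portuguese names need and is not part of the answer: by default Java's \w only covers ASCII, so a name ending in an accented letter (such as "José") would otherwise not match. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class CaseInsensitiveDemo {
    // CASE_INSENSITIVE plays the role of (?i).
    // UNICODE_CHARACTER_CLASS is an added assumption so that \w also matches accented letters.
    private static final Pattern PADRAO = Pattern.compile(
            "(\\w)(\\s+)(e|do|da|do|das|de|di|du)(\\s+)(\\w)",
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CHARACTER_CLASS);

    // Hypothetical helper: removes connectors regardless of their case.
    public static String removeConnectors(String nome) {
        return PADRAO.matcher(nome).replaceAll("$1 $5");
    }

    public static void main(String[] args) {
        System.out.println(removeConnectors("Daniela DE Andrade"));
        // \u00e9 is the letter e with an acute accent, written as an escape
        // so the source does not depend on the file encoding.
        System.out.println(removeConnectors("Jos\u00e9 Da Silva"));
    }
}
```

Compiling the pattern once and reusing it also avoids recompiling the expression on every call, which String.replaceAll does internally.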

  • Great, thank you, that solved my problem. I added [dD][eE] to match both upper and lower case.

  • @Gabriel if it is solved, do not forget to accept the answer. See the [tour].

  • @Gabrielfaria I improved my answer to also cover this kind of check.

1

Following @Mateus Alexandre’s reply:

You can use the following pattern as well:

String padrao = "\s(e|d(a|e|i|o|u)s?)\s";

This pattern has two capturing groups (the connector and, nested inside it, the vowel) and follows this order:

space + connector + space

You only need to change:

System.out.println(nome.replaceAll(padrao, "$1 $5"));

to

System.out.println(nome.replaceAll(padrao, " "));
  • I believe a backslash is missing from the expression (\\s instead of \s), right?

  • @Mateusalexandre, actually there is no need. In a regular expression, \s is a metasequence (a shorthand). Some programmers use \\s, but I cannot explain exactly why; logically, it could be an attempt at literal interpretation. For example, \\W could capture the $, which is a special end-of-line character, interpreting it literally as \$, whereas \\w could capture a t, giving \t, which is also a metasequence. For testing I recommend the site: http://regex101.com/

  • In a quick search I found this: "In a regular expression defined with the new constructor, to use a metasequence starting with the backslash character (\), such as \d (which matches any digit), type the backslash character twice:" var pattern:RegExp = new RegExp("\\d+", "");. In other words, the \\ is there so that the string is read as \, which yields \d.

  • Yes, exactly. I said that because I tested it in NetBeans and Eclipse and both flagged it as an invalid escape sequence.
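To illustrate the point settled in the comments above, here is a minimal sketch of the second pattern written as a Java string literal with the backslashes doubled. In Java source, "\\s" produces the two characters \s, which the regex engine then reads as "any whitespace". A single "\s" is rejected as an invalid escape sequence by compilers before Java 15; from Java 15 on it compiles but is a string escape for one plain space, so it would no longer match tabs or other whitespace. The class and method names are hypothetical.

```java
public class EscapeDemo {
    // "\\s" in source becomes the regex \s (any whitespace character).
    public static final String PADRAO = "\\s(e|d(a|e|i|o|u)s?)\\s";

    // Hypothetical helper: replaces " connector " with a single space.
    public static String removeConnectors(String nome) {
        return nome.replaceAll(PADRAO, " ");
    }

    public static void main(String[] args) {
        // prints: Daniela Andrade
        System.out.println(removeConnectors("Daniela de Andrade"));
        // Tabs around the connector are matched too, because \s covers all whitespace.
        System.out.println(removeConnectors("Maria\tdas\tDores"));
    }
}
```

Sites such as regex101 take the expression directly, without a surrounding string literal, which is why a single backslash works there but not in Java source.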
