In fact, the two regexes you posted do not return the same result. I ran a test on JDK 1.7.0_80, and you can also see them behaving differently here and here.
I created a very simple method to test a regex:
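import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Prints the first capturing group of the first match, if there is one
public void testregex(String input, String regex) {
    Matcher matcher = Pattern.compile(regex).matcher(input);
    if (matcher.find()) {
        System.out.println(matcher.group(1));
    }
}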
Then I tested the same input with both regexes (note that, inside a Java string literal, the \ must be escaped, so it is written as \\):
String input = "Detalhamento de Serviços nº: 999-99999-9999";
testregex(input, "Detalhamento de Serviços.+(\\d+-\\d+-\\d+)");
testregex(input, "Detalhamento de Serviços\\D+(\\d+-\\d+-\\d+)");
The result was:
9-99999-9999
999-99999-9999
This is because the quantifiers + and * are "greedy": they try to consume as many characters as possible. In the first case, .+ also swallows the first two 9 digits, because the rest of the String (9-99999-9999) still satisfies the last part of the regex (\d+-\d+-\d+).
In the second case, the first two 9s are not consumed, because \D never matches a digit.
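To see exactly what each prefix swallowed, one option is to wrap it in its own capturing group. This is only a sketch: the class GreedyPrefixDemo and the method split are hypothetical names, not part of the original test, and the extra parentheses around .+ and \D+ were added just for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only
public class GreedyPrefixDemo {

    // Returns "prefix|number": group 1 is what the prefix pattern consumed,
    // group 2 is the captured number. Returns null if nothing matches.
    public static String split(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) + "|" + m.group(2) : null;
    }

    public static void main(String[] args) {
        String input = "Detalhamento de Serviços nº: 999-99999-9999";

        // Greedy .+ backtracks only as far as needed, so it keeps two of the 9s
        System.out.println(split(input, "Detalhamento de Serviços(.+)(\\d+-\\d+-\\d+)"));
        // prints " nº: 99|9-99999-9999"

        // \D+ stops at the first digit, so the whole number is captured
        System.out.println(split(input, "Detalhamento de Serviços(\\D+)(\\d+-\\d+-\\d+)"));
        // prints " nº: |999-99999-9999"
    }
}
```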
Therefore, some possible solutions are:
- Use \D: this guarantees that, however greedy the quantifier is, it will not consume a digit by mistake.
- Put a ? right after the + quantifier, which turns off the "greedy" behavior (the quantifier becomes lazy). The regex then looks like this: Detalhamento de Serviços.+?(\d+-\d+-\d+) (note the .+? that removes the "greed").
- Fix the number of digits using {}. For example, if the digits always follow the "3-5-4" layout, you can use Detalhamento de Serviços.+?(\d{3}-\d{5}-\d{4}). If the number of digits varies, use the {min,max} syntax. For example, for a minimum of 2 digits and a maximum of 3, use {2,3} (and combine it with the lazy quantifier or with \D to be safe). Adapt this to your needs.
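The alternatives above can be compared side by side with a small variation of the test method. This is a minimal sketch under some assumptions: the class RegexFixesDemo and the method extract are hypothetical names, and extract returns the group instead of printing it so the results are easier to check.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only
public class RegexFixesDemo {

    // Returns the first capturing group of the first match, or null if none
    public static String extract(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "Detalhamento de Serviços nº: 999-99999-9999";

        String[] regexes = {
            "Detalhamento de Serviços\\D+(\\d+-\\d+-\\d+)",       // non-digit prefix
            "Detalhamento de Serviços.+?(\\d+-\\d+-\\d+)",        // lazy quantifier
            "Detalhamento de Serviços.+?(\\d{3}-\\d{5}-\\d{4})",  // fixed digit counts
            "Detalhamento de Serviços\\D+(\\d{2,3}-\\d+-\\d+)"    // {min,max} plus \D
        };
        for (String regex : regexes) {
            System.out.println(extract(input, regex)); // prints 999-99999-9999 each time
        }

        // {min,max} with a greedy .+ still loses a digit: prints 99-99999-9999
        System.out.println(extract(input, "Detalhamento de Serviços.+(\\d{2,3}-\\d+-\\d+)"));
    }
}
```

The last call shows why {min,max} alone is not enough: the greedy .+ gives back only the minimum number of digits the group requires, so it should be paired with .+? or \D+.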
Taking into account only the name, "any Character" is literally any character, whereas "non-digit" is any character except numbers, is it not? So they’re not the same.
– StatelessDev
Yes, but for the input I gave as an example, both work and return the desired result. The question is: which is the right one to use, and why?
– Gustavo Piucco