In the comments you said you're using preg_match. If we look at the documentation, it says the results also include the capture groups.
Capture groups are created by parentheses, and your regex uses many of them: one group just for the weekday name without the "-feira" suffix, another just for the "-feira" suffix, another for the number in parentheses, another around the whole expression, and so on.
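To make this concrete, here is a small sketch. The pattern below is a hypothetical, fully capturing approximation of a regex like yours (your exact pattern isn't reproduced here), and the input sentence is adapted from the examples further down. It shows how preg_match fills $matches with one extra entry per pair of plain parentheses:

```php
<?php
// Hypothetical pattern: every group here is a capturing group.
$capturing = '/(((segunda|terça|quarta|quinta|sexta)(-feira)?|sábado|domingo)(\s+\(([0-9]+)\))?)/i';

preg_match($capturing, 'Loja inaugura novo espaço nesta terça-feira (15)', $matches);

// Index 0 is the whole match; indexes 1 to 6 hold one entry per capture group
// ("terça-feira (15)", "terça-feira", "terça", "-feira", " (15)", "15").
print_r($matches);
echo count($matches), PHP_EOL;
```

Six groups means six extra entries on top of the whole match, even though only $matches[0] is actually needed.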
If you don't want so many groups and only need the whole match, turn the parentheses into non-capturing groups by starting them with (?: instead of just (. The parentheses around the entire expression aren't necessary either, since the part corresponding to the whole regex is always returned. It would then look like this:
/(?:(?:segunda|terça|quarta|quinta|sexta)(?:-feira)?|sábado|domingo)(?:\s+\([0-9]+\))?/i
In PHP, the code would look something like this:
$str = 'Espetáculo Trueque se apresenta neste sábado no teatro da CDL' .
       'Bandas de rock se apresentam nesta sexta (5) no espaço Marcus Moraes' .
       'Loja de decoração inaugura novo espaço nesta terça-feira (15)';
preg_match_all('/(?:(?:segunda|terça|quarta|quinta|sexta)(?:-feira)?|sábado|domingo)(?:\s+\([0-9]+\))?/i', $str, $matches);
foreach ($matches[0] as $m) {
    echo $m . PHP_EOL;
}
Output:
sábado
sexta (5)
terça-feira (15)
Now every group starts with (?:, which makes it a non-capturing group. Those parts are still matched, but they are no longer returned as separate entries in the results; only the part corresponding to the whole regex is returned.
Other improvements:
(?:(?:segunda|terça|quarta|quinta|sexta)(?:-feira)?|sábado|domingo): the alternation says that only segunda to sexta (Monday to Friday) can be followed by "-feira" (the ? makes "-feira" optional). sábado and domingo (Saturday and Sunday) cannot take the "-feira" suffix.
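A quick check of this alternation, as a sketch with a made-up input string: "-feira" is only consumed after segunda to sexta, so after "sábado" the match simply stops before it.

```php
<?php
$regex = '/(?:(?:segunda|terça|quarta|quinta|sexta)(?:-feira)?|sábado|domingo)(?:\s+\([0-9]+\))?/i';

// "sábado-feira" is not a valid form, so only "sábado" is returned for it.
preg_match_all($regex, 'segunda-feira, sábado-feira, quinta', $matches);
print_r($matches[0]); // segunda-feira, sábado, quinta
```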
\s+: the shorthand \s matches spaces and line breaks, among other characters (the exact list varies between regex engines), and the quantifier + means "one or more occurrences". That is, there can be one or more whitespace characters.
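To see how \s+ behaves, the sketch below runs the same regex against a few hypothetical inputs with different whitespace between the weekday and the number:

```php
<?php
$regex = '/(?:(?:segunda|terça|quarta|quinta|sexta)(?:-feira)?|sábado|domingo)(?:\s+\([0-9]+\))?/i';

// Two spaces, a tab, a newline, and no whitespace at all.
$inputs = ["sexta  (5)", "sexta\t(5)", "sexta\n(5)", "sexta(5)"];

$results = [];
foreach ($inputs as $input) {
    preg_match($regex, $input, $m);
    $results[] = $m[0];
}
var_export($results);
```

The last input has no whitespace at all, so \s+ fails, the optional section is skipped as a whole, and only "sexta" is returned.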
[0-9]+: one or more digits from 0 to 9. Here you could instead use something like (?:3[01]|[12][0-9]|0?[1-9]) to accept only values from 1 to 31 (the valid days of the month; days below 10 may have a leading zero, so both 1 and 01 are accepted). That is up to you.
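If you go with the 1 to 31 alternative, the full regex could look like the sketch below (the test inputs are illustrative). An out-of-range number does not make the whole match fail: since the parenthesized section is optional, the regex just returns the weekday on its own.

```php
<?php
// Same regex, with [0-9]+ swapped for the 1..31 alternation.
$regex = '/(?:(?:segunda|terça|quarta|quinta|sexta)(?:-feira)?|sábado|domingo)(?:\s+\((?:3[01]|[12][0-9]|0?[1-9])\))?/i';

$results = [];
foreach (['sexta (5)', 'sexta (05)', 'sexta (31)', 'sexta (32)', 'sexta (0)'] as $input) {
    preg_match($regex, $input, $m);
    $results[] = $m[0]; // "(32)" and "(0)" are rejected, leaving just "sexta"
}
print_r($results);
```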
(?:\s+\([0-9]+\))?: the whole "whitespace + number in parentheses" section is followed by ?, which makes that section optional.
In your regex you were using the flags i and m (the /im at the end). The i flag makes the regex case-insensitive, so if the string has "Segunda" or "DOMINGO", it matches as well. The m flag changes the behavior of the anchors ^ and $ (they usually match the beginning and end of the string, but with m they match the beginning and end of each line). Since you don't use these anchors, I removed the m flag from the regex.
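A sketch of the i flag in action. One assumption to flag: the u modifier used in the second call is not part of the original answer. The source file and strings are assumed to be UTF-8, where accented letters such as Á or Ç take more than one byte; without u, PCRE works byte by byte, so i is not guaranteed to match "SÁBADO" against the lowercase "sábado" in the pattern.

```php
<?php
$pattern = '(?:(?:segunda|terça|quarta|quinta|sexta)(?:-feira)?|sábado|domingo)(?:\s+\([0-9]+\))?';

// i alone is enough for plain ASCII letters.
$okAscii = preg_match("/$pattern/i", 'DOMINGO (7)', $m1);

// With u (UTF-8 mode), i also folds accented letters such as Á.
$okAccent = preg_match("/$pattern/iu", 'SÁBADO (6)', $m2);

echo $okAscii, ' ', $m1[0], PHP_EOL;
echo $okAccent, ' ', $m2[0], PHP_EOL;
```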
In the example above I used preg_match_all, which returns all occurrences in the string, but if you only want the first occurrence, use preg_match:
if (preg_match('/(?:(?:segunda|terça|quarta|quinta|sexta)(?:-feira)?|sábado|domingo)(?:\s+\([0-9]+\))?/i', $str, $matches)) {
    // With no capture groups, $matches[0] holds the whole (first) match.
    echo $matches[0];
}
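The return values differ as well: preg_match_all returns the number of matches found, while preg_match stops at the first match and returns 1 (or 0 if there is none). A short sketch using two of the sample sentences from above:

```php
<?php
$regex = '/(?:(?:segunda|terça|quarta|quinta|sexta)(?:-feira)?|sábado|domingo)(?:\s+\([0-9]+\))?/i';
$str = 'Espetáculo Trueque se apresenta neste sábado no teatro da CDL. ' .
       'Bandas de rock se apresentam nesta sexta (5) no espaço Marcus Moraes';

// All occurrences: the count is returned, the matches go to $all[0].
$total = preg_match_all($regex, $str, $all);

// Only the first occurrence.
$found = preg_match($regex, $str, $first);

echo $total, ': ', implode(' | ', $all[0]), PHP_EOL;
echo $found, ': ', $first[0], PHP_EOL;
```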
I tested it here and it seemed fine: http://www.regexr.com/ How are you running it?
– Pedro Lorentz
How do you read the results?
– Leonel Sanches da Silva
I'm running it with preg_match. I just thought maybe I could make it lighter, since I'm using many groups '()'. I've read that the number of groups is inversely proportional to performance.
– Iago Leão
When you say "less 'greedy'", do you mean it figuratively or literally? There is a concept in regex called greedy (in Portuguese, "guloso"), and whether or not you use it would affect your results (e.g., greedy: "terça-feira (15)"; lazy: "terça"). From what I understand of your previous comment, that is not what you mean, so I suggest changing the question title to an alternative expression ("lighter" or "more concise" seem like good options) to avoid ambiguity.
– mgibsonbr
I'd suggest editing your question to add an important detail: what result do you expect from the regex in every possible case? With examples or not, this isn't very clear in your question. With the sentence "Bandas de rock se apresentam nesta sexta (5) no espaço Marcus Moraes", is ONLY "sexta (5)" the desired result?
– Androf
He showed all the combinations and results; it's in the question.
– Papa Charlie
@mgibsonbr I know it's been a long time, but I had never noticed this question, perhaps because of its title. I changed the title: anyone who uses regex knows what a group is, so "reduce group capturing" seems appropriate to me for people arriving from search engines, but if you have another idea it would help a lot.
– Guilherme Nascimento