Regex to capture unique numbers within a string

Question

Asked 5 years, 7 months ago

I'm trying to write a regex that captures only single digits that appear among other characters, on the condition that they are not followed by another digit.

Examples:

"R3g3x": ["3","3"]
"J4v4scr1pt": ["4","4","1"]
"1nício": ["1"]
"Fina1": ["1"]
"Dup10": []

So far I have managed to capture the numbers inside the words, but my regex still doesn't exclude a digit that is followed by another digit.

This is what I have: \d+(?=\w)(?!\d)

Does anyone have any idea how I should proceed?

Just to check, shouldn't your last example be "Dup10": [0]?

– anonimo

2020/05/12 at 13:43

1 answer

by hkotsubo • **55,826** points · Answer 1 · 2020-05-12T13:54:09+00:00

\d+ captures one or more digits (+ is a quantifier meaning "one or more occurrences"), so if the string is abc10xyz, it will capture the 10. If the idea is to capture only a single digit, remove the +.
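To make the difference concrete, here is a minimal sketch using Python's re module (the question doesn't say which language is in use, so Python is only an assumption for illustration):

```python
import re

# \d+ grabs the whole run of digits as one match;
# \d alone matches one digit at a time.
run_matches = re.findall(r"\d+", "abc10xyz")
single_matches = re.findall(r"\d", "abc10xyz")

print(run_matches)     # ['10']
print(single_matches)  # ['1', '0']
```

Note that removing the + alone still matches each digit of "10" separately, which is why the lookarounds are needed as well.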

Anyway, from what I understand, you want a digit as long as there is no other digit before or after it. So you can use negative lookbehind and negative lookahead:

(?<!\d)\d(?!\d)

This matches only single digits that have no other digit before or after them (so the digit can also be at the beginning or end of the string). If there are two or more digits in a row (as in "Dup10"), it matches none of them (whereas with \d+, the "10" in "Dup10" would be captured).
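As a quick check against the examples from the question, here is a sketch in Python (an assumed language; the name ISOLATED_DIGIT is just illustrative):

```python
import re

# A digit with no digit immediately before or after it.
ISOLATED_DIGIT = re.compile(r"(?<!\d)\d(?!\d)")

samples = ["R3g3x", "J4v4scr1pt", "1nício", "Fina1", "Dup10"]
results = {s: ISOLATED_DIGIT.findall(s) for s in samples}

for text, found in results.items():
    print(text, found)
# R3g3x ['3', '3']
# J4v4scr1pt ['4', '4', '1']
# 1nício ['1']
# Fina1 ['1']
# Dup10 []
```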

With the lookahead (?=\w), the regex requires that a character matching \w (a letter, digit or _) comes next, so it doesn't work if the number is at the end of the string, or if the character after it doesn't match \w. But since I'm using negative lookahead and lookbehind (i.e., they check that something does not exist ahead or behind), this also works for digits at the beginning or at the end (because if the digit is at the beginning, there is no digit before it, and if it is at the end, there is no digit after it).
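The difference between requiring a \w afterwards and merely forbidding a digit afterwards can be seen in this small Python sketch (the sample "a1-b" is made up here to show a non-\w character after the digit):

```python
import re

samples = ["R3g3x", "Fina1", "a1-b"]

# Positive lookahead: a \w character must follow the digit.
needs_word_char = {s: re.findall(r"\d(?=\w)", s) for s in samples}

# Negative lookahead: only requires that no digit follows.
no_digit_after = {s: re.findall(r"\d(?!\d)", s) for s in samples}

print(needs_word_char)  # {'R3g3x': ['3', '3'], 'Fina1': [], 'a1-b': []}
print(no_digit_after)   # {'R3g3x': ['3', '3'], 'Fina1': ['1'], 'a1-b': ['1']}
```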

You can see the regex working here.

If you only want the digits that are not followed by another digit, just remove the lookbehind:

\d(?!\d)

But in this case, in "Dup10", the regex will match the 0.
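A short Python sketch (assumed language, illustrative names) comparing the two variants on "Dup10":

```python
import re

NOT_FOLLOWED = re.compile(r"\d(?!\d)")     # lookahead only
ISOLATED = re.compile(r"(?<!\d)\d(?!\d)")  # lookbehind and lookahead

not_followed = NOT_FOLLOWED.findall("Dup10")
isolated = ISOLATED.findall("Dup10")

print(not_followed)  # ['0']
print(isolated)      # []
```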

If the engine doesn't support lookbehind (which is generally less widely supported than lookahead), an alternative is to check whether the digit is at the beginning of the string (with the anchor ^) or is preceded by a character that is not a digit (\D):

(?:^|\D)(\d)(?!\d)

I use alternation (the character |, meaning "or") to check for the start of the string (^) or a character that is not a digit (\D). The catch is that now the character that comes before the digit is also part of the match. But since the digit is in parentheses, it is in a capture group, so check whether your tool/engine has a way of retrieving only the group.

Since there is only one pair of capturing parentheses, (\d), it will be group 1 (see here, on the right side under "Match Information", the contents of "Group 1").

The (?: creates a non-capturing group (i.e., this pair of parentheses does not create a capture group).
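How you get at the group depends on the tool. As an illustration only, in Python (an assumption, since the question doesn't name a language) the match object exposes both the full match and group 1:

```python
import re

PATTERN = re.compile(r"(?:^|\D)(\d)(?!\d)")

text = "R3g3x"
# group(0) is the whole match, including the preceding non-digit;
# group(1) is only the digit inside the capturing parentheses.
full_matches = [m.group(0) for m in PATTERN.finditer(text)]
digits_only = [m.group(1) for m in PATTERN.finditer(text)]

print(full_matches)  # ['R3', 'g3']
print(digits_only)   # ['3', '3']

# With exactly one capture group, re.findall returns just that group.
dup = PATTERN.findall("Dup10")
start = PATTERN.findall("1nício")
print(dup)    # []
print(start)  # ['1']
```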

And if the engine doesn't support lookahead either, you can use a similar idea: check whether what comes next is \D or the end of the string ($):

(?:^|\D)(\d)(?:\D|$)

The catch is that now both the character before the digit and the one after it are part of the match, so you need to access the capture group to get only the digit.
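One caveat with this lookaround-free version: the trailing \D is consumed by the match, so it is no longer available to serve as the "character before" for the next digit. When two isolated digits are separated by a single character, the second one is skipped. A Python sketch (assumed language) showing this on the question's examples:

```python
import re

NO_LOOKAROUND = re.compile(r"(?:^|\D)(\d)(?:\D|$)")
WITH_LOOKAHEAD = re.compile(r"(?:^|\D)(\d)(?!\d)")

samples = ["R3g3x", "J4v4scr1pt", "1nício", "Fina1", "Dup10"]
consumed = {s: NO_LOOKAROUND.findall(s) for s in samples}
lookahead = {s: WITH_LOOKAHEAD.findall(s) for s in samples}

# "R3g" is matched first and consumes the "g", so the second "3"
# has neither ^ nor an unconsumed \D in front of it.
print(consumed["R3g3x"])        # ['3']
print(lookahead["R3g3x"])       # ['3', '3']
print(consumed["J4v4scr1pt"])   # ['4', '1']
print(lookahead["J4v4scr1pt"])  # ['4', '4', '1']
```

So if the input can contain digits separated by only one character, the versions with lookahead are the safer choice when the engine supports them.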

Another option (which also may not be supported by every engine) is the \K shortcut:

(?:^|\D)\K\d(?!\d)

What it does is discard everything that has been matched so far (in this case, the beginning of the string or the character that is not a digit), so only the \d will be part of the match.