Regex capturing exactly x digits of a string

Viewed 1,371 times

5

I’m trying to write a regular expression that captures exactly x digits (in my case, x = 6).

Example:

"test 1 n°123456 test end"
"2 n°7890123 test end"

I want the regular expression to return only "123456", that is, only numbers that contain exactly 6 digits. How can I do that?

2 answers

8


You can use the following regex:

\b\d{6}\b

The \d is the metacharacter for a digit: it matches any digit from 0 to 9.

The {6} indicates that you must find 6 of these in a row.

Each \b marks a word boundary, here the beginning and the end of the number. This means the 6 digits must stand on their own, so a run of 7 or more digits no longer matches.
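
As a quick check, here is a minimal JavaScript sketch that applies this pattern to the two strings from the question. The helper name findSixDigits is only illustrative and not part of any library.

```javascript
// Hypothetical helper: returns the first standalone 6-digit number, or null.
function findSixDigits(text) {
  var match = text.match(/\b\d{6}\b/);
  return match ? match[0] : null;
}

console.log(findSixDigits("test 1 n°123456 test end")); // 123456
console.log(findSixDigits("2 n°7890123 test end"));     // null (7 digits in a row)
```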

See working on regex101

It works perfectly for the case you mentioned. If you need something broader that also works when the number is directly adjacent to text, you can use this regex, for example, which is a little more complex:

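(?:^|\D)(\d{6})(?:\D|$)

Explanation:

(?:^|\D) - Start of line or something that is not a digit
(\d{6})  - 6 digits in a row, captured in capture group 1
(?:\D|$) - Followed by something that is not a digit, or end of line

The (?: ) groups keep each alternation local. Without them, ^|\D(\d{6})\D|$ would be read as three separate alternatives: ^ alone, \D(\d{6})\D, or $ alone.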

Here I also used the metacharacter \D, which is the inverse of \d: it matches anything other than a digit.

A subtle difference between this regex and the previous one is that here the digits are in capture group 1 rather than in the full match. So, depending on how you use the regex in code, the way you retrieve the number will be different.
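
To illustrate reading the number from capture group 1, here is a small JavaScript sketch. It assumes the alternations are wrapped in non-capturing groups, (?:^|\D) and (?:\D|$), so that ^ and $ act as alternatives to \D only and not to the whole pattern. The helper name captureSixDigits is hypothetical.

```javascript
// Hypothetical helper: the number is read from capture group 1 (match[1]),
// because match[0] also includes the surrounding non-digit characters.
function captureSixDigits(text) {
  var match = text.match(/(?:^|\D)(\d{6})(?:\D|$)/);
  return match ? match[1] : null;
}

console.log(captureSixDigits("test 1 n123456 test end")); // 123456
console.log(captureSixDigits("2 n°7890123 test end"));    // null
console.log(captureSixDigits("123456"));                  // 123456
```

Note that the number is found even when it is glued to a letter, as in n123456.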

See in regex101

In regex territory it’s always like this. The simplest rules are stricter and do not cover every case, but sometimes they are exactly what you need. Always try to find a balance between what you want to validate and the complexity/efficiency you are willing to use/give up.

  • 2

    The more I study regex, the more I realize that what you said in the last paragraph is true: finding the balance between accuracy and complexity is fundamental, and it is often difficult to find this equilibrium point. If the system allowed, I would give one more vote just for this comment :-) By the way, another alternative is to use a negative lookbehind and a negative lookahead: (?<!\d)(\d{6})(?!\d)

  • 2

    @hkotsubo That’s right. There are many cases where it simply isn’t worth the complexity and loss of efficiency to handle one case or another that almost never happens, with monstrous regexes like some I’ve seen. Interestingly, I started with exactly this regex with negative lookahead, but I ended up switching to what I have in the answer because I thought it was simpler. In addition, some engines are more restrictive about lookbehind.
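
The lookaround alternative mentioned in the comments above can be sketched in JavaScript as follows. This assumes an engine with lookbehind support (in JavaScript, ES2018 or later); the helper name findAllSixDigits is hypothetical.

```javascript
// Hypothetical helper: lists every run of exactly 6 digits in the text.
function findAllSixDigits(text) {
  // (?<!\d) rejects a digit just before the run; (?!\d) rejects one just after.
  return text.match(/(?<!\d)\d{6}(?!\d)/g) || [];
}

console.log(findAllSixDigits("n123456 and 7890123 and 654321x")); // [ '123456', '654321' ]
```

Because lookarounds consume no characters, the full match is the number itself and no capture group is needed.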

3

Great answer from @Isac applying the metacharacter \b. However, \b will only work in your specific case. I would like to leave a more generic alternative here.

The \b works in your case because the 6 digits are preceded by the special character ° and followed by a space. In other words, a string wrapped in two \b's is found only if it sits between non-word characters (special characters other than the underscore _, or whitespace) or at the start or end of the string.

For example, if the first string in the question did not have the ° before the number (that is, n123456), the number 123456 would not be found with \b. See:

var str = "test 1 n123456 test end";
console.log(/\b\d{6}\b/.test(str)); // false: "n" and "1" are both word characters

That’s because \b, as stated, requires the searched string to be delimited by non-word characters.

My suggestion is not the most elegant, but it will return the number that contains exactly 6 digits, regardless of where it is in the string.

See:

var string1 = "test 1 n123456 test end";
var string2 = "test 2 n°7890123 test end";

// Find the first run of 6 or more digits in string1
var teste1 = string1.match(/\d{6,}/);

// Accept it only if the run is exactly 6 digits long
if (teste1 && teste1[0].length === 6) {
   console.log("'" + teste1[0] + "' found in string1");
}

// Same check for string2, whose number has 7 digits
var teste2 = string2.match(/\d{6,}/);

if (teste2 && teste2[0].length === 6) {
   console.log(teste2[0]);
} else {
   console.log("Nothing found in string2");
}

What does the code do? It looks for a number with 6 or more digits and then checks whether that number has exactly 6 characters; if so, the number is accepted, otherwise it is rejected.
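
The code above inspects only the first run of 6 or more digits, so a string such as "id 7890123, code 654321" would yield nothing. As a hedged sketch of the same idea applied to every digit run, one could collect all runs and filter by length. The helper name sixDigitRuns is hypothetical.

```javascript
// Hypothetical helper: collects every digit run, then keeps those of length 6.
function sixDigitRuns(text) {
  var runs = text.match(/\d+/g) || []; // match() returns null when nothing is found
  return runs.filter(function (run) {
    return run.length === 6;
  });
}

console.log(sixDigitRuns("test 1 n123456 test end"));   // [ '123456' ]
console.log(sixDigitRuns("test 2 n°7890123 test end")); // []
console.log(sixDigitRuns("id 7890123, code 654321"));   // [ '654321' ]
```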
