Regex required number(s) and optional letters

Viewed 1,681 times

2

I have this regex for validation, but it is not working properly. At least one number must be mandatory (there can be more than one), and letters should be optional. (?:\\d+[a-z]|[a-z]+\\d)[a-z\\d]*

As written, it requires at least 1 letter and 1 number.

  • Try it this way: [a-zA-Z0-9_.-]*$

  • @Viniciusmatos I tried it this way and it did not perform any validation.

  • Try this: (?:.* d.{1,2})$ although, from what I tested, it needs at least two numbers.

  • It may happen that there is only a single number.

  • Try this expression: (?:[a-zA-Z]*\d+[a-zA-Z]*)+ for a full match. Example: a1b2c3d4e5

2 answers

7


This regex validates a set of alphanumerics but requires there to be at least one number:

(?:([a-zA-Z]+|)\\d+([a-zA-Z\\d]+|))
  • 1

    It worked. Thank you.

5

Just to give another option, this regex requires at least one number (and the rest of the characters can be letters or numbers):

^(?=.*\\d)[a-zA-Z\\d]+$

But it doesn’t work exactly the same way as william’s answer. Let’s look at the differences.


Differences between this and the other answer

First, a small detail: the shortcut for "digits from 0 to 9" is \d (with only one backslash). What happens is that in several languages a regex is created as a string, and within a string many languages use the \ character for escape sequences (such as \n for a line break, \t for a tab, etc.). Therefore, the \ character itself is usually written as \\. The language you are using is not specified, but this is probably the case. Anyway, I will use the \\ form just to be consistent with the question (but depending on the language/engine, \d should be used instead).
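As a small illustration of the escaping point, here is a sketch in Python (an assumption, since the question does not name a language). It compares a regular string literal with doubled backslashes against a raw string, which needs only one:

```python
import re

# In a regular string literal, "\\d+" is three characters: a backslash, "d" and "+".
escaped = "\\d+"

# A raw string leaves the backslash untouched, so it does not need doubling.
raw = r"\d+"

# Both literals produce exactly the same pattern text.
same_pattern = escaped == raw

# The regex engine sees \d+ in both cases.
digits = re.findall(raw, "a1b22")

print(same_pattern, len(escaped), digits)
```

Other languages have their own string rules, so whether one or two backslashes are needed depends on where the regex is written.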

The main difference between the regex I suggested and the regex (?:([a-zA-Z]+|)\\d+([a-zA-Z]+|)) is that the latter will not always match all the numbers. For example, if the string is a1b2, the regex will only match the substring a1b: the 2 is left out of the match (in fact, anything that comes after the 2 is also not included in that match).

I’m not saying this regex is wrong, not least because the answer was accepted. This is only so that future visitors who arrive via the title "mandatory numbers and optional letters" are aware that this regex does not match all the letters and numbers in the string (if you use it for substitution, for example, some characters will fall outside the match and will not be replaced).
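The a1b2 case can be checked with a short sketch, assuming Python and its re module (the question does not specify the language). The pattern is the one quoted above, written as a raw string so a single backslash is enough:

```python
import re

# The other answer's regex as quoted in this answer.
other = re.compile(r"(?:([a-zA-Z]+|)\d+([a-zA-Z]+|))")

# The first match stops before the final 2.
first_match = other.search("a1b2").group(0)

# Replacing only that first match leaves the 2 untouched.
replaced_once = other.sub("X", "a1b2", count=1)

# A repeated search finds the 2 only as a separate, second match.
all_matches = [m.group(0) for m in other.finditer("a1b2")]

print(first_match, replaced_once, all_matches)
```

So the string is never covered by a single match, which matters for validation and for one-shot substitutions.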


Another point is whether the string may contain only letters and numbers, or may also contain other characters (as long as it has a number). The question implies that only letters and numbers are accepted, so it is worth adding the anchors ^ (start) and $ (end) to ensure that, from start to end, the string contains only what the regex allows.

In addition, the (?: at the beginning means that what is inside the parentheses will not become a capturing group. Unless you are using the inner parentheses (([a-zA-Z]+|)) to capture part of the match (using the references \1 or $1) and do not want the outer group to "mess up" the group numbering, it is not necessary. In fact, these parentheses surround the whole expression, so they do not seem to be necessary at all.
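To make the group-numbering point concrete, here is a hedged sketch in Python (an assumed language). It compares the quoted regex with a hypothetical variant whose outer parentheses are capturing:

```python
import re

# Outer group is non-capturing: only the two inner groups are numbered.
non_capturing = re.compile(r"(?:([a-zA-Z]+|)\d+([a-zA-Z]+|))")

# Hypothetical variant with a capturing outer group: it becomes group 1
# and pushes the inner groups to 2 and 3.
capturing = re.compile(r"(([a-zA-Z]+|)\d+([a-zA-Z]+|))")

group_counts = (non_capturing.groups, capturing.groups)

inner_groups = non_capturing.search("a1b").groups()
shifted_groups = capturing.search("a1b").groups()

print(group_counts, inner_groups, shifted_groups)
```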

Finally, ([a-zA-Z]+|) means "one or more occurrences of a letter, or nothing" (the | means "or", and nothing follows it before the parentheses close). This can be replaced by [a-zA-Z]* ("zero or more occurrences of a letter", without parentheses). Only put it in parentheses if you want to capture that group afterward.
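A quick way to convince yourself that the two forms accept the same strings is to run both over a few samples. This is only a sketch, assuming Python; the sample strings are made up for illustration:

```python
import re

# "One or more letters, or nothing" followed by digits.
with_alternation = re.compile(r"([a-zA-Z]+|)\d+")

# "Zero or more letters" followed by digits.
with_star = re.compile(r"[a-zA-Z]*\d+")

samples = ["abc12", "7", "x9y", "abc"]

# For each sample, record whether each pattern matches the whole string.
results = [
    (bool(with_alternation.fullmatch(s)), bool(with_star.fullmatch(s)))
    for s in samples
]

# The parenthesized form additionally captures the leading letters.
captured = with_alternation.fullmatch("abc12").group(1)

print(results, captured)
```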

In short, it could be simplified to [a-zA-Z]*\\d+[a-zA-Z]*. Or, with parentheses, if you want to capture the groups: ([a-zA-Z]*)\\d+([a-zA-Z]*). Or, with the start and end anchors, to ensure that the entire string has only letters and numbers: ^([a-zA-Z]*)\\d+([a-zA-Z]*)$.

Just remember that this regex still leaves the 2 out in a1b2 (note: if you use it with ^ and $, it will not match a1b2 at all).
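The three simplified variants can be compared side by side. The sketch below assumes Python and uses raw strings, so each \\d from the text is written as \d:

```python
import re

simple = re.compile(r"[a-zA-Z]*\d+[a-zA-Z]*")
grouped = re.compile(r"([a-zA-Z]*)\d+([a-zA-Z]*)")
anchored = re.compile(r"^([a-zA-Z]*)\d+([a-zA-Z]*)$")

# Without anchors, the match on a1b2 stops before the final 2.
partial = simple.search("a1b2").group(0)

# With parentheses, the letters before and after the digits are captured.
letter_groups = grouped.search("ab12cd").groups()

# With anchors, a1b2 does not match at all, while ab12cd still does.
anchored_a1b2 = anchored.search("a1b2")
anchored_ok = anchored.search("ab12cd") is not None

print(partial, letter_groups, anchored_a1b2, anchored_ok)
```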


Returning to ^(?=.*\\d)[a-zA-Z\\d]+$

This regex uses the start (^) and end ($) anchors to ensure that the string has only letters and numbers.

Right after the start (^) there is a lookahead: (?=.*\\d). It checks whether there is a digit at any position of the string after the start: .* means "zero or more characters" and then comes the \\d, so the digit can be anywhere in the string. The detail is that the lookahead just "takes a look" without "leaving its place". That is, it checks whether there is a digit ahead and then goes back to where it was (in this case, the beginning of the string) to check the rest of the expression.
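The "look without leaving the place" behavior can be observed directly: a successful lookahead produces a match of zero length. A minimal sketch, assuming Python:

```python
import re

# Only the lookahead, applied at the start of the string.
lookahead_only = re.compile(r"(?=.*\d)")

# There is a digit ahead, so the lookahead succeeds, but it consumes nothing:
# the match starts and ends at position 0.
span_with_digit = lookahead_only.match("ab1").span()

# With no digit anywhere ahead, the lookahead fails and there is no match.
without_digit = lookahead_only.match("abc")

print(span_with_digit, without_digit)
```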

And in this case, the rest of the expression is [a-zA-Z\\d]+: one or more occurrences of letters or numbers. Next we have the end of the string ($).

That is, the lookahead ensures that there is at least one digit in the string, and the rest of the expression ensures that there are only letters or numbers. As a result, this regex leaves no character out: for the string a1b2, the 2 is not left out of the match.
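Putting it together, here is a hedged sketch of how this regex could be used for validation in Python (an assumed language; is_valid is a hypothetical helper name, not something from the question):

```python
import re

# At least one digit somewhere, and only letters or digits from start to end.
AT_LEAST_ONE_DIGIT = re.compile(r"^(?=.*\d)[a-zA-Z\d]+$")


def is_valid(text: str) -> bool:
    """Hypothetical helper: True if text is alphanumeric with at least one digit."""
    return AT_LEAST_ONE_DIGIT.search(text) is not None


samples = ["a1b2", "7", "abc", "a1-b", ""]
checks = {s: is_valid(s) for s in samples}

# For a1b2 the whole string is matched, including the final 2.
whole_match = AT_LEAST_ONE_DIGIT.search("a1b2").group(0)

print(checks, whole_match)
```

Strings with only letters, with other characters such as a hyphen, or with nothing at all are rejected, while a single digit alone is accepted.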


Again, I’m not saying that the other answer is wrong. I’m just offering an alternative (because the two don’t work the same way and there are cases where this can make a difference), and each person can evaluate which one is best for their use case.
