Regex to validate a currency value with decimals, accepting negatives

1

I will use this in Delphi 10.1. I managed to get as far as the expression below but, for lack of knowledge and practice, I can't get it to where I want.

Regex:

^(^[\-]?[1-9]|0)(?:[0-9]{0,10}|0)(?:,\d{0,2})?$



I need to check whether the input is a valid monetary value: digits only, with no spaces, no thousands separator and no currency symbol; with zero, one or two decimal places; positive (without a + sign) or negative (preceded by -). Zero is accepted when it is the only digit, but a leading zero is not allowed in integers or decimals, except when zero is the only digit before the comma (for negatives, the decimal part must then be at least one cent).

It must accept:

0
0, //Will be formatted afterwards
0,0
0,00
0,01 to 0,99
0,1
1234567890,99
-1234567890,99
-0,01 to -0,99
1,00
-1,00



It must not accept:

-0
-0,
-0,00
-,01 to -,99
//Empty string
01,00
01
0012
  • I managed to get to the expression below, which allows negative or positive values but does not allow a zero at the beginning. I know it is not 100% correct, but it seems close. For now only 0,01 to 0,99 remain; I have given up on -0,00 for the time being. So far: ( [-]? [1-9]| [ 0]? [1-9])(?: [0-9]{0,10}|0)(?: d{0,2})?$

2 answers

1

You can simplify the criteria.

If the first digit is zero, there can be no sign before it, and after it there can be only the comma, or the comma followed by 1 or 2 digits (the cases 0,0, 0,00, 0,01 to 0,99 and 0,1 are, deep down, variations of this rule). So all these cases are covered by 0(,\d{0,2})?:

  • the part \d{0,2} means "from zero to 2 digits" (note that you do not have to distinguish whether any of them is zero; it makes no difference, since you accept 0,0 as well as 0,00, 0,1 and 0,01 to 0,99). The "zero digits" case is there to accept the comma alone.
  • the whole "comma followed by \d{0,2}" part is optional, so the expression also accepts just 0
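
As a quick check of this first alternative in isolation, here is a minimal sketch. It uses Python's `re` module only as a convenient way to try the pattern; the helper name `starts_with_zero_ok` is made up for this illustration, and the behaviour in Delphi's `TRegEx` is assumed, not verified, to be the same for these basic constructs.

```python
import re

# First alternative only. re.ASCII restricts \d to the digits 0-9.
ZERO_FIRST = re.compile(r"0(,\d{0,2})?", re.ASCII)


def starts_with_zero_ok(text: str) -> bool:
    """True if text is 0, optionally followed by a comma and up to 2 digits."""
    # fullmatch requires the whole string to match, like ^...$ would.
    return ZERO_FIRST.fullmatch(text) is not None


for sample in ["0", "0,", "0,0", "0,00", "0,99", "01", "0,123", "-0"]:
    print(repr(sample), starts_with_zero_ok(sample))
```

The last three samples are rejected: `01` has a digit right after the zero, `0,123` has three decimal places, and `-0` has a sign.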

When the first digit is not zero, you can have the minus sign, but you cannot have the comma alone (it must be followed by one or two digits). So this part is -?[1-9]\d*(,\d{1,2})?:

  • -? indicates that the minus sign is optional
  • [1-9] is a digit from 1 to 9
  • \d* is "zero or more digits". If you want to limit the amount, you can use something like \d{0,9} (minimum zero, maximum 9 digits). The important thing is that the minimum is zero digits, so that the regex accepts numbers with only one digit (the one matched by [1-9])
  • (,\d{1,2})? - comma followed by 1 or 2 digits, and this whole part is optional (either there is a comma followed by 1 or 2 decimal places, or there is nothing)
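
The second alternative can be tried on its own in the same way. This is only a sketch in Python (the names `NONZERO_FIRST`, `NONZERO_LIMITED` and the two helper functions are hypothetical), and it also shows the optional `\d{0,9}` limit mentioned above.

```python
import re

# Second alternative: optional minus, first digit 1-9, any number of digits,
# then optionally a comma followed by 1 or 2 digits.
NONZERO_FIRST = re.compile(r"-?[1-9]\d*(,\d{1,2})?", re.ASCII)

# Same rule, but with at most 10 digits before the comma (1 + up to 9).
NONZERO_LIMITED = re.compile(r"-?[1-9]\d{0,9}(,\d{1,2})?", re.ASCII)


def nonzero_first_ok(text: str) -> bool:
    """True if the whole string matches the unlimited variant."""
    return NONZERO_FIRST.fullmatch(text) is not None


def nonzero_limited_ok(text: str) -> bool:
    """True if the whole string matches the variant limited to 10 integer digits."""
    return NONZERO_LIMITED.fullmatch(text) is not None


for sample in ["7", "-1,00", "1234567890,99", "1,", "1,999", "01"]:
    print(repr(sample), nonzero_first_ok(sample))

# 11 integer digits: accepted by \d*, rejected by \d{0,9}.
print(nonzero_first_ok("12345678901"), nonzero_limited_ok("12345678901"))
```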

If the value starts with -0, the comma is mandatory and the decimal places cannot be all zeros. So this part is -0,(0[1-9]|[1-9]\d?):

  • -0, - this part is mandatory (minus sign, zero, comma)
  • then we have an alternation (the character |, which means "or"), with 2 options:
    • 0[1-9] - the digit zero followed by a digit from 1 to 9
    • [1-9]\d? - a digit from 1 to 9 followed optionally by a digit (so it takes -0,4, -0,40 and -0,42)
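
A small sketch of this third alternative, again in Python purely for illustration (the helper name `negative_fraction_ok` is an assumption of this example). It shows that a negative value with zero before the comma passes only when the decimal part is not all zeros.

```python
import re

# Third alternative: "-0," followed by either 0[1-9] or [1-9] plus an optional digit.
NEGATIVE_FRACTION = re.compile(r"-0,(0[1-9]|[1-9]\d?)", re.ASCII)


def negative_fraction_ok(text: str) -> bool:
    """True if text is -0,NN with a decimal part different from 0 and 00."""
    return NEGATIVE_FRACTION.fullmatch(text) is not None


for sample in ["-0,4", "-0,40", "-0,42", "-0,01", "-0,00", "-0,0", "-0,", "-0"]:
    print(repr(sample), negative_fraction_ok(sample))
```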

Putting the three options together, we get:

^(0(,\d{0,2})?|-?[1-9]\d*(,\d{1,2})?|-0,(0[1-9]|[1-9]\d?))$

I put the 3 options above in another alternation, so the regex tries one; if it does not match, it tries the next, and so on. I also put the markers ^ and $ around the expression; they indicate the beginning and end of the string. This ensures that the string can contain only what is in the regex.
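
To check the combined expression against the lists from the question, a sketch along these lines can be used. It is written in Python only for convenience (the name `is_valid_amount` is hypothetical); the pattern itself is copied unchanged, and "0,01 to 0,99" style ranges are represented by their endpoints.

```python
import re

# The combined expression from this answer, unchanged.
PATTERN = re.compile(
    r"^(0(,\d{0,2})?|-?[1-9]\d*(,\d{1,2})?|-0,(0[1-9]|[1-9]\d?))$",
    re.ASCII,  # keep \d limited to 0-9
)


def is_valid_amount(text: str) -> bool:
    """True if the whole string is a valid amount under the rules above."""
    return PATTERN.fullmatch(text) is not None


MUST_ACCEPT = ["0", "0,", "0,0", "0,00", "0,01", "0,99", "0,1",
               "1234567890,99", "-1234567890,99", "-0,01", "-0,99",
               "1,00", "-1,00"]
MUST_REJECT = ["-0", "-0,", "-0,00", "-,01", "-,99", "", "01,00", "01", "0012"]

for sample in MUST_ACCEPT + MUST_REJECT:
    print(repr(sample), is_valid_amount(sample))
```

Every entry in `MUST_ACCEPT` matches and every entry in `MUST_REJECT` does not.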

One detail of the alternation is that it checks the options from left to right. That is, first it checks whether the string starts with zero. If not, it checks whether the string starts with the optional hyphen followed by a digit from 1 to 9. Otherwise, it checks whether there is a hyphen followed by a zero.
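
To see which of the three options ends up matching a given input, the alternatives can be given names. This is only an illustrative sketch in Python: the group names `zero_first`, `nonzero_first` and `negative_fraction` and the function `which_branch` are invented for this example, and named groups are written with Python's `(?P<name>...)` syntax.

```python
import re
from typing import Optional

# Same three alternatives, each wrapped in a named group (names are made up here).
BRANCHES = re.compile(
    r"(?P<zero_first>0(,\d{0,2})?)"
    r"|(?P<nonzero_first>-?[1-9]\d*(,\d{1,2})?)"
    r"|(?P<negative_fraction>-0,(0[1-9]|[1-9]\d?))",
    re.ASCII,
)


def which_branch(text: str) -> Optional[str]:
    """Return the name of the alternative that matched the whole string, or None."""
    match = BRANCHES.fullmatch(text)
    if match is None:
        return None
    # groupdict() holds only the named groups; exactly one of them is not None.
    for name, value in match.groupdict().items():
        if value is not None:
            return name
    return None


for sample in ["0,5", "-12,3", "-0,05", "-0"]:
    print(repr(sample), which_branch(sample))
```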

Thus, each case is checked separately, without risk of false positives and without mixing the rules (as was happening with your regex, which tried to do everything at once).


The other answer forgot to limit the value to 2 decimal places and uses negative lookbehinds which, although useful, I did not find necessary here (besides, they slow the regex down, since their job is to go back and check what comes before; see here and here for a performance comparison: the number of steps required is about 6 times smaller, and the expression also ends up a little less confusing, although it is still not that simple to read, understand and maintain, in my opinion).
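
The point about decimal places can be checked with a short sketch. It assumes that the goal is to validate the whole string, so both expressions are applied with Python's `fullmatch`; the other answer's regex is unanchored and may have been meant for searching inside a longer text, so this is not a complete evaluation of it. The names `OTHER`, `THIS` and the two helpers are hypothetical.

```python
import re

# Regex from the other answer, copied as posted (unanchored).
OTHER = re.compile(
    r"(?<!-)(0,\d*[0-9])|(?<!0)([1-9]\d*,\d*)|(-[0-9],\d*?[1-9]\d*)"
    r"|(-[1-9]\d*,{0,1}\d*)|\b(?<!-)(0)[a-zA-Z,\n]"
)

# Regex from this answer.
THIS = re.compile(r"^(0(,\d{0,2})?|-?[1-9]\d*(,\d{1,2})?|-0,(0[1-9]|[1-9]\d?))$")


def other_accepts(text: str) -> bool:
    """True if the other answer's regex matches the whole string."""
    return OTHER.fullmatch(text) is not None


def this_accepts(text: str) -> bool:
    """True if this answer's regex matches the whole string."""
    return THIS.fullmatch(text) is not None


# Values with more than 2 decimal places.
for sample in ["1,999", "0,123", "-5,1234"]:
    print(repr(sample), other_accepts(sample), this_accepts(sample))
```

For these three inputs the other expression matches the whole string, while the one in this answer rejects them.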

0

There are 3 rules you should consider for your problem:

  • It must not match if the number is negative and starts with ",";
  • It must not match if the number starts with "0" and has no "," right after it;
  • It must not match a negative number that does not have at least one digit from 1 to 9 in the string.

So, to cover all possibilities, only the following must be matched:

  • Positive numbers that start with 0 and have decimals right after the 0;
  • Negative numbers that start with 0 but have at least one digit from 1 to 9 in the string (indicating that it is not just a negative 0);
  • Numbers that start with a digit from 1 to 9 and may contain a decimal part;
  • Negative numbers that start with a digit from 1 to 9 and may or may not contain decimals;
  • The number 0 without the character "-" before it, optionally followed by the character ",".

That’s why I made this regex:

(?<!-)(0,\d*[0-9])|(?<!0)([1-9]\d*,\d*)|(-[0-9],\d*?[1-9]\d*)|(-[1-9]\d*,{0,1}\d*)|\b(?<!-)(0)[a-zA-Z,\n]

Each capture group represents one of these possibilities.
You can see this regex running, along with its results, at this link.
