Regex for monetary values

2

I would like to know how to write a regex to capture a monetary value with dots and a comma. Ex: 7.300.250,00

  • What language are you using?

  • I’m using python

  • Do you just want to validate the format, or do you also want to get the numeric value?

  • I want to get the value too

2 answers

6


To validate if the string is in this format, you can use ^[1-9]\d{0,2}(\.\d{3})*,\d{2}$:

  • ^ and $ are markers for the beginning and end of the string. They ensure that the string contains only what is specified in the regex
  • [1-9] is a character class. The brackets indicate that you want any one of the characters inside them. In this case, 1-9 means "any digit from 1 to 9"
  • \d is a shortcut for [0-9] (digits from 0 to 9) and {0,2} is a quantifier meaning "between zero and two occurrences"
    • therefore [1-9]\d{0,2} means that I have a digit from 1 to 9, followed by zero, one or two digits from 0 to 9. This ensures that the string does not start with zero

Then we have (\.\d{3})*:

  • \. means the dot character (.). The dot has special meaning in regex (meaning "any character"), but with the \ before, it "loses its powers" and becomes a common character.
  • \d{3} are 3 occurrences of any digit from 0 to 9
    • the sequence "dot followed by 3 digits" is in parentheses and then we have the *, which means "zero or more occurrences". This means that we can have several occurrences (or none) of "dot followed by 3 digits" (this is to check the sequence .300.250 of your entry). The * also checks for zero occurrences, which is useful for values less than 1000.
  • finally, we have the comma followed by 2 digits (,\d{2})

This ensures that the input is in the desired format. See the regex working here.
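
As a quick check of the pieces described above, here is a minimal sketch that runs the validation regex against a few sample strings. The sample values and the helper name is_valid_amount are made up for illustration; they are not part of the original answer.

```python
import re

# Same pattern as in the answer: 1 to 3 leading digits (no leading zero),
# optional ".ddd" groups, then a comma and exactly two digits.
PATTERN = re.compile(r"^[1-9]\d{0,2}(\.\d{3})*,\d{2}$")


def is_valid_amount(text):
    """Hypothetical helper: True if text matches the expected format."""
    return PATTERN.match(text) is not None


samples = ["7.300.250,00", "999,99", "1.000,00", "0,50", "1000,00", "7.30,00", "12,5"]
for sample in samples:
    print(sample, is_valid_amount(sample))
```

The first three samples should be accepted. The others are rejected because of the leading zero, the missing thousands separator, the two-digit group after the dot, and the single digit after the comma, respectively.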


To get the numeric value, you can simply remove everything that is not a digit and convert the result to int. For this, we use the regex \D (the opposite of \d, that is, it matches anything that is not a digit from 0 to 9).

This gives you the total amount in cents. Below I convert the value to int, since it is better to use integer types when working with monetary values. If you want the value without the cents, just divide it by 100, and if you want only the cents, use the operator %:

import re

s = "7.300.250,00"
# check whether the string is in the desired format
if re.match(r"^[1-9]\d{0,2}(\.\d{3})*,\d{2}$", s):
    # remove everything that is not a digit and convert to int
    valor = int(re.sub(r"\D", "", s))
    print("Value (total amount of cents): {}".format(valor))
    print("Value without the cents: {}".format(valor // 100))
    print("Value of cents: {}".format(valor % 100))

The output is:

Value (total amount of cents): 730025000
Value without the cents: 7300250
Value of cents: 0
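
To illustrate why integer cents are preferable, here is a small sketch comparing float arithmetic with integer arithmetic. The amounts are arbitrary examples, and divmod is just one way to split the total back into whole units and cents.

```python
# Floats cannot represent most decimal fractions exactly,
# so sums of monetary values can drift.
float_total = 0.10 + 0.20
print(float_total == 0.30)  # False

# The same amounts expressed as integer cents add up exactly.
cents_total = 10 + 20
print(cents_total == 30)  # True

# divmod returns the whole part and the cents in one step.
units, cents = divmod(730025000, 100)
print(units, cents)  # 7300250 0
```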


Just a detail about \d: it can also match other characters that represent digits, such as the characters ٠١٢٣٤٥٦٧٨٩ (see this answer for more details).

Example:

s = "1٩,10"
if re.match(r"^[1-9]\d{0,2}(\.\d{3})*,\d{2}$", s):
    valor = int(re.sub(r"\D", "", s))
    print("Value (total amount of cents): {}".format(valor))

I used the character ٩ (ARABIC-INDIC DIGIT NINE), which represents the digit 9 but is a different character. The output is:

Value (total amount of cents): 1910

That’s because \d also matches this character. If you want only the digits from 0 to 9 to be considered, replace \d with [0-9]:

if re.match(r"^[1-9][0-9]{0,2}(\.[0-9]{3})*,[0-9]{2}$", s):
    ...  # the rest is the same
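
As an alternative sketch, not part of the original answer: in Python 3 the re.ASCII flag makes \d match only the characters 0 to 9, so the pattern itself can stay unchanged. The sample string is the same one used above, written with an escape sequence for the non-ASCII digit.

```python
import re

pattern = r"^[1-9]\d{0,2}(\.\d{3})*,\d{2}$"
s = "1\u0669,10"  # the second character is ARABIC-INDIC DIGIT NINE

# By default, \d matches any Unicode decimal digit in a str pattern.
unicode_match = re.match(pattern, s)
print(unicode_match is not None)  # True

# With re.ASCII, \d is restricted to [0-9], so the string is rejected.
ascii_match = re.match(pattern, s, re.ASCII)
print(ascii_match is not None)  # False
```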

Another detail is that this regex only works for values greater than or equal to 1,00. If you also want to accept values like 0,15 (15 cents), you have to add an alternative that allows a single zero before the comma:

if re.match(r"^(0|[1-9]\d{0,2}(\.\d{3})*),\d{2}$", s):
    ...  # the rest is the same

Now I use alternation (the character |, meaning "or") to indicate that before the comma there may be either a single zero or the whole expression we saw before.
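
A minimal sketch of how the alternation version could be combined with the conversion to cents. The helper name to_cents and the sample values are hypothetical, chosen only for illustration.

```python
import re

# Either a single "0" or the original integer part, then ",dd".
PATTERN = re.compile(r"^(0|[1-9]\d{0,2}(\.\d{3})*),\d{2}$")


def to_cents(text):
    """Hypothetical helper: return the amount in cents, or None if invalid."""
    if PATTERN.match(text) is None:
        return None
    # Drop the separators and read the remaining digits as an integer.
    return int(re.sub(r"\D", "", text))


for sample in ["0,15", "1,00", "7.300.250,00", "00,15", "01,00"]:
    print(sample, to_cents(sample))
```

Here "0,15" yields 15 and "1,00" yields 100, while "00,15" and "01,00" are rejected because only a single zero is allowed before the comma.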

  • 1

    Thank you!! Very thorough and enlightening

  • I tested this expression in regex101 and it did not match values below 1 real (Ex: 0,50)

  • @Andréroggericampos Yes, I forgot about this case. I updated the answer, thank you!

  • Wow, what a quick response haha... It worked here! Thanks!

3

You can use this expression:

 ^(([1-9]\d{0,2}(\.\d{3})*)|(([1-9]\.\d*)?\d))(\,\d\d)?
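
A hedged note on using this expression from Python: as written it has no $ at the end and the cents part is optional, so re.match also accepts strings with trailing text. The sketch below is an illustration, not part of the answer; it uses re.fullmatch to require that the whole string matches, and the sample values are made up.

```python
import re

pattern = r"^(([1-9]\d{0,2}(\.\d{3})*)|(([1-9]\.\d*)?\d))(\,\d\d)?"

# re.match only anchors at the start, so a valid prefix is enough.
print(re.match(pattern, "12abc") is not None)  # True

# re.fullmatch requires the entire string to match the pattern.
for sample in ["7.300.250,00", "0,50", "7.300.250", "12abc"]:
    print(sample, re.fullmatch(pattern, sample) is not None)
```

With re.fullmatch, the first three samples are accepted (including the one without cents) and "12abc" is rejected.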
