Python regex to match all values in Brazilian currency format

I need a regex that matches Brazilian monetary values. I have researched and tested several approaches, but I could not reach a satisfactory result.

1.566.545,00 = True
154.565,00 = True
22.555,00 = True
1.550,00 = True
11,10 = True
1100 = False


Regex used: '\d{0,}\W{0,}\d{1,}\d{0,}\W{0,}\d{0,}\,{0,1}\d{0,}'
  • You mention doing research and tests: add to the question what you tried and what went wrong, so it is easier for someone to suggest a correction.

  • I have included the regex used.

  • Hmm, the way it is written, that is a lot of work. You could do something like "check that each dot is followed by a group of 3 digits", which would look like (\.\d{3}). That is, if there is a dot, a group of 3 digits must follow it, so you only need to write that once.

  • "I could not reach a satisfactory result": in that case, what would a satisfactory result be?

  • I get it, I’m gonna run some tests to see

  • @Augustovasques I may have expressed myself badly. A satisfactory result would be getting True for the values marked True in the question.

1 answer

2


\d{0,} means "zero or more digits," so if you don’t have any digits, you’ll also find a match.
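As a quick check of that point, here is a minimal sketch. It uses re.fullmatch, which is my choice for the test and not something the question used. It shows that \d{0,} alone matches an empty string, and that the pattern from the question accepts 1100, because every piece of it except \d{1,} is optional.

```python
import re

# Pattern copied from the question.
original = re.compile(r'\d{0,}\W{0,}\d{1,}\d{0,}\W{0,}\d{0,}\,{0,1}\d{0,}')

# \d{0,} is satisfied by zero digits, so it matches the empty string.
empty_ok = re.fullmatch(r'\d{0,}', '') is not None

# Only \d{1,} is mandatory, so a bare run of digits is accepted.
accepts_1100 = original.fullmatch('1100') is not None

print(empty_ok, accepts_1100)
```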

Anyway, a monetary value in the format you want follows these rules:

  • it starts with 1 to 3 digits
  • then it may have a dot followed by exactly 3 digits, and this group may repeat any number of times (or not appear at all)
  • it always ends with a comma and two digits

A first attempt would be:

import re

regex = re.compile(r'^\d{1,3}(\.\d{3})*,\d{2}$')

valores = ['1.566.545,00', '154.565,00', '22.555,00', '1.550,00', '11,10', '1100']
for v in valores:
    print(f'{v} = {bool(regex.match(v))}')

I used the anchors ^ and $, which mark the beginning and end of the string. This guarantees that the string contains only what the regex describes.
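One caveat about $ in Python's re module: it also matches just before a newline at the very end of the string. If that matters for your input, a possible alternative (a sketch, not part of the original answer) is to drop the anchors and call fullmatch, which requires the pattern to cover the entire string.

```python
import re

anchored = re.compile(r'^\d{1,3}(\.\d{3})*,\d{2}$')
unanchored = re.compile(r'\d{1,3}(\.\d{3})*,\d{2}')

with_newline = '11,10\n'

# $ matches before the final newline, so the anchored pattern still matches.
dollar_result = bool(anchored.match(with_newline))

# fullmatch must consume the whole string, trailing newline included.
fullmatch_result = bool(unanchored.fullmatch(with_newline))
fullmatch_clean = bool(unanchored.fullmatch('11,10'))

print(dollar_result, fullmatch_result, fullmatch_clean)
```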

  • \d{1,3} means "at least 1 and at most 3 digits"
  • Then we have a dot (escaped as \. so that it matches a literal dot), followed by exactly 3 digits (the {3} means exactly 3). All of this is in parentheses with the quantifier *, meaning "zero or more times" (the same as {0,}). That is, the dot followed by 3 digits can be repeated several times (or not appear at all)
  • Then we have the comma and two more digits

The output of the code above is:

1.566.545,00 = True
154.565,00 = True
22.555,00 = True
1.550,00 = True
11,10 = True
1100 = False

But this still does not handle some cases, for example 012,12: should a value be allowed to start with zero? The only values that could start with zero would be ones like 0,15 or 0,00. If that is the rule, things start to get more complicated. It would have to be something like:

regex = re.compile(r'^((?!0)\d{1,3}(\.\d{3})*|0),\d{2}$')

Now I have used alternation (the character |, which means "or"). It has two alternatives:

  • (?!0)\d{1,3}(\.\d{3})*: the part (?!0) is a negative lookahead, which checks that the next character is not a zero. The rest is what we have seen above (the digits, then the repeated groups of a dot plus 3 digits). That is, this sequence cannot start with zero.
  • 0: a single digit zero

That means either you have a zero and then the comma, or you have multiple digits, as long as the first one isn’t zero.

After that come the comma and the two digits. With this, the regex accepts values such as 0,00 and 0,15, but rejects 00,00 and 012,12.
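To confirm the behaviour described above, here is a small check in the same style as the first example. The list of test values is my own selection, combining the zero-prefixed cases with two values from the question.

```python
import re

regex = re.compile(r'^((?!0)\d{1,3}(\.\d{3})*|0),\d{2}$')

# Values chosen for illustration: zero-prefixed cases plus two from the question.
cases = ['0,00', '0,15', '00,00', '012,12', '1.566.545,00', '1100']

results = {v: bool(regex.match(v)) for v in cases}
for v, ok in results.items():
    print(f'{v} = {ok}')
```

Only 0,00, 0,15 and 1.566.545,00 are accepted. The other three fail because of a leading zero or a missing comma.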

  • regex = re.compile(r'^((?!0)\d{1,3}(\.\d{3})*|0),\d{2}$') worked exactly as I needed. Thanks so much for the help and for the explanation!
