"Simply simplify" using (\s+)?
to make the spaces optional. A regex like this cannot be made very simple, but in your case it can be simplified a little, like this:
(\d+(,\d+)?)(\s+)?(cm)?(\s+)?x(\s+)?(\d+(,\d+)?)(\s+)?(cm)?(\s+)?x(\s+)?(\d+(,\d+)?)(\s+)?(cm)?
Online example on RegExr: https://regexr.com/3rpmr
Explaining the regex
The first part of the regex would be this:
(\d+(,\d+)?)(\s+)?(cm)?
The (,\d+)? optionally matches the digits after the comma (the decimal part)
The (\s+)? optionally matches one or more whitespace characters
The (cm)? optionally matches the unit of measure
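To see the three optional pieces in isolation, here is a minimal sketch that applies only this first part to a few sample strings. The name `parte` and the sample strings are illustrative assumptions, not part of the original answer.

```python
import re

# One "measure" building block: an integer or decimal-comma number,
# then optional whitespace, then an optional "cm" unit.
parte = re.compile(r'(\d+(,\d+)?)(\s+)?(cm)?')

for texto in ['34', '23,6', '14,5cm', '14,5 cm']:
    m = parte.fullmatch(texto)
    # groups() is (number, decimal part, whitespace, unit);
    # an optional group that did not take part in the match is None.
    print(repr(texto), '->', m.groups())
```

Each optional group that does not take part in the match shows up as None, which is why the full expression returns so many None values.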
After that, just put an x (with optional spaces on each side) between the repetitions of that expression. Of course you can do it in other ways, but the result would be almost the same; this form is repetitive but easier to understand.
If the goal is to match one whole entry at a time, adding \b at the beginning and at the end should be enough, for example:
\b(\d+(,\d+)?)(\s+)?(cm)?(\s+)?x(\s+)?(\d+(,\d+)?)(\s+)?(cm)?(\s+)?x(\s+)?(\d+(,\d+)?)(\s+)?(cm)?\b
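As a rough illustration of what the \b anchors change, the sketch below compares the pattern with and without them. The variable names and both sample strings are made up for this example.

```python
import re

base = (r'(\d+(,\d+)?)(\s+)?(cm)?(\s+)?x(\s+)?'
        r'(\d+(,\d+)?)(\s+)?(cm)?(\s+)?x(\s+)?'
        r'(\d+(,\d+)?)(\s+)?(cm)?')

sem_limite = re.compile(base)                 # no word boundaries
com_limite = re.compile(r'\b' + base + r'\b') # with word boundaries

# Digits glued to letters on both sides: there is no word boundary
# around them, so only the unanchored pattern finds a match.
colado = 'ref123x45x67abc'
print(sem_limite.search(colado).group(0))
print(com_limite.search(colado))

# A free-standing measure still matches with the boundaries in place.
isolado = 'size: 14,5cm x 55 x 22cm.'
print(com_limite.search(isolado).group(0))
```

Note that \b only tests for a transition between word and non-word characters, so it filters out measures embedded in a longer alphanumeric token but does not validate the surrounding text in any other way.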
Multiple values
If the input contains multiple values, do it this way:
import re
expressao = r'(\d+(,\d+)?)(\s+)?(cm)?(\s+)?x(\s+)?(\d+(,\d+)?)(\s+)?(cm)?(\s+)?x(\s+)?(\d+(,\d+)?)(\s+)?(cm)?'
entrada = '''
23,6 x 34 x 17,1
14,5 x 55 x 22
14,5cm x 55 x 22cm
14,5cmx55x22cm
14,5 cmx55 cmx22 cm
'''
resultados = re.finditer(expressao, entrada)
for resultado in resultados:
    # groups() returns all 16 capture groups of the match
    valores = resultado.groups()
    print("Primeiro:", valores[0])
    print("Segundo:", valores[6])
    print("Terceiro:", valores[12])
    print("\n")
Note that the groups holding the numbers are 6 positions apart in the regex, one for each number between the x characters; that is, each match returns something like:
('23,6', ',6', ' ', None, None, ' ', '34', None, ' ', None, None, ' ', '17,1', ',1', '\n', None)
('14,5', ',5', ' ', None, None, ' ', '55', None, ' ', None, None, ' ', '22', None, '\n', None)
('14,5', ',5', None, 'cm', ' ', ' ', '55', None, ' ', None, None, ' ', '22', None, None, 'cm')
('14,5', ',5', None, 'cm', None, None, '55', None, None, None, None, None, '22', None, None, 'cm')
('14,5', ',5', ' ', 'cm', None, None, '55', None, ' ', 'cm', None, None, '22', None, ' ', 'cm')
That is why you only use valores[0], valores[6] and valores[12]. Example on repl.it: https://repl.it/@inphinit/regex-python-Extract
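If counting group positions feels fragile, one possible variation (not the pattern used in this answer) is to make the helper groups non-capturing with (?:...), so that groups() returns only the three numbers. This is a sketch under that assumption; `medida`, `expressao_curta` and `trios` are hypothetical names, and \s* is used in place of (\s+)? since both accept zero or more whitespace characters.

```python
import re

# Only the number is captured; the decimal part and the unit use
# non-capturing groups, and optional whitespace is written as \s*.
medida = r'(\d+(?:,\d+)?)\s*(?:cm)?'
expressao_curta = re.compile(medida + r'\s*x\s*' + medida + r'\s*x\s*' + medida)

entrada = '''
23,6 x 34 x 17,1
14,5cmx55x22cm
14,5 cmx55 cmx22 cm
'''

# Each match now yields exactly three strings, in order.
trios = [m.groups() for m in expressao_curta.finditer(entrada)]
for primeiro, segundo, terceiro in trios:
    print(primeiro, segundo, terceiro)
```

The trade-off is that the unit and the spacing are no longer available in the result, so this only fits if the numbers are all you need.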
Using values for mathematical operations
Note that because of the , Python will not treat the text as a number, so if you are going to do a mathematical operation, replace the comma with a dot and convert to float, like this:
float('1000,00001'.replace(',', '.'))
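A short sketch of why the conversion step matters: float does not accept a comma as the decimal separator, but it does once the comma is replaced by a dot. The names `texto` and `convertido` are illustrative.

```python
texto = '14,5'

# float() only understands a dot as the decimal separator,
# so the raw captured text raises ValueError.
try:
    float(texto)
except ValueError as erro:
    print('Falhou:', erro)

# Swapping the comma for a dot makes the text parseable.
convertido = float(texto.replace(',', '.'))
print(convertido)
```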
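It would look something like this:
# finditer returns a one-shot iterator, so create it again before looping
resultados = re.finditer(expressao, entrada)
for resultado in resultados:
    valores = resultado.groups()
    # replace the decimal comma with a dot so float() can parse the text
    primeiro = float(valores[0].replace(',', '.'))
    segundo = float(valores[6].replace(',', '.'))
    terceiro = float(valores[12].replace(',', '.'))
    print("Primeiro:", primeiro)
    print("Segundo:", segundo)
    print("Terceiro:", terceiro)
    print("Resultado:", primeiro * segundo * terceiro)
    print("\n")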
First: do you just want to know how the regex works, or would you accept another suggestion (say, without regex) for your problem? After all, you didn't say what you need it for. Maybe it can be solved with the answer already given; on the other hand, I think it's unclear whether you need the units of measure next to the numbers.
– Wallace Maxters
Wallace, I would like to solve this just by adjusting the expression to suit the case I mentioned, even if it has to be a completely new regex, in view of the last case mentioned.
– rodrigorf
The point is not whether it is a new regex. The question is: "Would you accept a solution without regex?"
– Wallace Maxters
No. Thank you, but I want to solve it only by adjusting the expression.
– rodrigorf