Incorrect Python value conversion

I have a crawler that extracts the price as a string, e.g. R$ 560.000,00.

I need to convert this value to float because I will use it to run queries like this one:

Select all cars with a price between 100000 and 560000

I’m converting the value this way:

float(price[2:].replace(',', ''))

But it converts R$ 560.000,00 to 560.0.

I would like the values converted like this:

  • R$ 17.000,00 to 17000
  • R$ 100.000,00 to 100000
  • R$ 560.000,00 to 560000

  • It would be good to know about this: http://answall.com/q/44715/101

  • But do you want a float or an int? The examples you give are converted to int.

2 answers

Wouldn't it be good to replace the dot too? Otherwise 100.000 will be read as 100 reais instead of 100 thousand reais.

float(price[2:].replace('.', '').replace(',', '.'))
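
To see why the expression from the question yields 560.0, and what removing the dot changes, here is a minimal sketch. It assumes the prices always arrive in the Brazilian format R$ 560.000,00 (dot for thousands, comma for decimals); the helper name parse_brl is made up for this illustration.

```python
def parse_brl(price):
    """Convert a string such as 'R$ 560.000,00' to a float.

    Assumes Brazilian formatting: '.' groups thousands, ',' marks decimals.
    """
    number = price.replace('R$', '').strip()
    # Drop the thousands separators first, then turn the decimal comma into a dot.
    return float(number.replace('.', '').replace(',', '.'))


# The expression from the question only removes the comma, so the dot that
# was a thousands separator ends up being read as the decimal point.
original = float('R$ 560.000,00'[2:].replace(',', ''))  # float(' 560.00000')

for text in ('R$ 17.000,00', 'R$ 100.000,00', 'R$ 560.000,00'):
    print(text, '->', parse_brl(text), '| as int:', int(parse_brl(text)))
```

If the cents matter for the queries, passing the normalized string to decimal.Decimal instead of float is an option worth considering, since int() simply truncates them.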


The answer by @Priscilla is sufficient and is in fact the best choice for the vast majority of cases. However, if your crawler needs to handle money in different formats, it may be useful to take into account the locale/language of the page being accessed. One way to do this is with the locale module.

Here is an illustrative example:

import re
import locale

#--------------------------------------------------
def extractMonetaryValue(text):
    """Return the first monetary value in text as a float (0.0 if none is found)."""

    # Currency symbol of the locale currently set, e.g. 'R$' or '$'
    cs = locale.localeconv()['currency_symbol']
    # Loose pattern: symbol, optional spaces, then digits, dots and commas
    expr = '{}[ ]*[0-9.,]+'.format(re.escape(cs))

    m = re.search(expr, text)
    if m:
        s = m.group(0).replace(cs, '').replace(' ', '')
        return locale.atof(s)
    else:
        return 0.0
#--------------------------------------------------

s = 'Este teste testa um valor (por exemplo: R$ 560.200,40) expresso em Reais.'
locale.setlocale(locale.LC_ALL, 'ptb_bra') # use 'pt_BR' if not on Windows
n = extractMonetaryValue(s)
print('Para "{}" o valor é: {}'.format(s, n))

s = 'This test tests a value (let us say U$ 482,128.33) given in US Dolars.'
locale.setlocale(locale.LC_ALL, 'enu_usa') # use 'en_US' if not on Windows
n = extractMonetaryValue(s)
print('Para "{}" o valor é: {}'.format(s, n))

In this code, the main function is extractMonetaryValue. It receives some text and searches it for a substring that must contain the currency symbol of the configured locale (followed by zero or more spaces) and then a number made up of digits, dots and commas. To do so, it uses a deliberately loose regular expression: it does not check whether the number format is correct, because that is left to the later call to locale.atof (which raises ValueError when the string cannot be parsed as a number under the configured locale).

The output of the above code is as follows:

Para "Este teste testa um valor (por exemplo: R$ 560.200,40) expresso em Reais." o valor é: 560200.4
Para "This test tests a value (let us say U$ 482,128.33) given in US Dolars." o valor é: 482128.33

Notice that both numbers printed at the end use the dot as decimal separator (after all, they are stored internally as float, in the same way regardless of the format they came from).
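
Since locale.setlocale depends on which locales are installed on the machine and on their platform-specific names, the same idea can also be sketched without the locale module. This is an alternative illustration, not the code above: the SEPARATORS table, its keys and the function name extract_monetary_value are assumptions invented for this example, and only two formats are covered.

```python
import re

# Hypothetical lookup table: (currency symbol, thousands separator, decimal separator).
SEPARATORS = {
    'pt_BR': ('R$', '.', ','),
    'en_US': ('$', ',', '.'),
}


def extract_monetary_value(text, locale_name):
    """Return the first monetary value found in text as a float, or None."""
    symbol, thousands, decimal = SEPARATORS[locale_name]
    # Loose pattern: currency symbol, optional spaces, then digits, dots and commas.
    match = re.search(re.escape(symbol) + r'\s*([0-9.,]+)', text)
    if match is None:
        return None
    # Trailing punctuation (e.g. a sentence-ending dot) is not part of the number.
    number = match.group(1).rstrip('.,')
    # Normalize to the notation float() understands; raises ValueError if malformed.
    return float(number.replace(thousands, '').replace(decimal, '.'))


print(extract_monetary_value('por exemplo: R$ 560.200,40', 'pt_BR'))
print(extract_monetary_value('let us say U$ 482,128.33', 'en_US'))
```

The trade-off is that every supported format has to be listed by hand, whereas the locale module takes the symbol and separators from the operating system.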

P.S.:

  1. To detect the operating system's default locale, use locale.getdefaultlocale() (note that it is deprecated since Python 3.11; locale.getlocale() is the documented alternative).
  2. To detect the locale of a web page, check whether it provides this information in the lang attribute. If it doesn't, you'll need to try to infer the language. Luckily for you (wow! hehe), there is a Python port of Google's language detector called langdetect.
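
As a rough sketch of the second point, the lang attribute can be read with the standard library's html.parser, without any third-party dependency. The class name LangAttributeParser and the helper page_language are invented for this illustration, and the fallback to langdetect when the attribute is missing is left out.

```python
from html.parser import HTMLParser


class LangAttributeParser(HTMLParser):
    """Record the lang attribute of the first <html> tag, if there is one."""

    def __init__(self):
        super().__init__()
        self.lang = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lowercased names.
        if tag == 'html' and self.lang is None:
            self.lang = dict(attrs).get('lang')


def page_language(html_text):
    """Return the declared language, e.g. 'pt-BR', or None when not declared."""
    parser = LangAttributeParser()
    parser.feed(html_text)
    return parser.lang


page = '<html lang="pt-BR"><body>R$ 560.200,40</body></html>'
print(page_language(page))
```

The value found in the page uses a hyphen (pt-BR), while POSIX locale names use an underscore (pt_BR), so some mapping is needed before passing it to locale.setlocale.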
