How can I remove 'R$\xa0' from the results I get?

-2

I am scraping data from a website and am having trouble cleaning up some of the values I collect. I would like help solving this.

I have the following code:

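valores = soup_anuncio.find_all('span', attrs='ek9a7p-0')
categoria_valor = []
for i in valores:
    # get_text() returns the visible text of each <span>
    valor = i.get_text()
    categoria_valor.append(valor)
# print once, after the whole list has been collected
print(categoria_valor)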

The result is a list of strings:

['R$\xa01.417', 'R$\xa01.200', 'R$\xa0185', 'R$\xa018', 'R$\xa028', 'R$\xa02.848']

I tried several ways to strip the R$\xa0 prefix from the results, but none of them worked.

How can I get only the numbers, as int?

  • Do you want the monetary values as int? You will lose the cents.

  • The values have no cents. The dot is actually a thousands separator.

1 answer

1


If the goal is to remove the Brazilian currency symbol and the thousands separator from each string, slice each string from the fourth character to the end. That drops the real symbol (R$) together with the non-breaking space that follows it. Then remove the thousands-separator dot with str.replace() and convert what is left to int, since, as the comments say, the values have no cents.

categoria_valor = ['R$\xa01.417', 'R$\xa01.200', 'R$\xa0185', 'R$\xa018', 'R$\xa028', 'R$\xa02.848']

vals = [int(v[3:].replace('.', '')) for v in categoria_valor]

print(vals)
# [1417, 1200, 185, 18, 28, 2848]

Test the code on Ideone
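
The slice starts at index 3 because '\xa0' is an escape sequence for a single character (a non-breaking space), not four separate characters. A small check, using only the standard library, that may help make this visible:

```python
import unicodedata

prefix = 'R$\xa0'

# '\xa0' is one character, so the whole prefix is 3 characters long.
print(len(prefix))                  # 3
print(unicodedata.name(prefix[2]))  # NO-BREAK SPACE

# Slicing from index 3 drops exactly 'R', '$' and the non-breaking space.
sem_prefixo = 'R$\xa01.417'[3:]
print(sem_prefixo)                  # 1.417
```

The name sem_prefixo is only illustrative and does not appear in the question.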

Applied to your code:

valores = soup_anuncio.find_all('span', attrs='ek9a7p-0')
categoria_valor = [int(v.get_text()[3:].replace('.', '')) for v in valores]
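
The fixed slice assumes every string starts with exactly the three characters 'R$\xa0'. If the site ever used a regular space or dropped the symbol, a different technique may be safer: keep only the digits with a regular expression. The sketch below is an alternative, not the method above. The helper name to_int is hypothetical, and it still assumes the values have no cents (a value such as 'R$\xa01.417,50' would wrongly become 141750).

```python
import re


def to_int(text):
    """Keep only the digits in text and convert them to int.

    Hypothetical helper; assumes the value has no cents.
    """
    digits = re.sub(r'\D', '', text)  # remove every non-digit character
    if not digits:
        raise ValueError(f'no digits found in {text!r}')
    return int(digits)


categoria_valor = ['R$\xa01.417', 'R$\xa01.200', 'R$\xa0185',
                   'R$\xa018', 'R$\xa028', 'R$\xa02.848']

vals = [to_int(v) for v in categoria_valor]
print(vals)
# [1417, 1200, 185, 18, 28, 2848]
```

With the scraped elements this would read [to_int(v.get_text()) for v in valores].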
