Why are the commas in numbers removed when importing data with Pandas?

I'm racking my brain trying to understand why this happens when I scrape numerical data from a table on the web. The table contains currency exchange rates. The problem is that, when the numeric values are read, all the commas that separate the decimal part (the cents) disappear. This is the code I am using:

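import time

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

# 1 - Load the page in headless Chrome (url holds the page address, set elsewhere)
option = Options()
option.add_argument('--headless')  # replaces option.headless = True, which newer Selenium releases dropped
driver = webdriver.Chrome(options=option)

driver.get(url)
time.sleep(10)  # wait for the table to be rendered

# find_element(By.XPATH, ...) replaces find_element_by_xpath, which Selenium 4 removed
elemento = driver.find_element(By.XPATH, '//div[@class="wrapper"]//section//table[@id="exchange_rates_1"]')
html_content = elemento.get_attribute('outerHTML')

# 2 - Parse the HTML content - BeautifulSoup
souap = BeautifulSoup(html_content, 'html.parser')
table = souap.find(name='table')
print(table)

# 3 - Structure the content in a DataFrame - Pandas
df_full = pd.read_html(str(table))[0]
df = df_full[['Código', 'BRL']]
df.columns = ['Moeda', 'Perante_Real']

print(df_full)
print(df)

driver.quit()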

I don't understand why, but I believe the problem is on this line:

df_full = pd.read_html(str(table))[0]

Before that line runs, the numeric values in the raw imported HTML are still correct. If anyone can help, I'd appreciate it!

2 answers

1

When you use "read_html", you can set parameters that control how numeric values are parsed, as documented at [https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_html.html?highlight=read_html#pandas.read_html]. One of these is the "decimal" parameter, which defaults to ".". Depending on the layout of the site (HTML), the decimal mark may be a comma instead. Try the code with this line changed as follows:

df_full = pd.read_html(str(table),decimal=',')[0]

If that doesn't work, please post the output of print(df_full.head(5)) or print(df.head(5)) so I can see how the data is formatted in the DataFrame.

  • I made the change but nothing changed at all. This is the result of the print:

      Moeda  Perante_Real
    0   BRL             1
    1   USD         53411
    2   EUR         59976
    3   GBP         66558
    4   JPY          4963

    The values still have no decimal commas.
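
For anyone who gets the same output: a likely cause is that read_html also takes a "thousands" parameter, which defaults to ",". With that default the comma is treated as a thousands separator and stripped, so setting decimal=',' alone does not help. The sketch below passes both parameters. It uses a small made-up table, because the real page's HTML is not shown in the question, so the cell values and the StringIO wrapper are assumptions for illustration only.

```python
from io import StringIO

import pandas as pd

# Hypothetical miniature of the exchange-rate table (values are made up).
html = """
<table>
  <tr><th>Código</th><th>BRL</th></tr>
  <tr><td>USD</td><td>5,3411</td></tr>
  <tr><td>EUR</td><td>5,9976</td></tr>
</table>
"""

# Defaults: thousands=',' and decimal='.', so the comma is dropped
# and the cell is read as a whole number.
df_default = pd.read_html(StringIO(html))[0]

# Declare ',' as the decimal mark and '.' as the thousands separator.
df_fixed = pd.read_html(StringIO(html), decimal=',', thousands='.')[0]

print(df_default)
print(df_fixed)
```

With both parameters set, the BRL column should come back as floats rather than large integers. read_html needs an HTML parser such as lxml (or bs4 with html5lib) to be installed.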

0

I managed to sort it out like this:

df_full = pd.read_html(str(table).replace(',', '.'))[0]

It probably wasn't recognizing the comma in the middle of the numbers!
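
One caveat with replacing every comma in the HTML string: it also changes commas in any text cells, and a value written with a thousands dot, such as 1.234,56, would become 1.234.56, which is no longer a valid number. A minimal sketch of converting individual cell strings instead is shown below. The helper name parse_brl_number and the sample values are hypothetical, and it assumes every value uses "." for thousands and "," for decimals.

```python
def parse_brl_number(text: str) -> float:
    """Convert a pt-BR style number string such as '1.234,56' to a float."""
    # Drop thousands dots first, then turn the decimal comma into a dot.
    cleaned = text.strip().replace('.', '').replace(',', '.')
    return float(cleaned)


# Made-up sample values in the same style as the exchange-rate table.
values = ['5,3411', '0,04963', '1.234,56']
parsed = [parse_brl_number(v) for v in values]
print(parsed)
```

A helper like this could be applied to a column that was read as text, for example with Series.map, if changing the read_html parameters is not an option.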
