I’m building a web scraper with Python and pandas on Windows. I collect the data from the page, build a pandas DataFrame, and then export it to an Excel spreadsheet; I’m not using any database in this case. I have two problems:
I need to collect each product’s name and price, but some products on the page have no price. The DataFrame then shifts the next product’s price onto the product with the missing price, producing wrong information. How can I fix this?
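One way to avoid the shift is to iterate over each product’s container element and look up the name and price *inside* it, so a missing price stays attached to the right product as None instead of everything sliding up. A minimal sketch, using a hypothetical container class `s-result-item` and a literal HTML snippet in place of the real page (the class names are assumptions, not confirmed from the site):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two products, the second one has no price element.
html = """
<div class="s-result-item"><a class="a-link-normal a-text-normal">Produto A</a>
  <span class="a-price-whole">10</span><span class="a-price-fraction">99</span></div>
<div class="s-result-item"><a class="a-link-normal a-text-normal">Produto B</a></div>
"""
soup = BeautifulSoup(html, 'html.parser')

rows = []
# Iterate per product container instead of collecting names and prices in
# separate global lists; a product without a price yields None rather than
# stealing the next product's price.
for item in soup.find_all(class_='s-result-item'):
    name = item.find(class_='a-link-normal a-text-normal')
    whole = item.find(class_='a-price-whole')
    cents = item.find(class_='a-price-fraction')
    rows.append({
        'Produto': name.text.strip() if name else None,
        'Preço': whole.text if whole else None,
        'Centavos': cents.text if cents else None,
    })

print(rows)
```

With this structure the list of dicts can be passed straight to `pd.DataFrame(rows)`, and missing prices show up as NaN instead of misaligned values.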
On the same page, the price is split from the cents, each in a different class in the HTML. I can retrieve both values, but how do I concatenate them? Having one column for the price and another for the cents is very awkward.
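Once both pieces are in the DataFrame, they can be joined either as a display string or as a proper numeric column. A small sketch with made-up data (column names follow the question; None marks a product with no price):

```python
import pandas as pd

df = pd.DataFrame({
    'Produto': ['Produto A', 'Produto B'],
    'Preço': ['10', None],
    'Centavos': ['99', None],
})

# Option 1: string concatenation with a comma (Brazilian price format).
# Rows where either part is missing come out as NaN.
df['Preço completo'] = df['Preço'].str.cat(df['Centavos'], sep=',')

# Option 2: a numeric column, easier for sorting and arithmetic.
df['Preço num'] = (pd.to_numeric(df['Preço'], errors='coerce')
                   + pd.to_numeric(df['Centavos'], errors='coerce') / 100)

print(df)
```

`errors='coerce'` turns missing or malformed values into NaN instead of raising, so the products without a price survive the conversion.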
Here is part of the code I’m using:
```python
import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://www.s_2'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')

review_text = []
review_text_elem = soup.find_all(class_='a-link-normal a-text-normal')
for item in review_text_elem:
    review_text.append(item.text)

user_name = []
user_name_elem = soup.find_all(class_='a-price-whole')
for item in user_name_elem:
    user_name.append(item.text)

review_price = []
review_price_elem = soup.find_all(class_='a-price-fraction')
for item in review_price_elem:
    review_price.append(item.text)
print(review_price)

final_array = []
for text, user, cents in zip(review_text, user_name, review_price):
    final_array.append({'Produto': text.replace("\n", ""), 'Preço': user, 'Centavos': cents})

col = 'Produto Preço Centavos'.split()
df = pd.DataFrame(final_array, columns=col)
print(df)
df.to_excel('amazonpanda4.xlsx', index=False)
```
You can format the code by placing three backticks before it and three after. In Python, formatting is essential.
– Paulo Marques
Hello Paulo, thank you for your help!! Could you give me an example of what this solution would look like, so I can understand it better? Thank you!!!
– Daniela Martinez
Daniela, good evening! Has the question been solved already? If not, could you post the page you need to extract the data from?
– lmonferrari