How to "clean" a csv file with Python?

Hello! I'm very new to programming, so I apologize if I can't explain what I'm trying to do or if my code is badly wrong. I have a recurring task at work: open a list of restaurants on iFood, collect the prices from each one, and calculate the average. It's simple, but it takes considerable time. I decided to write a Python program that collects the prices for me, so that I would only have to select everything in Excel to get the average. Here is the program:

import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup

print('Hello! Enter the iFood restaurant link now.')
site = input()

html = urlopen(site)
bs = BeautifulSoup(html, 'html.parser')
precosLista = bs.findAll('div',{'class':'result-actions'})
csvFile = open('Preços.csv', 'wt+')
writer = csv.writer(csvFile)

try:
    for precos in precosLista:
        print(precos.get_text())
        csvPreco = []
        csvPreco.append(precos.get_text())
        writer.writerow(csvPreco)

finally:
    csvFile.close()

The code works, but the returned data has a strange format. An example:

"


                                                        R$ 71,90


                                                        R$ 59,90














"

"


                                                        R$ 45,90


                                                        R$ 32,90














"

"


                                                        R$ 29,90


                                                        R$ 24,90














"

"


                                                        R$ 29,90


                                                        R$ 24,90














"

"


                                                        R$ 29,90


                                                        R$ 24,90














"

"


                                                        R$ 29,90


                                                        R$ 24,90














"

If I open this directly in Excel, it comes out extremely messy and hard to work with. What I do now is open the generated CSV in Word and remove all the spaces, blank lines, R$ signs, and quotes. Then I copy the numbers into Excel and calculate the average. My idea was to write another program to do this cleanup for me using the replace function, but I can't get it to work. Could someone help me?

  • Add .strip() after precos.get_text()

  • Jordan, first, your code is pretty messy. Search here on Stack Overflow for the title "Como ler um csv python"; this one has a basic example of the correct way to do it: how to read a CSV in Python

  • In your case, and for .xls or .csv files in general, I recommend using the pandas package.

  • Laerte, thanks, that already improved the result by 1000%! The only problem I have now is that, from time to time, when the same product has more than one "version" (say, temaki with or without cream cheese), the CSV includes a "From" along with the price. Can we get rid of that too?

  • Yes, you just need a replace. I'll write up an answer and then you can test it.

  • Great! I'll be waiting to test it.
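As an alternative to adding one replace call per unwanted label, the numeric part can be pulled out with a regular expression, which ignores any surrounding text such as "From". The sketch below is only an illustration: extract_prices and PRICE_PATTERN are hypothetical names, the sample string is made up to resemble the output shown in the question, and the actual label on the page may be worded differently.

```python
import re

# Matches prices written in the Brazilian format, e.g. "R$ 71,90",
# and captures only the number ("71,90"). Any text before "R$",
# such as a "From" label, is simply not matched.
PRICE_PATTERN = re.compile(r"R\$\s*(\d+(?:\.\d{3})*,\d{2})")


def extract_prices(text):
    """Return every price found in text as a float (hypothetical helper)."""
    prices = []
    for match in PRICE_PATTERN.findall(text):
        # "1.234,56" -> "1234.56": drop thousands separators,
        # then use a decimal point so float() accepts it.
        prices.append(float(match.replace(".", "").replace(",", ".")))
    return prices


# Made-up sample resembling one scraped block with two prices.
sample = "\n\n      From R$ 71,90\n\n\n      R$ 59,90\n\n\n"
prices = extract_prices(sample)
print(prices)
print(sum(prices) / len(prices))  # average of the prices in this block
```

Because the values come back as floats, the average can be computed in Python directly instead of in Excel.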


1 answer

You can create a function that cleans the text before saving it to the CSV. In this function I replace ', " and R$, but you can add more rules as needed.

def clean_up_text(value):

    value = value.replace("R$", "")
    value = value.replace("'", "")
    value = value.replace('"', '')
    value = value.strip() # remove surrounding whitespace and line breaks

    return value

Just call this function when saving the value: clean_up_text(precos.get_text())
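A minimal sketch of where that call goes, assuming the scraped texts are already in a list. The write_prices helper and the sample strings are hypothetical, and an in-memory buffer stands in for the real file so the example runs without network access. The important detail is that replace and strip return new strings, so the value returned by clean_up_text is what must be written to the CSV.

```python
import csv
import io


def clean_up_text(value):
    value = value.replace("R$", "")
    value = value.replace("'", "")
    value = value.replace('"', "")
    return value.strip()  # remove surrounding whitespace and line breaks


def write_prices(raw_texts, csv_file):
    """Write one cleaned price per row (hypothetical helper)."""
    writer = csv.writer(csv_file)
    for raw in raw_texts:
        # Keep the returned value: calling clean_up_text(raw) on its own
        # line and then writing raw would leave the output unchanged.
        cleaned = clean_up_text(raw)
        writer.writerow([cleaned])


# Made-up sample texts standing in for precos.get_text() results.
raw_texts = ["\n\n        R$ 29,90\n\n", "\n    R$ 24,90    \n"]
buffer = io.StringIO()  # stands in for the opened CSV file
write_prices(raw_texts, buffer)
print(buffer.getvalue())
```

Note that csv.writer still wraps each value in double quotes here, because the price contains a comma, which is also the default field delimiter. That quoting is normal CSV behavior and spreadsheet programs read it as a single field.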

  • I added this to the code, but it seems to have no effect, because the CSV comes out the same as before. I put the def before this: try: for precos in precosLista: clean_up_text(precos.get_text()) print(precos.get_text()) csvPreco = [] csvPreco.append(precos.get_text().strip()) writer.writerow(csvPreco)

  • Now I understood where to call the function. It worked VERY well here. Thank you very much!!

  • You're welcome, friend! :)
