Data Crawling in Python

Good afternoon, everyone.

I decided to start my studies with web crawling in Python. I built the following script using the Selenium library:

# Importing Selenium to perform the crawling
from selenium import webdriver
import time
import csv

# Specifying where the webdriver file is
chrome_path = r"Desktop\Crawler\chromedriver.exe"

# Creating the driver with the webdriver's location
driver = webdriver.Chrome(chrome_path)

# Using the driver to navigate to a site
driver.get("Link")
time.sleep(5)

# Looking for a given element on the page

# Selecting the state (.click() returns None, so there is no point assigning it)
driver.find_element_by_css_selector(
    "body > div.layout > main > div > div.col-md-12.ng-scope > div > form:nth-child(4) > div:nth-child(2) > div:nth-child(1) > div > select > option:nth-child(18)").click()
time.sleep(7)

# After selecting the state, select the municipality
driver.find_element_by_css_selector(
    "body > div.layout > main > div > div.col-md-12.ng-scope > div > form.form-inline.ng-valid.ng-dirty.ng-valid-parse > div:nth-child(2) > div:nth-child(2) > div > select > option:nth-child(113)").click()
time.sleep(5)

# Clicking the search button
driver.find_element_by_css_selector(
    "body > div.layout > main > div > div.col-md-12.ng-scope > div > form.form-inline.ng-pristine.ng-valid > div > button").click()
time.sleep(5)

# Extracting the data into variables (each one holds a WebElement; .text gives its value)
siglaEstado = driver.find_element_by_xpath(
    "/html/body/div[2]/main/div/div[2]/div/div[3]/table/tbody/tr[1]/td[1]")

nmMunicipio = driver.find_element_by_css_selector(
    "body > div.layout > main > div > div.col-md-12.ng-scope > div > div:nth-child(9) > table > tbody > tr:nth-child(1) > td:nth-child(2)")

cnes = driver.find_element_by_xpath(
    "/html/body/div[2]/main/div/div[2]/div/div[3]/table/tbody/tr[1]/td[3]")

nmFantasia = driver.find_element_by_xpath(
    "/html/body/div[2]/main/div/div[2]/div/div[3]/table/tbody/tr[1]/td[4]")

natureza = driver.find_element_by_xpath(
    "/html/body/div[2]/main/div/div[2]/div/div[3]/table/tbody/tr[1]/td[5]")

gestao = driver.find_element_by_xpath(
    "/html/body/div[2]/main/div/div[2]/div/div[3]/table/tbody/tr[1]/td[6]")

sus = driver.find_element_by_xpath(
    "/html/body/div[2]/main/div/div[2]/div/div[3]/table/tbody/tr[1]/td[7]")

I would like to know how to export the data in these variables to a .CSV file generated by the Python script itself. I would also appreciate tips on how to improve my code.

  • You can simply write the data out as CSV. I did something similar with Selenium, https://github.com/bulfaitelo/Tesouro-Direto-Scraper, see if it helps you.

  • You can create the CSV file and, once you finish scraping the page, open the file and save the variables' data to it; there is no limit to how you can manipulate a CSV file in Python. A sketch of that idea follows below.
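
A minimal sketch of that idea, applied to the variables from the question. It assumes the script above has already run; the filename saida.csv and the header names are hypothetical:

import csv

# Each variable in the question holds a WebElement; .text extracts its visible value
linha = [siglaEstado.text, nmMunicipio.text, cnes.text,
         nmFantasia.text, natureza.text, gestao.text, sus.text]

# 'w' is text mode, which the csv module requires in Python 3;
# newline='' avoids blank rows on Windows
with open("saida.csv", "w", newline="", encoding="utf-8") as arquivo:
    escritor = csv.writer(arquivo)
    escritor.writerow(["estado", "municipio", "cnes", "nome_fantasia",
                       "natureza", "gestao", "sus"])  # illustrative header row
    escritor.writerow(linha)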

1 answer

You can use Python's standard csv library. Below is an example of how to take values from a list and write them to a CSV file:

import csv

# 'w' is text mode, which the csv module requires in Python 3; newline='' avoids blank rows
with open(meuArquivo, 'w', newline='') as arquivo:
    teste = csv.writer(arquivo, quoting=csv.QUOTE_ALL)
    teste.writerow(lista)

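Here meuArquivo would be the output filename and lista the values for one row, for example (the filename below is hypothetical):

meuArquivo = 'dados.csv'  # hypothetical output filename
lista = [siglaEstado.text, nmMunicipio.text, cnes.text]  # values from the question's WebElements
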
Regarding Selenium, I find it very cool, but I recommend you also study the requests library.

  • Why requests instead of Selenium?

  • I think Selenium is a good option, but it will depend on the scope of your project, because it is a little slower. That is why I recommend requests: often the information you want is in a JSON response returned by the site, so you do not need to render the whole page to extract it. A sketch of that approach follows below.
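
A minimal sketch of the requests approach described in the comment above. The URL and JSON field names are hypothetical; on a real site you would find the actual endpoint in the browser's network tab:

import csv
import requests

# Hypothetical JSON endpoint; inspect the site's network traffic to find the real one
resposta = requests.get("https://exemplo.gov.br/api/estabelecimentos")
resposta.raise_for_status()  # fail fast on HTTP errors
dados = resposta.json()  # parse the JSON body into Python objects

# Assumes the endpoint returns a list of objects with these (hypothetical) fields
with open("saida.csv", "w", newline="", encoding="utf-8") as arquivo:
    escritor = csv.writer(arquivo)
    for item in dados:
        escritor.writerow([item.get("cnes"), item.get("nomeFantasia")])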
