Loop Selenium Python

Hello! I started studying Python and I'm trying to do web scraping on the OLX site. I can already search and filter. But how can I write a loop that clicks on every ad so I can collect the phone numbers?

My script so far:

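from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.firefox.service import Service
import time


class Olx:
    def __init__(self, imovel):
        self.imovel = imovel  # search term
        # Selenium 4 passes the geckodriver path through a Service object
        service = Service(executable_path=r'C:\Users\Fabio\Desktop\robo\geckodriver.exe')
        self.driver = webdriver.Firefox(service=service)

    def procura(self):
        driver = self.driver
        driver.get('https://www.olx.com.br')
        time.sleep(2)
        # Type the search term into the search box and submit it
        procura_element = driver.find_element(By.XPATH, "//input[@name='q']")
        procura_element.clear()
        procura_element.send_keys(self.imovel)
        procura_element.send_keys(Keys.RETURN)
        time.sleep(2)
        self.clicaregiao()

    def clicaregiao(self):
        driver = self.driver
        driver.get('https://sp.olx.com.br/?q=imovel')
        # Click the region filter
        driver.find_element(By.XPATH, '/html/body/div[1]/div/div[1]/div[5]/div/div[2]/div[1]/div[2]/div/ul[1]/li[2]/a').click()
        time.sleep(3)
        # 'g5f41w-3 bGwyNR' is two class names; By.CLASS_NAME accepts only one,
        # so match both with a CSS selector instead
        driver.find_element(By.CSS_SELECTOR, '.g5f41w-3.bGwyNR').click()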
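One detail in the script: clicaregiao loads https://sp.olx.com.br/?q=imovel with the term hard-coded, so the value passed to Olx(...) is not used there. A minimal sketch of a hypothetical helper that builds that URL from the search term instead, percent-encoding spaces and accents; it assumes OLX keeps reading the term from the q parameter, as in the URL above.

```python
from urllib.parse import urlencode


def build_search_url(base_url, term):
    """Hypothetical helper: append the search term as the q query parameter."""
    # urlencode percent-encodes accents and turns spaces into '+'
    return base_url + "?" + urlencode({"q": term})
```

Inside clicaregiao it would be used as self.driver.get(build_search_url('https://sp.olx.com.br/', self.imovel)).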

1 answer

For scraping OLX, I recommend using the BeautifulSoup library together with the json module.

The source code of every OLX page contains a script tag that holds all of the page's data as JSON:

<script id="initial-data" type="text/plain" data-json="{...}">

Extract the value of the "data-json" attribute and parse it as JSON.
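In isolation, that extraction step can be sketched with only the standard library, using html.parser in place of BeautifulSoup. This is a minimal sketch assuming the page still embeds the script tag with id="initial-data" exactly as described above; OLX can change its markup at any time. The class and function names are hypothetical.

```python
import json
from html.parser import HTMLParser


class InitialDataParser(HTMLParser):
    """Hypothetical parser: grabs data-json from <script id="initial-data">."""

    def __init__(self):
        super().__init__()
        self.data_json = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "script" and attributes.get("id") == "initial-data":
            # html.parser already unescapes entities such as &quot; in attribute values
            self.data_json = attributes.get("data-json")


def json_from_html(html):
    """Return the parsed JSON stored in the initial-data script tag."""
    parser = InitialDataParser()
    parser.feed(html)
    if parser.data_json is None:
        raise ValueError('no <script id="initial-data"> with a data-json attribute')
    return json.loads(parser.data_json)
```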

Here is a small web-scraping example that does this with BeautifulSoup:

#!/usr/bin/env python3
# Web-scraping example for OLX in Python 3
# Rodrigo Eggea
from bs4 import BeautifulSoup
import requests
import json

def json_from_url(url):
    # Send a browser-like User-Agent header with the request
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
    page = requests.get(url, headers=headers)
    page.raise_for_status()  # stop on HTTP errors such as 403 or 404
    soup = BeautifulSoup(page.text, 'html.parser')
    # The page data lives in the data-json attribute of <script id="initial-data">
    data_json = soup.find(id='initial-data').get('data-json')
    return json.loads(data_json)

# Takes an ad URL and prints the seller's name, phone,
# product description and price
def mostra_dados_do_anuncio(url):
    ad = json_from_url(url)['ad']
    descricao = ad['body']
    # Most ads have no phone, so don't assume the key exists
    phone = (ad.get('phone') or {}).get('phone')
    user = ad['user']['name']
    preco = ad['price']
    print('Seller=', user)
    print('Phone=', phone)
    print('Description=', descricao)
    print('Price=', preco)

# Get the list of products in the electronics section
url_eletronicos = 'https://pr.olx.com.br/eletronicos-e-celulares'
data = json_from_url(url_eletronicos)

# Visit each ad and show its phone
adList = data['listingProps']['adList']
for anuncio in adList:
    subject = anuncio.get('subject')
    if subject:  # entries without a subject are skipped
        print('------------------------')
        url = anuncio.get('url')
        print('Product description:', subject)
        print('Product URL=', url)
        mostra_dados_do_anuncio(url)
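If you'd rather keep using Selenium, as in the question, the loop can be sketched as: collect the URL of every ad on the results page first, then open each one with driver.get. Collecting first matters because once the browser navigates away, the WebElements found earlier go stale (StaleElementReferenceException). This is a minimal sketch, not a tested OLX scraper: the function names are hypothetical, and the CSS selector for the ad links is a placeholder you would have to find by inspecting the results page. It relies on Selenium 4's driver.find_elements(by, value), where "css selector" is the string value of By.CSS_SELECTOR.

```python
# "css selector" is the string value of Selenium's By.CSS_SELECTOR
CSS_SELECTOR = "css selector"


def collect_ad_links(driver, ad_selector):
    """Hypothetical helper: unique href of every element matching ad_selector, in page order."""
    links = []
    for element in driver.find_elements(CSS_SELECTOR, ad_selector):
        href = element.get_attribute("href")
        if href and href not in links:
            links.append(href)
    return links


def visit_ads(driver, ad_selector, on_ad):
    """Hypothetical helper: open every ad on the current page, calling on_ad(driver, url)."""
    # Collect all URLs before navigating; the WebElements go stale once the page changes
    links = collect_ad_links(driver, ad_selector)
    for url in links:
        driver.get(url)
        on_ad(driver, url)
    return links
```

The on_ad callback is where you would read the phone or description from each ad page, for example with driver.find_element.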

Note: when you run this code, you'll notice that most ads don't have a phone number: most advertisers either leave it out or put it in the description field.
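Since the number often ends up in the description, one option is to search that text with a regular expression. A minimal sketch, assuming Brazilian numbers written as a 2-digit area code followed by 8 or 9 digits, e.g. "(41) 99999-1234", "41 3333-1234" or "41999991234". The pattern is a heuristic of my own, not anything OLX provides: it can miss other formats and can match unrelated long digit runs such as prices.

```python
import re

# Heuristic pattern: optional parentheses around a 2-digit area code, then
# 8 or 9 digits, optionally split by a space, dot or hyphen
PHONE_RE = re.compile(r"\(?\b(\d{2})\)?[\s.-]?(9?\d{4})[\s.-]?(\d{4})\b")


def find_phones(text):
    """Hypothetical helper: phone-like numbers in text, digits only, no duplicates."""
    phones = []
    for area, first, last in PHONE_RE.findall(text):
        number = area + first + last
        if number not in phones:
            phones.append(number)
    return phones
```

In mostra_dados_do_anuncio, it could be called as find_phones(descricao) whenever phone is None.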

I hope I’ve helped.