I'm web scraping with Python and can't break out of a loop

Question

Asked 4 years ago

Viewed 35 times

1

I'm automating a search for gaming laptops on Amazon. Besides the first page of results, the script also fetches the following pages, but at some point it keeps trying to fetch more pages and never exits the loop. I have already tried checking whether there are no more pages to follow and, when that is true, setting the variable continuar to False, among other things. The code follows below.

from bs4 import BeautifulSoup
import requests

page_index = 1
index = 0
continuar = True
link = 'https://www.amazon.com.br/s?k=acer+nitro+5&rh=n%3A16364755011&s=price-asc-rank&dc&__mk_pt_BR=%C3%85M%C3%85%C5%BD%C3%95%C3%91&qid=1625947439&rnid=18726358011&ref=sr_nr_n_1'

site = requests.get(f'{link}&page={page_index}')

print("Notebooks gamer Acer nitro 5:\n\n")

#a-pagination

while continuar:
    site_bs4 = BeautifulSoup(site.text, 'html.parser')

    items = site_bs4.findAll('div', class_="s-expand-height s-include-content-margin s-latency-cf-section {{ "
                                           "borderCssClass }}")

    div_paginas = site_bs4.find('div', class_='a-text-center')
    paginacao = div_paginas.find_all('li', class_='a-last')
    if (paginacao):
        for pagina in paginacao:
            global paginas
            paginas = pagina.find('a')

        if(index == len(items)):
                if(paginas):
                    site = requests.get(f'https://www.amazon.com.br/{paginas["href"]}')

    for item in items:
        index += 1
        print(index)
        titulo = item.find('span', class_ = 'a-size-base-plus a-color-base a-text-normal')
        url = item.find('a', class_='a-link-normal a-text-normal')
        preco = item.find('span', class_='a-price-whole')

        #preco = items.find('a', class_='a-size-base a-link-normal a-text-normal')

        if(preco):
            print(f'Produto: {titulo.text}')
            print(f'https://www.amazon.com.br/{url["href"]}')
            print(f'Preço: R${preco.text.replace(",", "")}')
        else:
            print(f'Produto não disponível: {titulo.text}...\nDa url: https://www.amazon.com.br/{url["href"]}')
        print("\n")

print('\nLista acabada!')

I hope you can help me :)

2 answers

1

Your error is simpler than it seems: nowhere in the code is the variable continuar set to False, so the while loop repeats endlessly.

To solve it, I added the line continuar = False at the end of the while body, after all your inner loops have run.

There were also a few other issues, such as the global paginas statement placed inside a for loop (where it was repeated on every iteration) and some indentation that does not follow Python conventions.

In the end, the following code worked:

from bs4 import BeautifulSoup
import requests

page_index = 1
index = 0
continuar = True
link = 'https://www.amazon.com.br/s?k=acer+nitro+5&rh=n%3A16364755011&s=price-asc-rank&dc&__mk_pt_BR=%C3%85M%C3%85%C5%BD%C3%95%C3%91&qid=1625947439&rnid=18726358011&ref=sr_nr_n_1'

site = requests.get(f'{link}&page={page_index}')

print("Notebooks gamer Acer nitro 5:\n\n")

#a-pagination

while continuar:
    site_bs4 = BeautifulSoup(site.text, 'html.parser')

    items = site_bs4.findAll('div', class_="s-expand-height s-include-content-margin s-latency-cf-section {{ "
                                           "borderCssClass }}")

    div_paginas = site_bs4.find('div', class_='a-text-center')
    paginacao = div_paginas.find_all('li', class_='a-last')
    if (paginacao):
        global paginas
        for pagina in paginacao:
            paginas = pagina.find('a')

        if index == len(items):
            if paginas:
                site = requests.get(f'https://www.amazon.com.br/{paginas["href"]}')

    for item in items:
        index += 1
        print(index)
        titulo = item.find('span', class_ = 'a-size-base-plus a-color-base a-text-normal')
        url = item.find('a', class_='a-link-normal a-text-normal')
        preco = item.find('span', class_='a-price-whole')

        #preco = items.find('a', class_='a-size-base a-link-normal a-text-normal')

        if(preco):
            print(f'Produto: {titulo.text}')
            print(f'https://www.amazon.com.br/{url["href"]}')
            print(f'Preço: R${preco.text.replace(",", "")}')
        else:
            print(f'Produto não disponível: {titulo.text}...\nDa url: https://www.amazon.com.br/{url["href"]}')
        print("\n")

    continuar = False

print('\nLista acabada!')
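Note that setting continuar = False unconditionally ends the while loop after a single pass. If the goal is to keep following the next-page link until there is none left, the stop condition can be tied to that link instead. Below is a minimal sketch of that idea, not the script's actual implementation: crawl, load_page, and FAKE_SITE are hypothetical names, and the fake in-memory site stands in for the real requests.get and BeautifulSoup calls so that the loop can be run without network access.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical result of "download and parse one page": the items found on it
# and the href of the next page (None when there is no next-page link).
Page = Tuple[List[str], Optional[str]]


def crawl(start_url: str,
          load_page: Callable[[str], Page],
          max_pages: int = 50) -> Iterator[str]:
    """Yield items page by page until no next link is left."""
    url: Optional[str] = start_url
    visited = set()
    # Stop when there is no next link, when a page repeats, or at a safety cap.
    while url is not None and url not in visited and len(visited) < max_pages:
        visited.add(url)
        items, next_href = load_page(url)
        yield from items
        url = next_href


# Fake three-page site, used only to exercise the loop.
FAKE_SITE = {
    "/s?page=1": (["laptop A", "laptop B"], "/s?page=2"),
    "/s?page=2": (["laptop C"], "/s?page=3"),
    "/s?page=3": (["laptop D"], None),
}

results = list(crawl("/s?page=1", FAKE_SITE.__getitem__))
print(results)  # ['laptop A', 'laptop B', 'laptop C', 'laptop D']
```

In the real script, load_page would wrap the requests.get call and the BeautifulSoup lookups shown above, returning None as the next href when no li element with class a-last contains a link. The visited set and max_pages cap are extra safeguards against a next link that points back to a page already seen.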

1

It's as they say, the Stack Overflow folks are angels. Thank you for pointing out the global variable; in all the time I've been programming in Python, this is the first time I've used one. Thank you very much.

– Farofa de Cachorro

2021/07/12 at 12:00
1

Already upvoted, bro, haha

– Farofa de Cachorro

2021/07/13 at 13:32

by Farofa de Cachorro • 36 points · Answer 1 · 2021-07-13T13:46:31+00:00

I found a solution myself. The code is here for anyone who wants to use it: https://github.com/farofaDeCachorro/myprojects/blob/main/amazon_automatic.py License: CC0, so feel free to use it.

Edit: anyone who wants to help with the code is welcome too, haha.