I want to scrap a page, but I can’t get a text that has "&nbsp" on it

Asked

Viewed 222 times

-1

Well, the title says it all, I want to get the price of a product from the site https://www.pontofrio.com.br/, like this:

<span class="nm-price-value" itemprop="price">R$&nbsp;367,60</span>

Using that code:

import bs4
import requests

def get_page(url):
    r = requests.get(url)

    try:
        r.raise_for_status()
        page = bs4.BeautifulSoup(r.text, 'lxml')
        return page

    except Exception as e:
        return False

def pontofrio(search):
    soup = get_page(f'https://search3.pontofrio.com.br/busca?q={search}')

    price = [i.getText() for i in soup.select('.nm-price-value')]
    product = [i.getText().upper() for i in soup.select('.nm-product-name a')]
    link = [i.get('href') for i in soup.select('.nm-product-name a')]

    return product, price, link

Where the product and link lists return normally, but the price list is returning:

['\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n']

Please help!

1 answer

2


Your problem is that the price does not actually exist on the page! The cold spot pages, as well as many others on the internet, are made available without the price, and only after this price is placed on the page through code done in javascript that runs in your browser after loading.

Therefore, when inspecting the page code with your browser, javascript will have already run and completed the elements dynamically, so you will find the prices there. As Beautifulsoup does not run javascript, on the page he parsed the prices do not exist.

This is very common on web pages nowadays, which are quite dynamic. Leaving you with two options:

  1. Analyze the javascript code of the page and find out where it gets the price. In this case it seems to come from the url https://preco.api-pontofrio.com.br/V1/Produtos/PrecoVenda/. You can read and follow the javascript code until you find a way to imitate what it does to recover these prices, what parameters to pass, etc. It is not an easy task but the code would be very optimized because it would be python code without having to open a real browser, which would be the second option:

  2. Use a real browser that runs javascript. The library Selenium allows you to open and control a real browser window through your script. Since the page will open in a browser, javascript will work and you can get these prices. The downside is that opening a browser is heavy and slow, in addition to loading various elements and images unnecessary to the process, therefore it would not be as efficient as directly accessing the source.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.