How to use BeautifulSoup's "find" to find a script tag with a specific type?

5

For a while I have been studying how to use BeautifulSoup to find the content of tags and so on.

But I ran into a problem: the content I want is inside a <script type="text/javascript"> tag. Using only find("script") finds only the plain <script> tags, and if I try find('script type="text/javascript"'), the code raises an error.

def get_cod_produto(url):
    response = requests.get(url)
    data = response.text
    soup = bs(data, 'html.parser')
    body = soup.body
    localizaScript = body.find('script type="text/javascript"')
    texto = localizaScript.string
    array = re.split('"', texto) 
    print(array)

get_cod_produto("https://www.kabum.com.br/cgi-local/site/listagem/listagem.cgi?string=rtx+2060&btnG=&pagina=2&ordem=3&limite=30&prime=false&marcas=[]&tipo_produto=[]&filtro=[]")

It returns this error when I pass anything other than just "script":

AttributeError                            Traceback (most recent call last)
<ipython-input-565-dfa6bc29cef9> in <module>
----> 1 get_cod_produto("https://www.kabum.com.br/cgi-local/site/listagem/listagem.cgi?string=rtx+2060&btnG=&pagina=2&ordem=3&limite=30&prime=false&marcas=[]&tipo_produto=[]&filtro=[]")

<ipython-input-564-5e17a520cf0d> in get_cod_produto(url)
      5     body = soup.body
      6     localizaScript = body.find('script type="text/javascript"')
----> 7     texto = localizaScript.string
      8     array = re.split('"', texto)
      9     print(array)

AttributeError: 'NoneType' object has no attribute 'string'

How can I pull the information from this tag?
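For readers hitting the same traceback, here is a minimal sketch of why it happens. It assumes a small inline HTML string (hypothetical, standing in for the real page) rather than the live URL. The first argument of find is compared against tag names, so the whole string 'script type="text/javascript"' is looked up as if it were a tag name, nothing matches, and find returns None.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML standing in for the real page.
html = '<body><script type="text/javascript">var cod = "123";</script></body>'
soup = BeautifulSoup(html, "html.parser")

# The whole string is treated as a tag name, so no tag matches.
missing = soup.body.find('script type="text/javascript"')
print(missing)

# Searching by tag name alone returns the first <script> tag, whatever its type.
plain = soup.body.find("script")
print(plain["type"])
```

Calling .string on the first result is what raises the AttributeError, because the result is None.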

  • I have never used BeautifulSoup, but if the find method is based on valid CSS selectors, I can explain the problem. Basically, script type="text/javascript" is not a valid CSS selector. If you want to restrict the search for a tag to a certain attribute (such as type), you should wrap the attribute in square brackets, like this: script[type="text/javascript"].
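As a side note on the comment above: find itself does not take CSS selectors, but BeautifulSoup does offer select and select_one, which do. A minimal sketch, assuming a hypothetical inline HTML string instead of the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML standing in for the real page.
html = """
<body>
  <script>var ignored = true;</script>
  <script type="text/javascript">var cod = "123";</script>
</body>
"""
soup = BeautifulSoup(html, "html.parser")

# select_one accepts a CSS selector; the attribute goes in square brackets.
tag = soup.select_one('script[type="text/javascript"]')
script_text = tag.string
print(script_text)
```

Here the untyped <script> tag is skipped because the selector requires the type attribute.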

1 answer

5


import requests
from bs4 import BeautifulSoup as bs

def get_cod_produto(url):
    response = requests.get(url)
    data = response.text
    soup = bs(data, 'html.parser')
    return soup.find('script', type="text/javascript")

get_cod_produto("https://www.kabum.com.br/cgi-local/site/listagem/listagem.cgi?string=rtx+2060&btnG=&pagina=2&ordem=3&limite=30&prime=false&marcas=[]&tipo_produto=[]&filtro=[]")

Just do: soup.find('script', type="text/javascript").

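As the documentation shows, the method signature is find(name, attrs, recursive, string, **kwargs). A keyword argument such as type="text/javascript" is treated as a filter on the attribute with that name.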
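A small sketch of the two equivalent spellings, plus a guard against find returning None. The inline HTML and the variable names are assumptions for illustration only; the re.split call mirrors the one in the question.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical inline HTML standing in for the real page.
html = """
<body>
  <script>var ignored = true;</script>
  <script type="text/javascript">var cod = "123";</script>
</body>
"""
soup = BeautifulSoup(html, "html.parser")

# Keyword form: type=... is collected by **kwargs and used as an attribute filter.
by_kwarg = soup.find("script", type="text/javascript")

# Dictionary form: the same filter passed explicitly through attrs.
by_attrs = soup.find("script", attrs={"type": "text/javascript"})

# find returns None when nothing matches, so check before using .string.
if by_kwarg is None:
    pieces = []
else:
    pieces = re.split('"', by_kwarg.string)
print(pieces)
```

The attrs form is handy when the attribute name cannot be written as a Python keyword argument, for example class or data-* attributes.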

  • 1

    It worked, thank you very much!

  • Good evening, Luke! You're welcome! Cheers!
