Most voted "beautifulsoup" questions
Beautiful Soup is a Python library for parsing HTML and XML documents. Widely used for web scraping, it builds a parse tree from a document so that information can be collected from a web page.
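As a rough illustration of the parse tree described above, here is a minimal sketch. The HTML snippet and the variable names are made up for this example, and it assumes the `beautifulsoup4` package is installed; it uses Python's built-in `html.parser` backend.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for illustration.
html = """
<html><body>
<h3 class="b"><a href="https://www.google.com">Google</a></h3>
<script type="application/ld+json">{"name": "example"}</script>
</body></html>
"""

# Build the parse tree with the standard-library parser backend.
soup = BeautifulSoup(html, "html.parser")

# Navigate the tree: find the <h3 class="b">, then the <a> inside it.
link = soup.find("h3", class_="b").find("a")
link_text = link.get_text()   # visible text of the link
link_href = link["href"]      # value of the href attribute

# find() also accepts attribute filters, e.g. a <script> with a given type.
script = soup.find("script", type="application/ld+json")
script_body = script.string   # raw text inside the <script> tag

print(link_text, link_href, script_body)
```

`find` returns the first matching element or `None`, so real scraping code would usually check the result before chaining calls.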
60 questions
-
5 votes, 1 answer, 90 views
How to use Beautifulsoup’s "find" to find a script tag with a specific type?
I have been studying how to use Beautifulsoup to find tag content for a while, but I ran into a problem: the content I want to find is inside a tag <script…
-
3 votes, 1 answer, 84 views
bs4: How to wrap incomplete HTML code?
Hello, I came across incomplete HTML code that is missing the "html" and "body" tags. Here is the code I implemented: import bs4 content=''' <head> <title> my page </title>…
-
3 votes, 1 answer, 1193 views
Web scraping with Python (Selenium and Requests)
Hello, I am trying to scrape a page protected by login. I have already managed to log in both via Requests and via Selenium; the problem comes after login. The page is as follows:…
-
2 votes, 2 answers, 1229 views
Catch tags within tags in Beautifulsoup
I have the following situation: <a href="https://g1.globo.com">Globo</a> <h3 class="b"> <a href="https://www.google.com">Google</a> </h3> Using Beautifulsoup, as…
-
2 votes, 1 answer, 77 views
How to extract text from a selected Beautifulsoup element?
I’m making a simple crawler to get some news from the financial market. The code below works properly, but I would like to extract only the headline and strip out the HTML/CSS code. import…
-
2 votes, 2 answers, 580 views
How to collect data in web scraping in Python?
This URL contains several links. I have to take the links for June 2017, download them, and create a single dataframe from all the files. But I got stuck at this part; how can…
-
1 vote, 1 answer, 2667 views
How to avoid Max retries exceeded error in scraping in Python?
In Python 3 I made a program to scrape table rows from a public website with several pages (97893). I create a list with the rows of each column and added a sleep to try to prevent scraping from…
-
1 vote, 1 answer, 278 views
Beautiful Soup - Remove a tag while keeping the text
I have the following tags: <p>Projeto N <sup>o</sup> 00.000, DE 00 DE JANEIRO DE 0000.</p> I would like to remove the tag while keeping the text. I need it to look like this:…
-
1 vote, 0 answers, 29 views
Multiple search factors in an HTML file with variable parameters
Good morning, everyone. I need to search HTML files for the names of the presidents of Brazil. I created a JSON file with the presidents’ names to make this easier. Here is the code: # !/bin/env python…
-
1 vote, 1 answer, 88 views
Error printing HTML with Beautifulsoup
I have a simple script that accesses a quiz site, takes every ul with the class square, and prints them on screen. url = "http://quizdomilhao.com.br/category/g1" question_page =…
-
1 vote, 1 answer, 347 views
Scraping data using Robobrowser
I’m trying to scrape a form, insert an attachment, and submit it using Robobrowser. To open the page I do: browser.open('url') To get the form I do: form = browser.get_form(id='id_form') To enter…
-
1 vote, 2 answers, 323 views
Python-based web scraping does not provide complete HTML page information
Greetings, everyone. I’m trying to use Python to get the information from the page http://www.nfce.se.gov.br/portal/painelMonitor.jsp , a page from Faz that shows the ping status of NFC-e…
-
1 vote, 1 answer, 123 views
Text.strip() Python
I am trying to write code that extracts some information from a page. The file has the following format: <tr class="impar"> <td class="id"> <a…
-
1 vote, 1 answer, 1875 views
Web scraping on page with login and password
I am trying to extract source code from an html file with the following style: <div class="both"></div> <div class="st-box" id="source-code"> <h3>SOURCE CODE</h3>…
-
1 vote, 0 answers, 42 views
Reading a table in Python with Beautiful Soup
I need to get a table from the transparency portal and then write it to the database. I am using Beautiful Soup. I can’t get the request to return the part that has the data, and consequently no tag that I look…
-
1 vote, 2 answers, 816 views
In requests, how to correctly read the ISO-8859-1 encoding?
In Python 3, with beautifulsoup4 and requests, I want to extract some information from a site that uses the 'ISO-8859-1' encoding. I tried this strategy to display the text correctly: import requests from…
python character-encoding python-requests beautifulsoup (asked 5 years, 6 months ago by Reinaldo Chaves)
-
1 vote, 2 answers, 1450 views
How to extract information from an HTML table and move it to a DataFrame using Selenium
I am using Selenium to access the site http://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-ajustes-do-pregao-ptBR.asp and manipulate the date box and OK button. So far I managed to do the task…
-
1 vote, 2 answers, 294 views
Regex does not take all span tag strings
I know there are HTML parsers, but since my HTML is not well structured, I also need to use regular expressions. The HTML is like this: <tr bgcolor="#CCCCCC"> <td colspan="2"><font…
-
1 vote, 2 answers, 185 views
Beautifulsoup: Catch text inside table
I’m trying to get specific values within a table. I have similar code that I already use the same way on another unique table structure within the HTML; the problem is that I can’t get the text of…
-
1 vote, 2 answers, 101 views
Question about value extraction using Soup.findAll() in Python
Good afternoon, everyone. I am studying Python and learning to extract data from websites. To start, I am creating a program that will extract the data from the lotofacil site of the…
-
1 vote, 1 answer, 47 views
Why does the HTTP request response not recognize special characters?
Why, when I make a request using Beautifulsoup in Python, does the response not handle Latin characters? Code: import requests from bs4 import BeautifulSoup req = requests.post(url=…
-
1 vote, 2 answers, 35 views
I’m doing web scraping with Python and can’t break out of a loop
I’m automating the search for gaming laptops on Amazon. Besides fetching the first page it also fetches the following ones, but at some point it won’t stop trying to fetch more pages…
-
0 votes, 1 answer, 229 views
Python: how to edit a tag with bs4
I have HTML code cluttered with style attributes in almost every tag, plus unnecessary <font><span> tags. How can I use beautifulsoup to remove only attrs=style in <p> and the tags…
-
0 votes, 2 answers, 91 views
How to create the <!DOCTYPE html> tag in Beautiful Soup (bs4)
I wish to create the tag in Beautiful Soup (bs4), and wrote the following: from bs4 import Doctype tag = Doctype('html') But it does not create the tag. How should I proceed?…
-
0 votes, 1 answer, 865 views
Requests, Beautifulsoup <Tables>
I have a website from which I want to extract specific data from a table: all information that has "PROLONG". My difficulty is that all tables have the same name in the "class"…
-
0 votes, 1 answer, 295 views
Scraping using Selenium and Beautifulsoup
I’m trying to scrape a book blog; I need to get the titles and categories of all the books posted. In the first attempt, I got an AttributeError, which should happen several times because…
-
0 votes, 4 answers, 197 views
Remove a comment tag and its content in Beautifulsoup 4
How do I remove the comment tag along with its content with bs4? <div class="foo"> A Arara é um animal voador. <!-- <p>Animais Nome: Arara Idade: 12 anos e 9 meses Tempo de Vida: 15…
-
0 votes, 1 answer, 62 views
I can’t properly scrape a comic strip site with Python
Well, I was writing code that checks the day of each strip/gif on the page and, if the day is the same as the current day (in the code I put 14 only because the site does not update weekend…
-
0 votes, 1 answer, 41 views
Python Beautifulsoup question
I am making a requests.post call and it returns the following information: {"Name":"Joey Triibianni","Vocation":"Knight","Level":"425","World":"Quelibra","Account Status:":"Free Account"} I’ve tried…
-
0 votes, 1 answer, 56 views
How can I replace this regular expression using Beautiful Soup?
Currently I use this expression to extract everything below a <b> tag until I find another <b> tag: blocks = re.findall(r'<b>.+?<b>', str(element)) How can I do the same…
-
0 votes, 1 answer, 342 views
Question: how to scrape <table> data with Python using Beautifulsoup
I’m a beginner and I’m trying to get a table from the transparency portal website, but I only get the tag with no data. When I open the developer tools I can see the data I…
-
0 votes, 2 answers, 125 views
Questions about the use of Beautifulsoup
My code below is meant to get the genre of movies from the IMDB site. However, I don’t know how to get the specific genre tag, because sometimes instead of the genre it takes…
-
0 votes, 1 answer, 33 views
Access a tag via beautifulsoup
Hello, I’m having difficulty accessing the price on the third line of the code via beautifulsoup. Does anyone have any idea how to access it? <span id="ctl00_Conteudo_ctl01_spanPrecoPor"…
-
0 votes, 1 answer, 78 views
Web scraping with Soup + Python: export to txt and check with a shell script
Greetings, everyone. I have Python code that returns the ping time in milliseconds for E-tax Note Sending from the E-tax portal, on the NFC-e status portal, as below: #!/usr/bin/env python # -*-…
-
0 votes, 1 answer, 220 views
Switching pages in an HTML table with beautifulsoup
I’m collecting data from this website using requests and beautifulsoup. I was able to collect all the data from page 1, but I cannot change the page. Python code variaveis = [] df_list = []…
-
0 votes, 3 answers, 1214 views
bs4.FeatureNotFound (Beautiful Soup and parser error)
I need to extract all the text from an HTML document, so I decided to look at Beautiful Soup to see how to do it. But it started to show the text right at the beginning; here’s the code: import…
-
0 votes, 0 answers, 45 views
Web scraping of pictures in comments
I’m working on a web scraper that needs to retrieve comments from a forum that allows image uploads. I was able to get the text and author of each comment using findAll in Beautiful Soup, but I…
-
0 votes, 0 answers, 183 views
HTTP Error 429: Too Many Requests in web scraping in repl
When I run the code below, I get: HTTP Error 429: Too Many Requests. The server must enforce a time limit between requests. # Required imports from bs4 import bs4 from urllib.request import…
-
0 votes, 1 answer, 509 views
Download images from a txt list of links in Python
First I imported the packages and created a class and its settings: class Scraper: def __init__(self): self.visited = set() self.session = requests.Session() self.session.headers = {"User-Agent":…
-
0 votes, 1 answer, 80 views
Add elements in Dictionary and convert them to string
Hello, I am doing some scraping tests using the Python library "Beautifulsoup" and a question came up. I was able to extract some information from a site, such as product title, SKU, and its…
-
0 votes, 2 answers, 345 views
Web scraping with Beautifulsoup - find_next does not return text
I want to extract the text from the section below: <div class="matchDate renderMatchDateContainer" data-kickoff="1583784000000">Mon 9 Mar 2020</div> the text would be "Mon 9 Mar 2020".…
-
0 votes, 0 answers, 164 views
Get a specific HTML attribute using Beautifulsoup
I’m trying to capture an attribute called srcset within an img tag <img _ngcontent-games2-c5="" class="mdc-image-list__image ng-lazyloaded" offset="100" src="/assets/img/lazy-load.jpg"…
-
0 votes, 1 answer, 457 views
How to use multithreading with requests?
Hello, I’m developing an availability checker for name changes in a game, but it is very slow. I read about the multithreading module but found it confusing to use; I have no idea how to implement it…
-
0 votes, 1 answer, 53 views
How to move the title from one column to another? (web scraping, Python)
I’m trying to do web scraping, but if you view the site you will notice that certain titles are in certain columns. What my program does is take the table, create two columns full of NaN, and assign…
-
0 votes, 0 answers, 57 views
Beautifulsoup is returning None
I’m trying to get the title of a product on the Amazon website, but the value returned is always None. Product link:…
-
0 votes, 0 answers, 39 views
I would like to iterate on a Beautiful Soup object, but at the end it gives an error in find() because it is a list. How could I solve this error?
import requests from bs4 import BeautifulSoup import pandas as pd def get_object(url): soup = BeautifulSoup(requests.get(url).content,'html.parser') return…
-
0 votes, 1 answer, 49 views
How to take a differentiated input value using Beautifulsoup
I’m writing a small program to learn how to use Beautifulsoup: a currency converter that converts a value from currency X to currency Y. My program initially takes from the Iban site all…
-
0 votes, 0 answers, 17 views
How to make a calculation between a Beautifulsoup value and user input?
I’m writing a program that converts currency values. It receives 3 user inputs: the selection of the 1st country’s currency, the selection of the 2nd country’s currency, and the value the user wants to…
-
0 votes, 1 answer, 42 views
Python Beautifulsoup remove tag within tag
I’m having a problem while scraping a page and capturing text. Basically the beginning of my code is as follows: url0 =…
-
0 votes, 0 answers, 19 views
Problem extracting web page data with Beautiful Soup in Python
I wrote a Python script to access the Inmetro records portal and search among the existing certificates. In this case, my script accesses this link and takes all records from the…