Most voted "beautifulsoup" questions

Beautiful Soup is a Python library for parsing HTML and XML documents. Widely used in web scraping, it builds a parse tree from which information on a web page can be collected.

60 questions

5
votes

1
answer

90
views

How to use BeautifulSoup’s "find" to locate a script tag with a specific type?

For a while I have been studying how to use BeautifulSoup to find tag contents. But I ran into a problem: the content I want to find is inside a tag <script…

html python beautifulsoup
asked 4 years, 7 months ago Lucas Coacci 51
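For the question above, a minimal sketch of selecting a script tag by its type attribute. The markup and the `application/ld+json` type are assumptions made for illustration, since the question's own HTML is truncated.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the HTML from the original question is not shown.
html = """
<html><head>
<script type="text/javascript">var a = 1;</script>
<script type="application/ld+json">{"name": "example"}</script>
</head></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Keyword arguments passed to find() are matched against tag attributes,
# so this returns the first <script> whose type is "application/ld+json".
tag = soup.find("script", type="application/ld+json")

# find() returns None when nothing matches, so guard before reading .string.
payload = tag.string if tag is not None else None
print(payload)
```

The same filter can be written as `soup.find("script", attrs={"type": "application/ld+json"})`, which is useful when the attribute name is not a valid Python keyword.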
3
votes

1
answer

84
views

bs4: How to wrap incomplete HTML code?

Hello, I came across incomplete HTML where the "html" and "body" tags are missing. Here is the code I implemented: import bs4 content=''' <head> <title> my page </title>…

python python-3.x beautifulsoup
asked 7 years, 1 month ago britodfbr 688
3
votes

1
answer

1193
views

Web scraping with Python (Selenium and Request)

Hello, I am trying to scrape a page protected by a login. I have already managed to access it via both Request and Selenium; the problem comes after login. The page is as follows:…

python selenium request scraping beautifulsoup
asked 6 years, 5 months ago Pedro Costa 503
2
votes

2
answers

1229
views

Get tags nested within tags in BeautifulSoup

I have the following situation: <a href="https://g1.globo.com">Globo</a> <h3 class="b"> <a href="https://www.google.com">Google</a> </h3> Using Beautifulsoup, as…

python-3.x beautifulsoup
asked 6 years, 9 months ago Antony Leme 197
2
votes

1
answer

77
views

How to extract text from a selected BeautifulSoup element?

I’m writing a simple crawler to get some news from the financial market. The code below works properly, but I would like to extract only the headline and strip out the HTML/CSS code. import…

python web-scraping web-crawler beautifulsoup
asked 4 years, 8 months ago Edu Barros 59
2
votes

2
answers

580
views

How to collect data when web scraping in Python?

This URL contains several links. I have to take the links for June 2017, download them, and create a single dataframe from all the files. But I got stuck at this part; how can…

python-3.x web-scraping beautifulsoup urllib
asked 6 years, 2 months ago Sarmento 41
1
votes

1
answer

2667
views

How to avoid Max retries exceeded error in scraping in Python?

In Python 3 I wrote a program to scrape table rows from a public website with many pages (97893). I build a list with the rows of each column and added a sleep to try to prevent the scraping from…

python web-scraping beautifulsoup
asked 7 years, 6 months ago Reinaldo Chaves 333
1
votes

1
answer

278
views

Beautiful Soup - Remove a tag while keeping its text

I have the following tags: <p>Projeto N <sup>o</sup> 00.000, DE 00 DE JANEIRO DE 0000.</p> I would like to remove the tag while keeping the text. I need it to end up like this:…

python beautifulsoup
asked 6 years, 8 months ago Igor Gabriel 530
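For the question above, a minimal sketch using `Tag.unwrap()`, which removes a tag but leaves its children in place. It assumes the tag to remove is the `<sup>`, since the expected output in the question is truncated.

```python
from bs4 import BeautifulSoup

# Markup taken from the question excerpt.
html = "<p>Projeto N <sup>o</sup> 00.000, DE 00 DE JANEIRO DE 0000.</p>"
soup = BeautifulSoup(html, "html.parser")

# Assumption: <sup> is the tag to drop. unwrap() replaces each tag
# with its own contents, so the inner text stays in the paragraph.
for sup in soup.find_all("sup"):
    sup.unwrap()

result = str(soup.p)
print(result)
```

If the tag's text should be discarded as well, `decompose()` would be the call to reach for instead of `unwrap()`.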
1
votes

0
answers

29
views

Multiple search factors in an html file with variable parameters

Good morning, everyone. I need to search HTML files for the names of the presidents of Brazil. I created a JSON file with the presidents’ names to make this easier. Here is the code: # !/bin/env python…

python python-3.x beautifulsoup
asked 7 years, 1 month ago britodfbr 688
1
votes

1
answer

88
views

Error printing HTML with Beautifulsoup

I have a simple script that accesses a quiz site, takes all the ul elements that have the class square, and prints them on screen. url = "http://quizdomilhao.com.br/category/g1" question_page =…

html python python-3.x python-requests beautifulsoup
asked 4 years, 6 months ago Joa Roque 37
1
votes

1
answer

347
views

Scraping data using Robobrowser

I’m trying to scrape a form, insert an attachment, and submit it using Robobrowser. To open the page I do: browser.open('url') To get the form I do: form = browser.get_form(id='id_form') To enter…

python web-scraping beautifulsoup
asked 6 years, 7 months ago Rafael 477
1
votes

2
answers

323
views

Python-based web scraping does not return the complete HTML page

Greetings everyone, I’m trying to use Python to get information from the page http://www.nfce.se.gov.br/portal/painelMonitor.jsp , a page from Faz that shows the ping status of NFC-e…

html python shell beautifulsoup
asked 6 years, 5 months ago Rafael Xavier Suarez 91
1
votes

1
answer

123
views

Text.strip() Python

I am trying to write code that extracts some information from a page. The file has the following format: <tr class="impar"> <td class="id"> <a…

python beautifulsoup
asked 6 years, 5 months ago Diego Rangel 133
1
votes

1
answer

1875
views

Web scraping on page with login and password

I am trying to extract source code from an html file with the following style: <div class="both"></div> <div class="st-box" id="source-code"> <h3>SOURCE CODE</h3>…

python beautifulsoup
asked 6 years, 5 months ago Diego Rangel 133
1
votes

0
answers

42
views

Reading a table in Python with Beautiful Soup

I need to get a table from the transparency portal and then write it to a database. I am using Beautiful Soup. The request does not return the part that contains the data, and consequently none of the tags that I look…

python python-3.x web-scraping beautifulsoup
asked 6 years, 4 months ago wmoura12 11
1
votes

2
answers

816
views

In requests, how to correctly read the ISO-8859-1 encoding?

In Python 3, with beautifulsoup4 and requests, I want to extract some information from a site encoded in 'ISO-8859-1'. I tried this strategy to display the text correctly: import requests from…

python character-encoding python-requests beautifulsoup
asked 6 years, 1 month ago Reinaldo Chaves 333
1
votes

2
answers

1450
views

How to extract information from an HTML table and move it to a DataFrame - Using Selenium

I am using Selenium to access the site http://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-ajustes-do-pregao-ptBR.asp and manipulate the date box and ok button. So far I managed to do the task…

python selenium selenium-webdriver beautifulsoup
asked 5 years, 10 months ago LePy 281
1
votes

2
answers

294
views

Regex does not match all span tag strings

I know there are HTML parsers, but since my HTML is not well structured, I also need to use regular expressions. The HTML is like this: <tr bgcolor="#CCCCCC"> <td colspan="2"><font…

html python regex beautifulsoup
asked 5 years, 4 months ago Caio Camargo 19
1
votes

2
answers

185
views

BeautifulSoup: Get text inside a table

I’m trying to get specific values from a table. I have similar code that I already use the same way on another table structure in the HTML; the problem is that I can’t get the text of…

html python table web-scraping beautifulsoup
asked 4 years, 5 months ago kleytonsolinho 31
1
votes

2
answers

101
views

Question about value extraction using Soup.findAll() in Python

Good afternoon, everyone. I am studying Python and learning to extract data from websites. To get started, I am creating a program that will extract the data from the lotofacil site of the…

python python-2.7 beautifulsoup
asked 4 years, 3 months ago Douglas Almeida de Mesquita 13
1
votes

1
answer

47
views

Why does the HTTP request response not recognize special characters?

Why, when I make a request using BeautifulSoup in Python, does the response not handle Latin characters? Code: import requests from bs4 import BeautifulSoup req = requests.post(url=…

html python request python-requests beautifulsoup
asked 4 years, 2 months ago all0y74 11
1
votes

2
answers

35
views

I’m web scraping with Python and can’t break out of a loop

I’m automating a search for gaming laptops on Amazon that fetches the first page and then the following ones, but at some point it won’t stop trying to fetch more pages…

python beautifulsoup
asked 4 years ago Farofa de Cachorro 36
0
votes

1
answer

229
views

Python: how to edit a tag with bs4

I have HTML code heavily polluted with style attributes on almost every tag, plus unnecessary <font><span> tags. How can I use BeautifulSoup to remove only attrs=style in <p> and the tags…

python python-3.x beautifulsoup
asked 7 years, 4 months ago britodfbr 688
0
votes

2
answers

91
views

How to create the <!DOCTYPE html> tag in Beautiful Soup (bs4)

I want to create the <!DOCTYPE html> tag in Beautiful Soup (bs4), and wrote the following: from bs4 import Doctype tag = Doctype('html') That is the excerpt I wrote, but it does not create the <!DOCTYPE html> tag. How should I proceed?…

python beautifulsoup
asked 7 years, 2 months ago britodfbr 688
0
votes

1
answer

865
views

Requests, Beautifulsoup <Tables>

I have a website from which I want to extract specific data from a table: all information that contains "PROLONG". My difficulty is that all tables have the same name in the "class"…

html python-3.x python-requests beautifulsoup
asked 7 years, 2 months ago Dalmo Cabral 123
0
votes

1
answer

295
views

Scraping using Selenium and Beautifulsoup

I’m trying to scrape a book blog; I need to get the titles and categories of all the books posted. On the first attempt, I got an AttributeError, which should happen several times because…

python-3.x selenium beautifulsoup
asked 7 years, 1 month ago Lucas Maraal 125
0
votes

4
answers

197
views

Remove a comment tag and its content in BeautifulSoup 4

How do I remove the comment tag along with its content with bs4 ? <div class="foo"> A Arara é um animal voador. <!-- <p>Animais Nome: Arara Idade: 12 anos e 9 meses Tempo de Vida: 15…

python python-3.x beautifulsoup
asked 6 years, 7 months ago Igor Gabriel 530
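For the question above, a minimal sketch that removes HTML comments together with their content. The markup is shortened from the question excerpt, and the commented-out paragraph text is a placeholder because the original is truncated.

```python
from bs4 import BeautifulSoup, Comment

# Shortened, partly hypothetical markup based on the question excerpt.
html = '<div class="foo">A Arara é um animal voador. <!-- <p>hidden</p> --></div>'
soup = BeautifulSoup(html, "html.parser")

# Comments are parsed as Comment objects (a NavigableString subclass),
# so they can be selected with a string filter and detached from the tree.
for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
    comment.extract()

cleaned = str(soup)
print(cleaned)
```

`extract()` returns the removed node, which helps if the comment text needs to be logged before it is dropped.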
0
votes

1
answer

62
views

I can’t do the "web scraping" properly from a Python comic strip site

Well, I was writing code that would check the day of each strip/gif on the page and, if the day is the same as the current day (in the code I put 14 only because the site does not update weekend…

python python-3.x web-scraping beautifulsoup
asked 6 years, 10 months ago Matheus Andrade 27
0
votes

1
answer

41
views

Python BeautifulSoup question

I am making a requests.post call and it returns the following information: {"Name":"Joey Triibianni","Vocation":"Knight","Level":"425","World":"Quelibra","Account Status:":"Free Account"} I’ve tried…

python-3.x beautifulsoup
asked 5 years, 11 months ago boka 3
0
votes

1
answer

56
views

How can I replace this regular expression using Beautiful Soup?

Currently I use this expression to extract everything after a <b> tag up to the next <b> tag: blocks = re.findall(r'<b>.+?<b>', str(element)) How can I do the same…

python python-3.x beautifulsoup
asked 5 years, 8 months ago Rodrigosis 9
0
votes

1
answer

342
views

Question on how to scrape data with Python using BeautifulSoup <Table>

I’m a beginner and I’m trying to get a table from the transparency portal website, but I can’t: only the tag comes back, with no data. When I open the developer tools I can see the data I…

python python-3.x web-scraping beautifulsoup
asked 6 years, 8 months ago jaderson08 1
0
votes

2
answers

125
views

Questions about the use of BeautifulSoup

My code below is meant to get the genres of movies from the IMDB site, but I don’t know how to target the specific genre tag, because sometimes instead of getting the genre it gets…

python beautifulsoup
asked 6 years, 8 months ago Daniel Rosendo de Souza 71
0
votes

1
answer

33
views

Access Tag via beautifulsoup

Hello, I’m having difficulty accessing the price that is in the third line of the code via BeautifulSoup. Does anyone have any idea how to access it? <span id="ctl00_Conteudo_ctl01_spanPrecoPor"…

python web-scraping beautifulsoup
asked 6 years, 7 months ago Diogo Gonnelli 11
0
votes

1
answer

78
views

Web scraping with Soup + Python: export to txt and check with shell script

Greetings everyone, I have Python code that fetches the ping time in milliseconds for E-tax Note sending from the NFC-e status portal, as below: #!/usr/bin/env python # -*-…

python shell web-scraping beautifulsoup
asked 6 years, 5 months ago Rafael Xavier Suarez 91
0
votes

1
answer

220
views

Switching pages in an html table with beautifulsoup

I’m collecting data from this website, using requests and BeautifulSoup. I was able to collect all the data from page 1, but I cannot change the page. Python code variaveis = [] df_list = []…

python web-scraping python-requests beautifulsoup
asked 6 years, 4 months ago Pedro 31
0
votes

3
answers

1214
views

bs4.FeatureNotFound (BeautifulSoup and parser error)

I need to extract all the text from an HTML file. So I decided to look at BeautifulSoup to see how to do it. But it started to show the text right at the beginning, here’s the code: import…

python-3.x beautifulsoup
asked 6 years, 4 months ago user124673
0
votes

0
answers

45
views

Webscraping of pictures in comments

I’m working on a web scraper that needs to retrieve comments from a forum that allows image uploads. I was able to obtain the text and author of each comment using findAll in Beautiful Soup, but I…

python beautifulsoup
asked 6 years ago Augusto Coelho 1
0
votes

0
answers

183
views

HTTP Error 429: Too Many Requests in Web scraping in repl

When executing the code below, I get: HTTP Error 429: Too Many Requests. The server must have a time limit between requests. # Required bs4 imports import bs4 from urllib.request import…

python web-scraping beautifulsoup
asked 5 years, 8 months ago Gustavo William 11
0
votes

1
answer

509
views

Download images from a txt list of links with Python

First I imported the packages and created a class and its settings: class Scraper: def __init__(self): self.visited = set() self.session = requests.Session() self.session.headers = {"User-Agent":…

python-3.x beautifulsoup
asked 5 years, 8 months ago Hudson Souza 53
0
votes

1
answer

80
views

Add elements in Dictionary and convert them to string

Hello, I am doing some tests with scraping using the Python library "BeautifulSoup" and a question came up. I was able to extract some information from a site, like product title, sku, and its…

python beautifulsoup
asked 5 years, 7 months ago Guilhermecor 9
0
votes

2
answers

345
views

Web scraping with Beautifulsoup - find_next does not return text

I want to extract the text from the section below: <div class="matchDate renderMatchDateContainer" data-kickoff="1583784000000">Mon 9 Mar 2020</div> the text would be "Mon 9 Mar 2020".…

html python web-scraping beautifulsoup
asked 5 years, 3 months ago Otávio Simões Silveira 3
0
votes

0
answers

164
views

Get a specific HTML attribute using Beautifulsoup

I’m trying to capture an attribute called srcset from an img tag <img _ngcontent-games2-c5="" class="mdc-image-list__image ng-lazyloaded" offset="100" src="/assets/img/lazy-load.jpg"…

python-3.x beautifulsoup
asked 5 years, 3 months ago Gabriel 129
0
votes

1
answer

457
views

How to use multithreading with requests?

Hello, I’m developing an availability checker for name changes in a game, but it is very slow. I read about the multithreading module but found it confusing to use; I have no idea how to implement it…

python-3.x request beautifulsoup
asked 5 years, 1 month ago carlitoshow 1
0
votes

1
answer

53
views

How to move the title from one column to another?? (web scraping-python)

I’m trying to do some web scraping, but if you view the site you will notice that certain titles are in certain columns. What my program does is take the table, create two columns full of NaN and assign…

python pandas selenium-webdriver web-scraping beautifulsoup
asked 5 years ago Frybii 43
0
votes

0
answers

57
views

Beautifulsoup is returning None

I’m trying to get the title of a product on the Amazon website, but the value returned is always None. Product link:…

python-3.x python-requests beautifulsoup
asked 4 years, 10 months ago Lucas 11
0
votes

0
answers

39
views

I would like to iterate on a Beautiful Soup object, but at the end it gives an error in find() because it is a list. How could I solve this error?

import requests from bs4 import BeautifulSoup import pandas as pd def get_object(url): soup = BeautifulSoup(requests.get(url).content,'html.parser') return…

python beautifulsoup
asked 4 years, 8 months ago isabele alves pereira 39
0
votes

1
answer

49
views

How to take a differentiated input value using Beautifulsoup

I’m writing a small program to learn how to use BeautifulSoup: a currency converter that converts a value from currency X to currency Y. My program initially takes from the Iban site all…

python beautifulsoup
asked 4 years, 6 months ago SrTony 21
0
votes

0
answers

17
views

How to make a calculation between a Beautifulsoup value and user input?

I’m writing a program that converts currency values. It receives 3 user inputs: the selection of the 1st country’s currency, the selection of the 2nd country’s currency, and the value the user wants to…

python beautifulsoup
asked 4 years, 6 months ago Nitczi 1
0
votes

1
answer

42
views

Python Beautifulsoup remove tag within tag

I’m having a problem while scraping a page and capturing text. Basically the beginning of my code is as follows: url0 =…

python python-3.x beautifulsoup
asked 4 years, 4 months ago Jessica Voigt 883
0
votes

0
answers

19
views

Problem extracting web page data with Beautiful Soup in python

I wrote a Python script to access the Inmetro records portal and search among the existing certificates. In this case, my script accesses this link and takes all records from the…

python url web-scraping python-requests beautifulsoup
asked 3 years, 11 months ago jonasmuller98 1