Most voted "web-scraping" questions
Web scraping is the process of extracting information from websites. It is typically used by third-party applications to extract information from, or interact with, a website that does not expose an API.
191 questions
-
1 vote, 1 answer, 477 views
Web scraping at a specific url with Beautifulsoup
from bs4 import BeautifulSoup import requests import re url = 'http://www.bhaktiyogapura.com/2017/03/calendario-vaisnava-marco-de-2017/' header = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64;…
-
1 vote, 1 answer, 114 views
Web Crawler with Django’s view.py
I am building a simple web crawler with Django 2.0. I want to capture only the "title" class of the news items and then render them with "return render" to a simple HTML page; my view.py is below. I am currently using…
-
1 vote, 3 answers, 1839 views
Web scraping in Python with JavaScript running on the CEF website
CEF (Caixa Econômica Federal) changed the way it displays lottery results on its website. Before, the results all came in the HTML and I could get them via web scraping relatively easily…
-
1 vote, 0 answers, 58 views
Converting Python Web Scraper 2.7 to 3.5
Good afternoon, everyone! Here’s the deal: I found a script in Python 2.7 but I have version 3.6. As I am new to this field, I wanted to convert the script by hand. The code follows:…
-
1 vote, 1 answer, 300 views
urllib.request or requests?
I am studying web scraping, and in many guides I have seen examples that use urllib.request and requests.get. From what I’ve tested and understood, the two do the same thing. So what’s the…
-
1 vote, 0 answers, 171 views
Error: read ECONNRESET and Error: connect ETIMEDOUT
Good evening, everyone. I’m scraping a website using Node.js and Axios. When I run the application it works perfectly and returns the information I requested, but it…
-
1 vote, 1 answer, 2076 views
Select an option from a drop-down menu with Selenium in Python
I have a menu with several options and want to select only the active one. When I call 'variable'.find_element_by_id('key'), Selenium returns ALL the options. The active option has a…
-
1 vote, 2 answers, 88 views
Organize string data flow by default
Friends, I am working on a scraping project. At some point, I capture a table on the screen in the form of a giant string, more or less like this: list = ('0004434-48.2010 n EU n (30 working days)…
-
1 vote, 1 answer, 382 views
Python: select checkbox in an orderly way
I have a list containing hundreds of items in the format [ '5008489', 'Órgão: MPF', 'PROCEDIMENTO DO JUIZADO ESPECIAL', 'CPF', <selenium.webdriver.remote.webelement.WebElement…
python checkbox selenium selenium-webdriver web-scraping
asked 7 years, 5 months ago Bergo de Almeida 181
-
1 vote, 1 answer, 51 views
Web scraping in R
I have tried several approaches, but I can’t scrape the following table: http://www2.bmf.com.br/pages/portal/bmfbovespa/boletim1/TxRef1.asp. So far I have the following code: library("rvest")…
-
1 vote, 1 answer, 1158 views
Fill JS Form - Web Scraping in Python - Selenium and PhantomJS
Friends, I’m writing code to access Anbima, fill in the fields and download the generated txt. I have been looking for a solution to this problem for a few days. So far, I have found that…
-
1 vote, 0 answers, 198 views
phpQuery Web Scraping Event
I want to get information from the website using phpQuery, but I’m still learning how to use it. The information I want appears in a select, but only after clicking it. Without clicking it…
-
1 vote, 1 answer, 860 views
BeautifulSoup - True href links
I was studying web scraping with Python and started using the bs4 library (BeautifulSoup). When I started collecting the a tags and their href attribute, I realized that I could not access the…
-
1 vote, 1 answer, 172 views
How to avoid ConnectionError on large scraping jobs?
In Python 3, I have a program that scrapes tables from websites. There are 5,299 pages, each with a table. Through XHR I found the JSON generated for each page. But there is always a…
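For a job that walks thousands of pages like the one described above, one common mitigation is to retry a failed request with an increasing delay instead of aborting the whole run. The sketch below is only an illustration under assumptions: `fetch_with_retry`, `fetch`, `retry_on`, and `sleep` are hypothetical names, not part of the question or of any library. Note that the `requests` library raises its own `requests.exceptions.ConnectionError`, which is not the built-in `ConnectionError`, so that class would have to be passed in `retry_on` when using `requests`.

```python
import time


def fetch_with_retry(fetch, url, retries=3, base_delay=1.0,
                     retry_on=(ConnectionError,), sleep=time.sleep):
    """Call fetch(url), retrying with exponential backoff on connection errors.

    fetch    -- any callable that takes a URL and returns the response (assumed)
    retry_on -- tuple of exception classes that should trigger a retry
    sleep    -- injectable so the waiting can be replaced in tests
    """
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except retry_on:
            if attempt == retries:
                raise  # out of attempts: surface the original error
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

Keeping the fetch function and the exception classes as parameters lets the same helper wrap `urllib`, `requests`, or any other client without changing the retry logic.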
-
1 vote, 1 answer, 219 views
How do I display variables on a Django page?
I am new to Django and am taking some information from a web page using lxml. I would like to know how to display the values on my website. import requests from lxml import html from…
-
1 vote, 1 answer, 332 views
Scrape parameters of a POST method with Scrapy in Python
I need to collect information from a website using spiders in Scrapy with Python, but the site uses the POST method, and I’m learning the language while developing the project. I found a template for a POST…
-
1 vote, 1 answer, 87 views
Simultaneous threading (parallel processing) in R and serialized recording in Sqlite
Hey there, guys. I am trying to write code that parses HTML files in parallel using R and subsequently records the data extracted from…
-
1 vote, 1 answer, 347 views
Scraping data using Robobrowser
I’m trying to scrape a form, attach a file and submit it using Robobrowser. To open the page I do: browser.open('url'). To get the form I do: form = browser.get_form(id='id_form'). To enter…
-
1 vote, 0 answers, 42 views
Reading Table in Python with Beautiful Soup
I need to get a table from the transparency portal and then write it to the database. I am using Beautiful Soup. The response does not include the part that has the data, and consequently none of the tags that I look…
-
1 vote, 1 answer, 124 views
I need to compare the value of the last filled cell with the antepenultimate
Hello, and thank you for your attention. I need to compare the value of the last filled cell with the antepenultimate. If the value is different, I want to continue with the current value; if seje = place…
-
1 vote, 0 answers, 651 views
Curl error 60: SSL Certificate problem: self Signed Certificate in Certificate chain (see http://curl.haxx.se/libcurl/c/libcurl-errors.html)
I’m trying to do some web scraping, but I get the following error in return: Curl error 60: SSL Certificate problem: self Signed Certificate in Certificate chain (see…
-
1 vote, 1 answer, 40 views
Problem converting JSON to dataframe in R
I want to extract JSON content from a website and convert it into a dataframe. The website is https://schutz-shoes.com/products/amaia-sandal-metallic-leather?color=ouro%20gold Inside that site,…
-
1 vote, 1 answer, 1363 views
Sites with authentication - Web Scraping - Python
BR: I’m trying to automate a web data acquisition process using Python. In my case, I need to pull the information from the page https://sistema.justwebtelecom.com.br/adm.php. However, before going…
-
1 vote, 1 answer, 365 views
Automate web scraping in Python
I’m trying to get the speeches of the deputies, which can be found here. The site has several pages (roughly 1 to 300), and each page has a table with a "summary" of the information, with 50 rows.…
-
1 vote, 0 answers, 15 views
Web scraping using Python
I would like to print all the divs of a particular website that are contained within a parent div <div class="history-feed__collection"> <div class="history-feed__card h-card h-card_sm…
web-scraping
asked 4 years, 7 months ago Gustavo 19
-
1 vote, 1 answer, 769 views
XPath with Python - Get text after a tag in a div
I’m trying to get the text after a tag that’s inside a div in an HTML document. The problem I’m having is that I’m not getting the text, just an empty string. I’ve looked elsewhere and I haven’t seen anyone…
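Text that sits after a tag's closing bracket is not part of that tag's own text, which is one frequent reason for getting an empty string in this kind of lookup. The sketch below illustrates the idea with the standard library only. The markup and the `text_after_tag` helper are hypothetical, since the question's HTML is not shown, and `ElementTree` needs well-formed input, so real-world HTML would usually go through an HTML parser instead. In XPath the corresponding expression would be along the lines of `//div/b/following-sibling::text()[1]`.

```python
import xml.etree.ElementTree as ET


def text_after_tag(fragment, tag):
    """Return the text that follows the first <tag> child of the root element."""
    root = ET.fromstring(fragment)
    child = root.find(tag)
    if child is None:
        return None
    # ElementTree keeps text that follows an element's closing tag in .tail,
    # while .text only holds the text between the opening and closing tag.
    return (child.tail or "").strip()


# Hypothetical markup: the value of interest comes after </b>, not inside it.
snippet = "<div><b>Price:</b> 42 USD</div>"
print(text_after_tag(snippet, "b"))
```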
-
1 vote, 1 answer, 31 views
Iterating over list with B Soup
I am trying to scrape a list of episodes of a series with BS. I put together the structure below: # Importing all the modules import bs4 from bs4 import BeautifulSoup import…
-
1 vote, 1 answer, 650 views
Run python script by clicking html button
I need to feed an HTML page that will load and display content dynamically with ajax/fetch. The problem is that I need to take this data from other websites that also load this content through…
-
1 vote, 1 answer, 73 views
JSONDecodeError: Expecting value: line 1 column 1 (char 0) - content-type: text/xml
I have a project to find ATMs near a coordinate using the form on the Mastercard website. I can get the result, but not as JSON. Given that the Content-Type is text/xml, is it not possible to get the result as JSON?…
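"Expecting value: line 1 column 1 (char 0)" is what Python's `json` module reports when the body does not start with a JSON value, which is the case for an XML body beginning with `<`. A minimal sketch of choosing the parser from the Content-Type header follows. The `parse_body` helper and the sample payload are hypothetical, and the flat tag-to-text mapping only suits simple, non-nested XML.

```python
import json
import xml.etree.ElementTree as ET


def parse_body(content_type, body):
    """Parse a response body according to its Content-Type header."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    mime = content_type.split(";")[0].strip().lower()
    if mime in ("application/json", "text/json"):
        return json.loads(body)
    if mime in ("text/xml", "application/xml"):
        root = ET.fromstring(body)
        # Flatten the direct children into a dict (assumes no nesting).
        return {child.tag: child.text for child in root}
    raise ValueError(f"unsupported content type: {content_type!r}")


# Hypothetical XML payload standing in for the real response.
sample = "<atm><city>Recife</city><id>7</id></atm>"
print(parse_body("text/xml; charset=utf-8", sample))
```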
-
1 vote, 0 answers, 57 views
Scraping Instagram
Hello, I would like to ask for help. I want to scrape Instagram to analyze personas and extract data such as the tags most used by people who follow a certain person. I took…
-
1 vote, 1 answer, 36 views
Change the language of the result of a web-scraping with rvest from the IMDB site
I want to collect information about the IMDB Top 250 using the package rvest. While visiting the page link, the names of the movies appear in their original language, at least in my browser (Firefox…
-
1 vote, 2 answers, 185 views
BeautifulSoup: Get text inside a table
I’m trying to get specific values from a table. I have similar code that I already use the same way on another table structure in the HTML. The problem is that I can’t get the text of…
-
1 vote, 0 answers, 33 views
Web Scraping with R - static table
I would like to consolidate the data of a betting site in a database in R. I’m trying the approach below, but my script doesn’t actually recognize the columns and rows of the table, only the layout:…
-
1 vote, 1 answer, 143 views
Runtime Error 438: Object does not accept property or Methods
I’m adapting VBA code for scraping, but I’m getting this message when it comes to sending the data to the Login form. Runtime Error 438: Object does not accept property or Methods Public Sub…
-
0 votes, 1 answer, 224 views
How to get the headlines of the Olympics on the CNN website with Python using BeautifulSoup?
I’d like an example of how to get the Olympics headlines from http://edition.cnn.com/sport/olympics using BeautifulSoup.…
-
0 votes, 2 answers, 239 views
Web Scraping how to insert the result into the <img src=
I’m scraping a website, and I would like the returned images to be placed inside the <img src=, but I’m not succeeding. // Find all images foreach($html->find('img') as…
-
0 votes, 0 answers, 87 views
Crawler for WooCommerce
Good afternoon, friends. I’m developing a PHP crawler that will scrape some URLs that I provide. I’m trying to get it to pull the values from a dynamic URL, but without success. Could…
-
0 votes, 1 answer, 509 views
What is the best way to scrape the Datasus website in Python?
The link is this: http://tabnet.datasus.gov.br/cgi/tabcgi.exe?sih/cnv/nrbr.def I’m trying to send a POST through requests with a dictionary containing the categories I want, but then the URL remains…
-
0 votes, 1 answer, 1001 views
How to capture the td of a web page using Selenium VBA?
The HTML code looks like this: <table> <tr> <td width="01%" class="tex3b"><img height="14" src="/imagens/tm_bullet.gif" width="6"></td> <td width="20%"…
-
0 votes, 1 answer, 53 views
Parse XPath from Int
I have a Scrapy spider running a for loop to get the day and the link to something. E.g.: t_day = div.xpath('.//a/text()').extract_first() a_day = div.xpath('.//a/@href').extract_first() day = int(t_day) if day…
-
0 votes, 1 answer, 458 views
Save Excel files in Python via Scrapy
How do I make my spider save all the Excel data from the links I extract in a single XML file? Or, alternatively, save each XLS file individually in the project folder? Part of my spider: def parse(self, response): divs…
-
0 votes, 0 answers, 80 views
Extract PDF documents from sites with Scrapy
Is it possible to crawl an entire site with Scrapy, following all links in search of PDF files? It would be something like Apache Nutch. I searched, but people only use XPath, and XPath cannot…
-
0 votes, 0 answers, 59 views
WebBrowser does not load link with Navigate (WinForms, C#)
Good afternoon. I am running a test to build a web scraper in C#, but the page does not load in the form and a JavaScript error is shown. Can anyone help with this error?…
-
0 votes, 1 answer, 47 views
Select does not update table data after selecting an option
I am trying to select the field with this query. The value of the select changes, but the table values are not reloaded and all rows are still shown. Clicking with the mouse works, though.…
-
0 votes, 1 answer, 142 views
How to create an array within another
I need to create an array that has an index and values. page_links receives the links of a page: all_links_main = [] for link in page_links: all_links_main.append(link.get('href')) produto = [] for…
-
0 votes, 1 answer, 82 views
How to click a checkbox when another element obscures it?
I’m writing code to automate some processes on the SIAFI site. I couldn’t get Python to click a checkbox except by importing the pynput package and using the mouse positioning function with…
-
0 votes, 0 answers, 76 views
Python, downloading file on a given day and time
I’m looking to structure a Python program that downloads files (manga) from a given site once a week. I’m practicing and took a web scraping course, but I am lost on how to perform these requests.…
-
0 votes, 1 answer, 470 views
Iterating web pages using Requests and Python
I am a beginner in web scraping. As a learning exercise, I am building a database from data on sales of semi-new cars on some websites. One of the sites is this: url =…
-
0 votes, 1 answer, 407 views
Specific chunk break in JSON file with Python
Is it possible to split a line from a specific section of a JSON file, transform it into an array, and then streamline it? Here is why I ask: I am developing a file mining bot and came across a situation…
-
0 votes, 0 answers, 58 views
How to extract data for models.py fields from Scrapy?
I intend to extract all the "Municipios" from the tag, starting on this page: https://www.anmp.pt/anmp/pro/mun1/mun101w3.php?cod=M2200 Then I want to extract information such as: "name of the council", "mayor",…