Most voted "web-scraping" questions
Web scraping is the process of extracting information from websites. It is typically used by third-party applications to extract information from, or interact with, a website that does not expose an API.
191 questions
-
29 votes
1 answer
22467 views
Macro to access site with login
I run a daily routine of accessing the Serasa site and making a CNPJ query. I need to develop a macro that accesses the Serasa site, logs in, runs the query, and then puts the information into Excel.…
-
13 votes
2 answers
2429 views
How to recognize and change the encoding of Latin characters in R?
Is there an efficient way to recognize the encoding of texts downloaded from the internet? I scraped a site (see code below) and I can’t find the correct encoding. In the META tag of…
-
11 votes
3 answers
2014 views
Web scraping with R
I am trying to scrape the following link: http://empresasdobrasil.com/empresas/alta-floresta-mt/ I want to access all categories and extract a data frame with the name of all…
-
7 votes
2 answers
686 views
How to scrape an https page using rvest?
I would like to scrape a page that is served over https using the rvest package. However, the website has problems with its security certificate. In such cases, you need to turn off SSL verification --…
-
7 votes
1 answer
73 views
Limiting the number of regex matches with Python
I’m having a little trouble: I’d like to create a for loop in Python that returns a specific number of regex matches. The way I did it, it returns all the links that exist and that meet the defined…
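One possible way to cap the number of matches, sketched under assumptions: the pattern, the sample HTML, and the helper name `first_matches` below are hypothetical and not taken from the question. The idea is to wrap `re.finditer` in `itertools.islice` so that iteration stops after the requested number of hits.

```python
import re
from itertools import islice


def first_matches(pattern, text, limit):
    """Return at most `limit` matches of `pattern` in `text` (hypothetical helper)."""
    # re.finditer is lazy, so islice stops scanning once `limit` matches are found.
    return [m.group(0) for m in islice(re.finditer(pattern, text), limit)]


# Made-up sample input, for illustration only.
html = (
    '<a href="http://a.example/1"></a>'
    '<a href="http://a.example/2"></a>'
    '<a href="http://a.example/3"></a>'
)
links = first_matches(r'http://[^"]+', html, 2)
print(links)
```

Because the iterator is consumed lazily, this avoids building the full list of matches first, which `re.findall(...)[:n]` would do.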
-
6 votes
1 answer
2565 views
Extract information from Lattes
Introduction: Since 1999, Brazilian researchers have had a website where they can post information about their academic careers. This information is known as Currículos Lattes. I wish to download a few…
-
5 votes
2 answers
240 views
Render a specific part of a page
I am using the following code to render a web page: import dryscrape # set up a web scraping session sess = dryscrape.Session(base_url = 'http://www.google.com') # we don't need images…
-
5 votes
1 answer
241 views
Programmatically generate links and download content
I would like to know how to collect data from a website. The site is http://www.ons.org.br/historico/energia_natural_afluente.aspx . There I have to download all the operational historical data…
-
5 votes
1 answer
473 views
How to scrape a site that uses the POST method?
I’m having trouble scraping sites that use the POST method. For example, I need to extract all news related to political parties from the website: http://www.diariodemarilia.com.br.…
-
5 votes
1 answer
135 views
Web Scraping: How to change the value of a drop-down button on a site using R?
I want to create a script in R to read an HTML table. Doing this for a static page with the rvest package is easy; the problem is that I have to change the value of two buttons on the page. This is the site…
-
5 votes
1 answer
1874 views
How to collect data from a web page?
Web data collection, or Web Scraping, is a form of mining that allows the extraction of data from web sites by converting them into structured information for further analysis. Present here your…
-
4 votes
1 answer
1266 views
Web Scraping Selenium + Python on JS-generated website = difficulty mapping elements
Good afternoon. I am developing a script that: accesses a system; finds certain information within the environment; generates a kind of report; creates a spreadsheet with the data. My…
-
4 votes
1 answer
748 views
How to extract content from the Web (Web scraping) with C#?
I recently learned how to do web scraping and got it working on some sites, but not on others. I noticed that some of the ones I can’t scrape contain a "#"; what does that mean? Let me give you an example…
-
4 votes
0 answers
214 views
Does anyone know how to scrape the SICONV (Free Access) website with R?
I’m trying to use R to extract the information about covenants from the SICONV site:…
-
3 votes
1 answer
231 views
Downloading a file by filling out a form
I’m trying to access a site, fill out its form and download the file, but I’m encountering some difficulties. That’s my code so far: #library's require(rvest) #website url <-…
-
3 votes
1 answer
1215 views
Configure Firefox webdriver in Selenium
I’m using Selenium (Python) to fetch some data from a site; at a certain point I access a link that downloads a file. How do I configure the webdriver (Firefox) to automatically accept the download,…
Tags: python, selenium, selenium-webdriver, web-scraping
Asked 7 years, 7 months ago by Wellington Araujo Nogueira (41)
-
3 votes
1 answer
82 views
How to ignore links that do not fit the established conditions and continue with scraping?
I would like to know how to ignore the links that do not fit the conditions set in title, data_hora and text, so that scraping of the site can continue. Error that occurs when a link does not have…
-
3 votes
1 answer
498 views
Error with scrapy requests
I have a CSV file with some URLs that need to be accessed. http://www.icarros.com.br/Audi, Audi http://www.icarros.com.br/Fiat, Fiat http://www.icarros.com.br/Chevrolet, Chevrolet I’ve got a Spider…
-
3 votes
1 answer
1345 views
How to keep only specific rows of a data frame?
I have code that enters a site, fills in a form and pulls a table; however, I want to delete some rows that I don’t need from this table. Here is the code: #library's require(RCurl)…
-
3 votes
1 answer
178 views
POST function of the httr package returns NA
I’m trying to write an R script that makes a POST to the site: http://tabnet.datasus.gov.br/cgi/tabcgi.exe?sinannet/cnv/violebr.def, but I am not succeeding. The goal is to extract the generated data…
-
3 votes
1 answer
102 views
Navigate between pages from a web page bar
How to browse pages that are in a web page bar? Specific case: When performing a query on the TCM-Ba website, on the page that records the expenses of municipalities, it is possible to access some…
-
3 votes
1 answer
1331 views
A - Download data from the Hidroweb portal
The National Water Agency offers, through its Hidroweb portal, downloads of historical series of data obtained by several monitoring stations. I would like to automate the…
-
3 votes
1 answer
78 views
How to apply opacity to a DOM element - createImage(); - through a JavaScript editor?
I’m using P5.js, a JavaScript library, to capture images from a news API. I would like these images to be superimposed, but with opacity, so that the images merge. I’m not able to apply…
-
3 votes
1 answer
125 views
How to use the remote driver on a proxy-protected computer with the RSelenium package in R?
Well, I need to access a site from my work network, but the network is protected by a proxy. Some sites can be accessed with the httr and rvest packages, others cannot. For example, I cannot log in to the site. Example:…
-
3 votes
1 answer
71 views
Error when scraping YouTube videos in R - 'NA' does not exist in current working directory
I am working on an academic project in which I have to analyze the text of 25 selected videos from various YouTube channels. My advisor gave me a script showing how he is doing this, so that I can work on…
-
3 votes
1 answer
155 views
Remove empty spaces Laravel + webscraping
I’m performing a webscraping as follows: $url = 'https://esaj.tjsp.jus.br/cpopg/show.do?processo.codigo=XXXXXXXX&processo.numero=XXXXXXX'; $client = new Client(); $crawler =…
-
2 votes
1 answer
59 views
At what stage should the data be edited?
I am currently extracting data from a website, with data in English, through web scraping. If we want, for example, to translate the names or values of the fields into Portuguese, or to complete…
-
2 votes
2 answers
135 views
Webscrape Scoring for Welfare
I need to extract the information from this site into an Excel file: which Members voted in favor, against, abstained, and so on. It’s a webscrape exc, but as I understand html I’m having a hard…
-
2 votes
0 answers
58 views
Error in submitting form
Good afternoon, I have a code that works for some forms on the web and I’m trying to reuse it on this site: http://www.anbima.associados.rtm/titulos-publicos/estrutura-a-termo/tp-estrutura-termo.asp…
-
2 votes
2 answers
384 views
Download data from Stock Exchange tables in R
I have the following code, I need to download the data that is in the table, but the dataframe is always returning empty. library(tidyverse) library(rvest) library(bizdays) library(dplyr)…
-
2 votes
1 answer
77 views
How to extract text from a selected BeautifulSoup element?
I’m making a simple crawler to get some news from the financial market. The code below is working properly, but I would like to extract only the headline and then strip the HTML/CSS code. import…
-
2 votes
1 answer
170 views
RSelenium error - Selenium message:Java heap space
Hello, I’m trying to scrape http://acervo.estadao.com.br/ using RSelenium, because the page only generates the information in HTML when it is loaded in the browser. Well, when it…
-
2 votes
0 answers
98 views
Scraping with R - xpathSApply returning a list of 0
I’m learning to read XML data in R. I wanted to extract information about Brazilian football (championship name, home team, result, etc.) from this site:…
-
2 votes
1 answer
259 views
How to handle errors during web scraping?
Hello, everyone. During the Web Scraping process, I started to come across some errors that occur during the request process. Currently, I have identified 4 types of frequent errors: Error in…
-
2 votes
1 answer
66 views
Use lambda expressions to refine the parameters of a for loop in C#
Good afternoon! I would like to ask a question. I am developing data collection code, and at a certain point it is necessary to iterate over the values of a list and then save the information in an…
-
2 votes
2 answers
580 views
How to collect data with web scraping in Python?
This URL contains several links. I have to take the links for June 2017, download them and create a single dataframe from all the files. But I got stuck at this part; how can…
-
2 votes
1 answer
149 views
Web Scraping in R
I have to download the table from this link: http://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-taxas-referenciais-bmf-ptBR.asp I’m trying to use the rvest package, but to no avail.…
-
2 votes
1 answer
49 views
Web scraping with Node.js - I cannot use a variable as a parameter
I’m using the Nightmare library for web scraping. The function works normally when I pass the string '.seletor' as a parameter; the problem occurs when I store this value in a variable and pass it as…
-
2 votes
3 answers
516 views
How to compare two JSON objects with the same elements in Python
I have two APIs that return data in JSON. I just can’t work out the logic to compare the two. API 1: API 2: My logic is to compare the two on the FlightID field and, if it’s the same, give me…
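The comparison described above could be sketched as follows. This is only an illustration under assumptions: the payloads are not shown in the excerpt, so the sample records, the `Price` and `Seats` fields, and the helper name `match_by_flight_id` are all hypothetical. Only the `FlightID` key comes from the question.

```python
import json


def match_by_flight_id(records_a, records_b, key="FlightID"):
    """Pair up records from two lists that share the same `key` value (hypothetical helper)."""
    # Index the second list by key so each lookup is a dictionary access
    # instead of a nested loop over both lists.
    index_b = {rec[key]: rec for rec in records_b if key in rec}
    return [
        (rec, index_b[rec[key]])
        for rec in records_a
        if key in rec and rec[key] in index_b
    ]


# Made-up payloads standing in for the two API responses.
api_1 = json.loads('[{"FlightID": 10, "Price": 100}, {"FlightID": 11, "Price": 200}]')
api_2 = json.loads('[{"FlightID": 11, "Seats": 4}, {"FlightID": 12, "Seats": 9}]')

pairs = match_by_flight_id(api_1, api_2)
print(pairs)
```

If several records in the second list share a `FlightID`, this sketch keeps only the last one; grouping into lists would be needed to keep them all.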
-
2 votes
0 answers
46 views
selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: //div[@class='classificacao_run points']//table
Hello, I wrote a Python web scraper using Mozilla and it worked. However, when I tried to apply the code to another site, I got the error message in the title. In practice: When running the program…
-
2 votes
2 answers
172 views
How to scrape Qlikview tables using Node.js?
This Brazilian government website presents salary data for judges of various courts and tribunals. I would like to download all tables, but the data for the tables is not in the HTML…
-
1 vote
1 answer
989 views
Problem with VBA and Internet Explorer integration
I am trying to use VBA to collect data directly from the internet. I saw several examples that use the InternetExplorer object, as below: Dim IE as Object Set IE = New InternetExplorer…
-
1 vote
1 answer
287 views
How to scrape a page that uses JavaScript with Python?
I need to scrape a page, but the page’s entry point has a button (apparently JavaScript) that gives access to all of the page’s content. Using traditional libs (urllib2,…
Tags: javascript, python, web-crawler, web-scraping, scraping
Asked 7 years, 8 months ago by Wellington Araujo Nogueira (41)
-
1 vote
1 answer
761 views
Creating a program to get important news on a website
from bs4 import BeautifulSoup import requests url = 'http://g1.com.br/' header = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) ' 'AppleWebKit/537.36 (KHTML, like Gecko) '…
-
1 vote
0 answers
521 views
Automate website browsing for software testing activities
I’m on a web systems development project, accessed by the browser. We are constantly making modifications in the operation of the processes and at each specific period we perform a test on the…
-
1 vote
1 answer
788 views
How to collect text when there is no HTML reference class - Crawler Python
I have the following situation: I want to collect the "Texto para crawler" text shown below; how do I navigate to it without a class or id? <td>Texto para crawler</td>…
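A minimal sketch of one way to reach cells that carry no class or id, assuming the text sits in plain `<td>` elements: collect the text of every cell by position using the standard library's `html.parser`. The surrounding table markup and the `Label` cell are invented for illustration, and `CellText` is a hypothetical class name; the question's real page structure is not shown in the excerpt.

```python
from html.parser import HTMLParser


class CellText(HTMLParser):
    """Collect the text content of every <td>, in document order (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_td = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            # Start a new cell; no class or id is needed to find it.
            self._in_td = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.cells[-1] += data


# Invented markup around the <td> from the question.
parser = CellText()
parser.feed("<table><tr><td>Label</td><td>Texto para crawler</td></tr></table>")
cells = [c.strip() for c in parser.cells]
print(cells)
```

Once the cells are in a list, the wanted one can be picked by index or by the text of its neighbour, which is the usual fallback when no attribute is available to select on.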
-
1 vote
1 answer
962 views
Web Scraping - convert HTML table to Python dict
I’m trying to turn an HTML table into a Python dict; I ran into some problems and am asking for your help. This is as far as I got... def impl12(url='http://www.geonames.org/countries/', tmout=2): import…
-
1 vote
1 answer
264 views
Extract data from a calendar with Python and BeautifulSoup (under Linux Ubuntu-like)
Friends, I’d like to take data from a calendar: http://www.purebhakti.com/component/panjika The first step would be to make the program choose the time zone ( -3:00 Buenos Aires) and click on Submit…
-
1 vote
1 answer
2667 views
How to avoid Max retries exceeded error in scraping in Python?
In Python 3 I made a program to scrape table rows from a public website with several pages (97893). I create a list with the rows of each column and added a sleep to try to prevent the scraping from…
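One common mitigation for this kind of error is to retry failed requests with an increasing delay. The sketch below is an assumption-laden illustration, not the asker's code: `fetch_with_retries` and `flaky_fetch` are hypothetical names, the fetch function is injected so the example runs without network access, and it catches Python's built-in `ConnectionError`. With the `requests` library one would catch `requests.exceptions.ConnectionError` instead.

```python
import time


def fetch_with_retries(fetch, url, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying on ConnectionError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fetch(url)
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the error
            # Wait 1x, 2x, 4x, ... the base delay before the next attempt.
            sleep(base_delay * 2 ** attempt)


# Simulated fetcher that fails twice before succeeding (illustration only).
calls = {"n": 0}


def flaky_fetch(url):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated failure")
    return f"page for {url}"


delays = []  # record the delays instead of actually sleeping
result = fetch_with_retries(flaky_fetch, "http://example.com/page/1", sleep=delays.append)
print(result, delays)
```

Passing `sleep` as a parameter keeps the helper testable; in real use the default `time.sleep` applies the pauses.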
-
1 vote
3 answers
691 views
Python 3.6 regular expression for whole-phrase extraction
I need to extract only the phrases that contain ADMINISTRATION - JUDGE OUTSIDE - NOCTURNE - SISU - GROUP B, for example. That is, I need to get only the name of the course, the city, the shift, the…