Most voted "web-scraping" questions

Web scraping is the process of extracting information from websites. It is typically used by third-party applications to extract information from, or interact with, a website that does not expose an API.

191 questions

29
votes

1
answer

22467
views

Macro to access site with login

Every day I access the Serasa site and run a CNPJ query. I need to develop a macro that accesses the Serasa site, logs in, runs the query, and then puts the information into Excel.…

excel vba web-scraping
asked 11 years, 3 months ago Márcio 291
13
votes

2
answers

2429
views

How to recognize and change the encoding of Latin characters in R?

Is there an efficient way to detect the encoding of text downloaded from the internet? I scraped a site (see code below) and I can’t find the correct encoding. In the META tag of…

r character-encoding utf-8 iso-8859-1 web-scraping
asked 9 years, 11 months ago RogerioJB 996
11
votes

3
answers

2014
views

Web scraping with R

I am trying to scrape the following link: http://empresasdobrasil.com/empresas/alta-floresta-mt/ I want to access all categories and extract a data frame with the name of all…

r web-scraping
asked 10 years, 2 months ago morebru 783
7
votes

2
answers

686
views

How to scrape an HTTPS site using rvest?

I would like to scrape a page served over HTTPS using the rvest package. However, the website has a problem with its security certificate. In such cases, you need to turn off SSL verification --…

r ssl certified web-scraping rvest
asked 10 years, 3 months ago RogerioJB 996
7
votes

1
answer

73
views

Limiting the number of regex matches with Python

I’m having a little trouble: I’d like to write a for loop in Python that returns a specific number of regex matches. The way I did it, it returns all the links that exist and that meet the defined…

python regex web-scraping scrapy scraping
asked 7 years, 2 months ago user89389
6
votes

1
answer

2565
views

Extract information from Lattes

Introduction. Since 1999, Brazilian researchers have had a website where they can post information about their academic career. This information is known as Currículos Lattes. I wish to download a few…

python r web-scraping scrapy rvest
asked 7 years, 11 months ago Marcus Nunes 17,915
5
votes

2
answers

240
views

Render a specific part of a page

I am using the following code to render a web page: import dryscrape # set up a web scraping session sess = dryscrape.Session(base_url = 'http://www.google.com') # we don't need images…

python web-scraping
asked 10 years, 10 months ago Daniel Falbel 12,504
5
votes

1
answer

241
views

Programmatically generate links and download content

I would like to know how to collect data from a website. The site is http://www.ons.org.br/historico/energia_natural_afluente.aspx . There I have to download all the operational historical data…

r web-scraping
asked 10 years, 3 months ago morebru 783
5
votes

1
answer

473
views

How to scrape a site that uses the POST method?

I’m having trouble scraping sites that use the POST method. For example, I need to extract all news related to political parties from the website: http://www.diariodemarilia.com.br.…

r post web-scraping
asked 9 years, 10 months ago Gabriel F 85
5
votes

1
answer

135
views

Web Scraping: How to change the value of a drop-down button on a site using R?

I want to create a script in R to read an HTML table. Doing this for a static page with the rvest package is easy; the problem is that I have to change the value of two buttons on the page. This is the site…

r web-scraping rvest
asked 9 years, 9 months ago iatowks 153
5
votes

1
answer

1874
views

How to collect data from a web page?

Web data collection, or Web Scraping, is a form of mining that allows the extraction of data from web sites by converting them into structured information for further analysis. Present here your…

c# regex web-scraping replace webclient
asked 7 years, 10 months ago Guilherme Lima 360
4
votes

1
answer

1266
views

Web Scraping Selenium + Python on JS-generated website = difficulty mapping elements

Good afternoon. I am developing a script that: accesses a system; finds certain information within the environment; generates a kind of report; creates a spreadsheet with the data. My…

python selenium web-scraping
asked 8 years, 9 months ago Bergo de Almeida 181
4
votes

1
answer

748
views

How to extract content from the Web (Web scraping) with C#?

I recently learned how to do web scraping and I managed it on some sites, but not on others. I noticed that some of the ones I can’t scrape have a "#"; what does that mean? Let me give you an example…

c# web-scraping
asked 7 years, 9 months ago Diogo Sousa 433
4
votes

0
answers

214
views

Does anyone know how to scrape the SICONV (Free Access) website with R?

I’m trying to use R to extract information about agreements from the SICONV site:…

javascript r web-scraping rvest web-page
asked 7 years, 4 months ago Pablo Dias Vieira 81
3
votes

1
answer

231
views

File download from filling a form

I’m trying to access a site, fill out its form and download the file, but I’m encountering some difficulties. That’s my code so far: #library's require(rvest) #website url <-…

database r webforms download web-scraping
asked 9 years, 1 month ago Danilo Imbimbo 533
3
votes

1
answer

1215
views

Configure Firefox webdriver in Selenium

I’m using Selenium (Python) to fetch some data from a site, and at some point I access a link that downloads a file. How do I configure the webdriver (Firefox) to automatically accept the download,…

python selenium selenium-webdriver web-scraping
asked 9 years ago Wellington Araujo Nogueira 41
3
votes

1
answer

82
views

How to ignore links that do not fit the established conditions and continue with scraping?

I would like to know how to ignore the links that do not fit the conditions set in title, data_hora and text; thus managing to continue scraping the site. Error that occurs when a link does not have…

r web-scraping
asked 9 years, 7 months ago Gabriel F 85
3
votes

1
answer

498
views

Error with scrapy requests

I have a CSV file with some URLs that need to be accessed. http://www.icarros.com.br/Audi, Audi http://www.icarros.com.br/Fiat, Fiat http://www.icarros.com.br/Chevrolet, Chevrolet I’ve got a Spider…

python python-3.x python-2.7 scrapy web-scraping
asked 9 years, 6 months ago Lucas Lopes 306
3
votes

1
answer

1345
views

How to keep only specific rows of a data frame?

I have code that accesses a site, fills in a form and pulls a table; however, I want to delete some rows from this table that I don’t need. Here is the code: #library's require(RCurl)…

r webforms table web-scraping
asked 9 years, 1 month ago Danilo Imbimbo 533
3
votes

1
answer

178
views

POST function of the httr package returns NA

I’m trying to write a script in R to make a POST to the site: http://tabnet.datasus.gov.br/cgi/tabcgi.exe?sinannet/cnv/violebr.def, but I am not succeeding. The goal is to extract the generated data…

r web-scraping httr
asked 9 years ago Rumenick Pereira da Silva 96
3
votes

1
answer

102
views

Navigate between pages from a web page bar

How do I navigate through the pages listed in a web page’s page bar? Specific case: When performing a query on the TCM-Ba website, on the page that records the expenses of municipalities, it is possible to access some…

r web-scraping
asked 8 years, 4 months ago George Santiago 139
3
votes

1
answer

1331
views

A - Download data from the Hidroweb portal

The National Water Agency offers, on its Hidroweb portal, downloads of historical series of data collected by several monitoring stations. I would like to automate the…

r web-scraping
asked 8 years ago Renato 31
3
votes

1
answer

78
views

How to apply opacity to a DOM element - createImage(); - through a javascript editor?

I’m using P5.js - a JavaScript library - to capture images from a news API. I would like these images to be superimposed, but with opacity, so that the images merge. I’m not able to apply…

javascript html html5 dom web-scraping
asked 7 years, 9 months ago Taruandé Biota 35
3
votes

1
answer

125
views

How to use the remote driver on a proxy-protected computer with the RSelenium package in R?

I need to access a site from my work network, but the network is protected by a proxy. Some sites can be accessed using the httr and rvest packages, others cannot. For example, I cannot log in to a site. Example:…

r web-scraping proxy rvest
asked 7 years, 2 months ago Pablo Dias Vieira 81
3
votes

1
answer

71
views

Error when scraping YouTube videos in R - 'NA' does not exist in current working directory

I am developing an academic work in which I should analyze the text of 25 selected videos on various YouTube channels. My advisor gave me a script showing how he is developing this, so that I can work on…

r web-scraping youtube
asked 6 years, 8 months ago Agnes Sofia Guimarães Cruz 31
3
votes

1
answer

155
views

Remove empty spaces Laravel + webscraping

I’m performing a webscraping as follows: $url = 'https://esaj.tjsp.jus.br/cpopg/show.do?processo.codigo=XXXXXXXX&processo.numero=XXXXXXX'; $client = new Client(); $crawler =…

php laravel web-scraping
asked 6 years, 7 months ago Betini O. Heleno 457
2
votes

1
answer

59
views

At what stage should the data be edited?

I am currently extracting data from a website, with data in English, through web scraping. If we want, for example, to translate the names or values of the fields into Portuguese, or to complete…

web-scraping data-analysis
asked 9 years, 2 months ago Rui Lima 1,558
2
votes

2
answers

135
views

Webscrape Scoring for Welfare

I need to extract information from this site into an Excel file: which Members voted in favor, against, abstained, and so on. It’s a webscrape exc, but as I understand html I’m having a hard…

html r web-scraping
asked 8 years, 12 months ago Danilo Imbimbo 533
2
votes

0
answers

58
views

Error in submitting form

Good afternoon, I have a code that works for some forms on the web and I’m trying to reuse it on this site: http://www.anbima.associados.rtm/titulos-publicos/estrutura-a-termo/tp-estrutura-termo.asp…

r web-scraping
asked 9 years ago Danilo Imbimbo 533
2
votes

2
answers

384
views

Download data from Stock Exchange tables in R

I have the following code. I need to download the data in the table, but the data frame always comes back empty. library(tidyverse) library(rvest) library(bizdays) library(dplyr)…

r web-application web-scraping rvest
asked 6 years, 3 months ago Alexandre Sanches 1,223
2
votes

1
answer

77
views

How to extract text from a selected BeautifulSoup element?

I’m making a simple crawler to get some news from the financial market. The code below is working properly, but I would like to extract only the headline and strip out the HTML/CSS code. import…

python web-scraping web-crawler beautifulsoup
asked 5 years, 4 months ago Edu Barros 59
2
votes

1
answer

170
views

RSelenium error - Selenium message: Java heap space

Hello, I’m trying to scrape http://acervo.estadao.com.br/ using RSelenium, because the page only generates the information in HTML when it is loaded in the browser. Well, when it…

r selenium-webdriver web-scraping
asked 8 years, 6 months ago Denisson Silva 161
2
votes

0
answers

98
views

Scraping with R - xpathSApply returning a list of 0

I’m learning to read XML data in R. I wanted to extract the information of Brazilian football (championship name, game owner, result, etc.) from this site:…

r xml web-scraping scraping
asked 8 years, 5 months ago D Bertuzzi 21
2
votes

1
answer

259
views

How to handle errors during web scraping?

Hello, everyone. During the Web Scraping process, I started to come across some errors that occur during the request process. Currently, I have identified 4 types of frequent errors: Error in…

r web-scraping try-catch
asked 8 years, 3 months ago George Santiago 139
2
votes

1
answer

66
views

Use lambda expressions to refine the parameters of a for loop in C#

Good afternoon! I would like to ask a question. I am developing data-collection code, and at a certain point it needs to iterate over the values of a list and then save the information in an…

c# web-scraping
asked 7 years, 4 months ago Jonathan Igor Bockorny Pereira 111
2
votes

2
answers

580
views

How to collect data with web scraping in Python?

This URL contains several links. I have to take the links for the month of June 2017, download them and create a data frame combining all the files into one. But I got stuck at this part; how can…

python-3.x web-scraping beautifulsoup urllib
asked 6 years, 10 months ago Sarmento 41
2
votes

1
answer

149
views

Web Scraping on R

I have to download the table of this link: http://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-taxas-referenciais-bmf-ptBR.asp I’m trying to use the package rvest, however, to no avail.…

r web-scraping rvest
asked 6 years, 4 months ago Alexandre Sanches 1,223
2
votes

1
answer

49
views

Web Scraping Node.js - I cannot use a variable as a parameter

I’m using the Nightmare library for web scraping. The function works normally when I pass a string as the parameter '.seletor'; the problem occurs when I store this value in a variable and pass it as…

javascript node.js web-scraping
asked 6 years, 2 months ago Gabriel Ribeiro 495
2
votes

3
answers

516
views

How to compare two JSON objects with the same elements in Python

I have two APIs that return data in JSON. I just can’t work out the logic to compare the two. API 1: API 2: My logic is to compare the two on the FlightID field and, if it’s the same, give me…

python json web-scraping python-requests
asked 6 years ago Dalmo Cabral 123
2
votes

0
answers

46
views

selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: //div[@class='classificacao_run points']//table

Hello, I wrote a Python web scraper on mozzarella and it worked. However, when I tried to apply the code to another site, I got the error message in the title. In practice: When running the program…

html5 python-3.x selenium-webdriver web-scraping
asked 5 years, 4 months ago DeSanterra 37
2
votes

2
answers

172
views

How to scrape QlikView tables using Node.js?

This Brazilian government website presents salary data for judges of various courts and tribunals. I would like to download all the tables, but the data behind the tables are not in the html…

node.js web-scraping puppeteer
asked 4 years, 8 months ago Lucas 3,858
1
votes

1
answer

989
views

Problem with VBA and Internet Explorer integration

I am trying to use VBA to collect data directly from the internet. I saw several examples that use the InternetExplorer object, as below: Dim IE As Object Set IE = New InternetExplorer…

excel vba internet-explorer web-scraping
asked 10 years, 8 months ago Nícolas Pinto 113
1
votes

1
answer

287
views

How to scrape a page that uses JavaScript with Python?

I need to scrape a page, but the page’s entry screen has a button (apparently JavaScript) that gives access to all the content of the page itself. Using traditional libs(urllib2,…

javascript python web-crawler web-scraping scraping
asked 9 years ago Wellington Araujo Nogueira 41
1
votes

1
answer

761
views

Creating a program to get important news on a website

from bs4 import BeautifulSoup import requests url = 'http://g1.com.br/' header = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) ' 'AppleWebKit/537.36 (KHTML, like Gecko) '…

python web-scraping
asked 9 years, 8 months ago Ed S 2,057
1
votes

0
answers

521
views

Automate website browsing for software testing activities

I’m on a web systems development project, accessed by the browser. We are constantly making modifications in the operation of the processes and at each specific period we perform a test on the…

java c# web-crawler web-scraping html-agility-pack
asked 9 years, 8 months ago DanOver 1,334
1
votes

1
answer

788
views

How to collect text when there is no HTML reference class - Crawler Python

I have the following situation: I want to collect the "Text to Crawler" shown below. How do I navigate to it when it has no class or id? <td>Texto para crawler</td>…

python-3.x web-crawler web-scraping scraping
asked 8 years, 11 months ago DaniloAlbergardi 347
1
votes

1
answer

962
views

Web Scraping - convert HTML table to python Dict

I’m trying to turn an HTML table into a Python dict. I ran into some problems and I’m asking for your help. Here is as far as I got... def impl12(url='http://www.geonames.org/countries/', tmout=2): import…

python python-3.x web-scraping scrapy scraping
asked 8 years, 8 months ago britodfbr 688
1
votes

1
answer

264
views

Extract data from a calendar with Python and Beautifulsoup (under Linux Ubuntu-like)

Friends, I’d like to take data from a calendar: http://www.purebhakti.com/component/panjika The first step would be to make the program choose the time zone ( -3:00 Buenos Aires) and click on Submit…

python web-scraping
asked 9 years, 1 month ago Ed S 2,057
1
votes

1
answer

2667
views

How to avoid Max retries exceeded error in scraping in Python?

In Python 3 I wrote a program to scrape table rows from a public website with many pages (97893). I create a list with the rows of each column and added a sleep to try to prevent scraping from…

python web-scraping beautifulsoup
asked 8 years, 2 months ago Reinaldo Chaves 333
1
votes

3
answers

691
views

Python 3.6 regular expression for whole-phrase extraction

I need to extract only the phrases that contain ADMINISTRATION - JUIZ DE FORA - NOCTURNE - SISU - GROUP B, for example. That is, I need to get only the name of the course, the city, the shift, the…

python python-2.7 scrapy web-scraping
asked 9 years, 1 month ago SasukeUchiha 75