Most voted "scraping" questions
Scraping is the activity of extracting content from a web page, for purposes ranging from raw storage of the information to refining and organizing the page content.
39 questions
-
11 votes, 1 answer, 8418 views
What does this anti-robot code in JavaScript do?
What does this anti-robot code in JavaScript do? <html><head></head><body onload="challenge();"> <script> eval(function(p,a,c,k,e,r){e=function(c){return…
-
10 votes, 1 answer, 1417 views
Protecting web pages against automated access
How can I protect my web pages so that they are not accessed in an automated way? By search engine bots like Googlebot (I think the basic approach is the meta tag with noindex and nofollow). By…
-
10 votes, 1 answer, 500 views
Fetching news from a specific website
I am currently looking for a way to implement a module in my app that fetches news from a specific website and shows it on the news module's screen. Can someone point me in the right direction?…
-
7 votes, 1 answer, 73 views
Limiting the number of regex matches with Python
I’m having a little trouble: I’d like to create a for loop in Python that returns a specific number of regex matches. The way I did it, it returns all the links that exist and that meet the defined…
-
6 votes, 2 answers, 2294 views
Scraping in Python - reading a PDF
I wrote a scraper in Python that takes the URL of any PDF, reads it and returns it, but with some PDFs I get characters like this: ".\nO xc3 xb3rg xc3 xa3o tamb xc3 xa9m…
-
5 votes, 1 answer, 310 views
Problems accessing a website via RStudio
I have problems connecting to a particular site via RStudio: url <- "https://www.jusbrasil.com.br/diarios/busca?q=%22licen%C3%A7a+sem+vencimentos%22&idtopico=T10001849&o=data" links…
-
4 votes, 2 answers, 695 views
Web scraping with pure JavaScript
I want to write a web scraper that reads an XML page and takes a certain value that is in "name", but I do not know whether it is possible; I only found how to do it with Node.js. Is it…
-
3 votes, 1 answer, 100 views
What do threads share when making HTTP calls?
The problem: I can’t log in to a site more than once on different threads. My application is a console application. If I open several executables, each can log in successfully on one thread, only…
-
3 votes, 1 answer, 1193 views
Web scraping with Python (Selenium and Requests)
Hello, I am trying to scrape a page protected by login. I have already managed to access it both via Requests and via Selenium; the problem comes after login. The page is as follows:…
-
2 votes, 1 answer, 1067 views
Finding out the real link in Python
I’m writing a web scraper in Python and sometimes I come across links and/or buttons that do not carry the real address of the URL you will be redirected to if you click. In this case, if I…
-
2 votes, 1 answer, 98 views
HTML scraping with pure JavaScript
Hi, I have an HTML document with the following example sequence: <links class="canais-teste"> CANAL 1 <links/> There are some 80 snippets like this; I only want to take the contents from…
-
2 votes, 0 answers, 98 views
Scraping with R - xpathSApply returning a list of 0
I’m learning to read XML data in R. I wanted to extract information about Brazilian football (championship name, home team, result, etc.) from this site:…
-
1 vote, 1 answer, 287 views
How to scrape a page that uses JavaScript with Python?
I need to scrape a page, but the page's entry point has a button (apparently JavaScript) that gives access to all of the page's content. Using traditional libs (urllib2,…
Tags: javascript, python, web-crawler, web-scraping, scraping. Asked 7 years, 8 months ago by Wellington Araujo Nogueira (41)
-
1 vote, 1 answer, 788 views
How to collect text when there is no HTML class to reference - Python crawler
I have the following situation: I want to collect the "Texto para crawler" text below. How do I navigate to it without a class or id? <td>Texto para crawler</td>…
-
1 vote, 1 answer, 962 views
Web scraping - convert an HTML table to a Python dict
I’m trying to turn an HTML table into a Python dict. I ran into some problems and I ask for your help. This is as far as I got... def impl12(url='http://www.geonames.org/countries/', tmout=2): import…
-
1 vote, 1 answer, 114 views
Web crawler with Django’s view.py
I am making a simple web crawler using Django 2.0. I want to capture only the "title" class of the news and then render it ("return render") to a simple HTML page; my view.py is below. I am currently using…
-
1 vote, 1 answer, 860 views
BeautifulSoup - real href links
I was studying web scraping with Python and started using the bs4 library (BeautifulSoup). When I started picking up the a tags and the href attribute, I realized that I could not access the…
-
1 vote, 0 answers, 42 views
Incomplete tweets in the searchTwitter search method
Hello, Stack Overflow BR community, I’m playing with the twitteR library for the R programming language, and I noticed that in the search I made the texts are cut off, for example: "RT…
-
1 vote, 0 answers, 171 views
Error: read ECONNRESET and Error: connect ETIMEDOUT
Good evening, everyone. I’m scraping a website using Node.js and Axios. When I run the application it works perfectly, bringing me the information I requested, but it…
-
1 vote, 0 answers, 310 views
How to capture specific tokens with Python requests
I need to capture specific cookies from Python code and pass them on later. The problem is that if I set the cookies manually they work for a while, but after a few days they expire. I can pick up a…
-
1 vote, 0 answers, 57 views
Scraping on Instagram
Hello, I would like to ask for help. I want to scrape Instagram to analyze personas and extract some data, such as the tags most used by people who follow a certain person. I took…
-
0 votes, 1 answer, 3475 views
C# - How to do simple web scraping
I want to read information from the HTML page of an online radio station. I have tried to read it using HtmlAgilityPack, but without success, because the page I am working with does not use element ids. I…
-
0 votes, 1 answer, 206 views
Scraping external HTML
I’m trying to get text from another site using C# (HtmlAgilityPack). I can find the div, but when I try to show the value on the screen, it shows the path of the function. I believe I’m forgetting…
-
0 votes, 1 answer, 270 views
Pass a URL list to a Scrapy function
I have a Python API that takes two arguments (a URL and a user-defined word) and returns, in a JSON file, how many times the specified word appears at the URL. However, I would like to pass a list of URLs. I…
-
0 votes, 1 answer, 94 views
Data scraping with jsoup and saving to a txt file
Hey, guys. I’m trying to learn data scraping on my own, and as my English doesn’t help, I’m struggling along. Basically, this is it: when I run my code, it lists the athletes of the…
-
0 votes, 1 answer, 501 views
-
0 votes, 1 answer, 98 views
Web scraping or web crawler: isolating a node
Please, I’m trying to retrieve the following information: "value bra": (<span class="value bra">3,666</span>) <div class="ticker-financial-market" initiated="true"> <div…
-
0 votes, 0 answers, 36 views
How to update a slice of a key in a Python dictionary?
How do I update just a slice of a key in a Python dictionary? I am scraping a page and would like to format the result so that my key is on the same line as my value, for example: Air Conditioners:…
-
0 votes, 0 answers, 61 views
Remove all strings before the braces via JavaScript
I have a CSS file and I would like to generate JSON from it. The first step I found would be to remove the entire selector before the braces: .meuSeletor { propriedade: valor; } to {…
-
0 votes, 1 answer, 341 views
Problem navigating with Selenium (using Python) in search results presented in dynamic HTML
I am scraping articles from a newspaper from Pernambuco (Diário de PE) according to a search I did with some keywords on the subject of interest. The newspaper's search returns 10 results…
-
0 votes, 0 answers, 10 views
How to extract text within a <dd> using Jsoup?
Hey, guys, how's it going? I’ve been trying for a couple of hours and have already searched everywhere for a fix. The question is as follows: I need to know whether a text X is "OK" or…
-
-1 votes, 1 answer, 194 views
Scraping in Python: building an INSERT
I would like to extract this table, perhaps using scraping in Python:…
-
-1 votes, 1 answer, 258 views
Scraping data from a website with dynamic filtering
The search platform for programs rated by Capes has dynamic filtering for the query itself. I would like to know how to collect data from an output using Python, because using only bs4…
-
-1 votes, 1 answer, 156 views
How to find a value between two tags in HTML text, other than with XPath?
I’m trying to extract the value between two HTML tags with Python; I need it between two identical tags. I was doing it this way to extract values from a store catalog. But now I need to extract…
-
-1 votes, 2 answers, 133 views
How do I search for properties within a td / tr using only JS
Good afternoon guys, I need to fetch data from a site (scraping) that is inside a table, with <tr> and <td>. The information I need is within the td’s. Here is the code: <table width="95%"…
-
-1 votes, 1 answer, 35 views
I need help: I am using Node and Request Form to do scraping
I’m trying to develop an application to take data from a particular site and send it to the database. It would be simple if you didn’t have to make a request to access this data. As I’m a beginner, I may be…
-
-1 votes, 1 answer, 32 views
How to scrape a table and insert the data into the database?
Well, basically what I need is to build a microservice that extracts the soybean price quotes from this table: https://www.canalrural.com.br/cotacao/soja/. And then insert the data into a…
-
-2 votes, 0 answers, 19 views
How to make a POST request with Scrapy in Python?
I am learning the Scrapy library in Python and I am having difficulty making a request to the URL with the POST method. I’m trying to use the following code: form={"letraLocalidade":"", "ufaux":"",…
-
-3 votes, 2 answers, 464 views
How to get the HTML code of a page protected by Cloudflare?
I’m trying to get the HTML of a page with Jsoup. This page is protected by Cloudflare and, instead of getting the HTML code of the site I’m interested in, I get back the HTML of the…