Most voted "scrapy" questions
Scrapy is a framework for extracting data from websites (web scraping). It is open source, written in Python, and controlled using command-line tools.
60 questions
-
7 votes
1 answer
73 views
Limiting the number of regex matches with Python
I’m having a little trouble: I’d like to write a for loop in Python that returns a specific number of regex matches. The way I did it, it returns all the links that exist and that meet the defined…
-
6 votes
1 answer
705 views
How to extract information from an HTTP header with Python?
We know that in the HTTP protocol, the end of the header is indicated by "\r\n\r\n". Example: It may be that, for some reason, the client does not send the "\r\n\r\n" to the server (could be an…
-
6 votes
1 answer
2565 views
Extract information from Lattes
Introduction: Since 1999, Brazilian researchers have had a website where they can post information about their academic careers. These records are known as Currículos Lattes. I wish to download a few…
-
4 votes
2 answers
126 views
How to calculate an optimal value for Scrapyd’s CONCURRENT_REQUESTS variable?
One of the default settings in Scrapyd is the number of concurrent processes (16): CONCURRENT_REQUESTS = 16. What would be the best methodology to calculate an optimal value for this variable? The…
-
3 votes
1 answer
152 views
Multiple Pipelines to Treat Different Files in Scrapy
How should pipelines.py be handled when we have different Spiders? Example: I have one Spider that fetches posts from a particular blog and another that saves the JPEG banner images found on each…
-
3 votes
1 answer
126 views
How to manage the operation and failure in the execution of Spiders?
I’m developing a module to get information about the Spiders that run on the company’s system. Below is the model where we keep the beginning of operations and the job. I would like to validate if…
-
3 votes
4 answers
1398 views
Tool to generate XPath
Hello, I’m writing a Spider to capture some web data with XPath, but creating the XPath expressions is a bit laborious. Does anyone know a way to train XPath? Example: I click 5 times on a link and some…
-
3 votes
1 answer
55 views
Is there any way to disable Scrapyd’s web interface?
Is there any way to disable the Scrapyd web interface? I would like to monitor the server only through the API.
-
3 votes
1 answer
309 views
Information contained in two Scrapy pages
I’m not a Python programmer, but I’m trying to work with Scrapy. The example above is what I need; it runs in a Chrome extension. To explain, I need the post and all available…
-
3 votes
1 answer
498 views
Error with Scrapy requests
I have a CSV file with some URLs that need to be accessed: http://www.icarros.com.br/Audi, Audi http://www.icarros.com.br/Fiat, Fiat http://www.icarros.com.br/Chevrolet, Chevrolet I’ve got a Spider…
-
3 votes
1 answer
221 views
Scrapy cannot select a form using XPath
Hello, I am using Scrapy to build a crawler that collects public exam (concurso) questions and the like from the site gabarite.com.br. I can get the description of the question and the correct alternative, but I…
-
2 votes
1 answer
138 views
Deploy queues to manage concurrency between Spiders in Scrapyd
Is there any way that Scrapyd can create Spider queues so that when I send many Spiders (with different functions) I can prioritize/limit the concurrency between them? Today, all the Spiders I send…
-
2 votes
2 answers
113 views
How to protect my Scrapyd server from unauthorized calls?
Let’s say I have the following configuration in scrapy.cfg in Scrapyd. [deploy] url = http://example.com/api/scrapyd/ username = user password = secret project = projectX In the Scrapyd…
-
2 votes
1 answer
37 views
Scrapy 1.0 - Log Settings
I need to know how to change the highlighted fields, because when I run my program with Scrapy version 1.0 it prints the result in these highlighted quantities. I wanted to know how to change…
-
2 votes
1 answer
75 views
Problems with restrict_xpaths parameter in a Crawler
I have no Python experience, but I decided to try to do something with Scrapy for testing. So I’m trying to collect the existing articles on a particular page, namely a DIV element with an ID…
-
2 votes
1 answer
86 views
Scrapy queueQueue and MySQL store
I’ve grouped 2 questions because I think they’re related. I wrote a test script that saves the links, along with their data, in the database. Is this a bad practice? (High priority) Do I need to…
-
2 votes
1 answer
126 views
WebDriver error in Python 3.5: AttributeError: can’t set attribute
I need to download the contents of a website. I wrote the code in Python 3.5. When I run it on just a single page the code works very well, but when I put it in a loop or function it gives an error. The…
-
1 vote
1 answer
98 views
Create a new function with Scrapy
I’m starting to learn Scrapy and created the following function: import scrapy class ModelSpider(scrapy.Spider): name = "model" start_urls = [ 'http://www.icarros.com/' ] def parse(self, response):…
-
1 vote
1 answer
962 views
Web Scraping - convert HTML table to Python dict
I’m trying to turn an HTML table into a Python dict. I came across some problems and I ask for your help. This is as far as I could get... def impl12(url='http://www.geonames.org/countries/', tmout=2): import…
-
1 vote
1 answer
146 views
Scrapy different pages
But I am facing a problem. I ended up getting confused, so I decided to revert the code to a working state. # -*- coding: utf-8 -*- # coding: utf-8 import scrapy from mbu2.items import Mbu2Item2…
-
1 vote
1 answer
114 views
Web Crawler with Django’s view.py
I am making a simple web crawler using Django 2.0. I want to capture only the "title" class of the news items and then render it ("return render") to a simple HTML page; my view.py is below. I am currently using…
-
1 vote
1 answer
456 views
Pass input when running a .bat
Good afternoon, I have a .bat file to run a Python file. It needs to receive an input, but I’m not able to provide this input while running the .bat. Does anyone know how? The contents of the .bat follow:…
-
1 vote
2 answers
280 views
I need help on a Python crawler
from scrapy.spiders import BaseSpider from scrapy.selector import HtmlXPathSelector from crawler.items import crawlerlistItem class MySpider(BaseSpider): name = "epoca" allowed_domains =…
-
1 vote
3 answers
691 views
Python 3.6 regular expression for whole-phrase extraction
I need to extract only the phrases that contain ADMINISTRATION - JUIZ DE FORA - EVENING - SISU - GROUP B, for example. That is, I need to get only the name of the course, the city, the shift, the…
-
1 vote
2 answers
354 views
Scrapy XPath href or span inside the div
Hello, I’m trying to scrape a page where I have to pick up a link and text, but I’m struggling because of page variations. I have three possible variations: 1. <div> <strong> <span…
-
1 vote
1 answer
332 views
Scrape parameters of a POST method with Scrapy in Python!
I need to collect information from a website using Spiders within Scrapy in Python, but the site uses the POST method and I’m learning the language while developing the project. I found a model of POST…
-
1 vote
2 answers
444 views
Virtual Assistant
Good afternoon everyone, how are you? I’m fairly new to this development environment, and I programmed in Python some time ago. I have a project idea that involves creating a kind of virtual…
-
1 vote
0 answers
32 views
Scrapy + XPath returning empty array
I’m learning how to create a crawler with Scrapy + XPath. However, when I run the command scrapy shell https://br.udacity.com/courses/all/ the system returns this, as if everything were normal:…
-
1 vote
0 answers
32 views
Scrapy - Search for items in form
I am a complete beginner in the subject; can you help me? I am testing Spiders to search for bids, but I can’t return the items through the form. I have the example code below: import scrapy from scrapy.http…
-
0 votes
1 answer
430 views
How do I integrate my Django project with Scrapy?
I’m looking to develop a simple project using Django where I will create a web page and this page will capture data from other pages. The problem is that I cannot integrate Scrapy with Django.…
-
0 votes
1 answer
414 views
Scrapy Web Data Extraction
Hi, everyone... I’m writing code with the Scrapy framework to search for and extract some data. I’m new at this! The code below should, in theory, search and extract, but it does not extract…
-
0 votes
1 answer
241 views
Twisted Critical unhandled error in Scrapy tutorial
I’m new to programming and I’m trying to follow the Scrapy tutorial http://doc.scrapy.org/en/latest/intro/tutorial.html I use Python 2.7 and Windows 7. When I run the command "scrapy crawl dmoz" in cmd…
-
0 votes
1 answer
509 views
What is the best way to scrape the Datasus website in Python?
The link is this: http://tabnet.datasus.gov.br/cgi/tabcgi.exe?sih/cnv/nrbr.def I’m trying to send a POST through requests with a dictionary containing the categories I want, but then the URL remains…
-
0 votes
0 answers
24 views
Variable XPath saved in Mongo
I have tried it in many ways, but I believe I am missing the point. I would like to save the XPath in MongoDB and import it into item() in the parse function. Is there any way to accomplish this?
-
0 votes
1 answer
53 views
Parse XPath from Int
I have Scrapy running a for loop to fetch the day and the link to something. Ex: t_day = div.xpath('.//a/text()').extract_first() a_day = div.xpath('.//a/@href').extract_first() day = int(t_day) if day…
-
0 votes
1 answer
458 views
Save Excel file to Python via Scrapy
How do I make my Spider save all the Excel data from the links I extract in a single XML file? Or, alternatively, save each XLS file individually in the project folder? Part of my Spider: def parse(self, response): divs…
-
0 votes
0 answers
80 views
Extract PDF documents from sites with Scrapy
Is it possible to scan an entire site with Scrapy, going through all links in search of PDF files? It would be something like Apache Nutch. I did a search, but people only use XPath, and XPath cannot…
-
0 votes
0 answers
68 views
How to fill a Textbox and scrape data with Python?
I’m trying to analyze some data from the education secretariat. I’ve already made the request, but... I found a way to do this through the web itself; however, there are many schools for this…
-
0 votes
1 answer
45 views
"Missing Scheme" error using Scrapy
When I run my spider, Scrapy returns the following error: ValueError: Missing scheme in request url: h import scrapy class QuotesSpider(scrapy.Spider): name = "Mineracao" def start_requests(self):…
-
0 votes
1 answer
142 views
How to create an array within another
I need to create an array that has an index and values. page_links receives the links of a page: all_links_main = [] for link in page_links: all_links_main.append(link.get('href')) produto = [] for…
-
0 votes
0 answers
58 views
How to extract data for Models.py fields from Scrapy?
I intend to extract all "Municipios" from the tag, starting on this page: https://www.anmp.pt/anmp/pro/mun1/mun101w3.php?cod=M2200 And then extract information such as: "name of the council", "mayor",…
-
0 votes
1 answer
733 views
Scrapy for login
I took this code from the internet and changed it a little to log in to the CPFL site, but when I use the command scrapy crawl myproject nothing happens and the command scrapy runspider items.py of…
-
0 votes
2 answers
468 views
Concatenation of multiple lists with Python
Good afternoon! I have a problem and need help. I am working with 3 distinct lists that should be added to a dictionary, but to capture all values without one overwriting another, I need to…
-
0 votes
1 answer
116 views
Capture using XPath
I’m scraping a site using Python (Scrapy) and XPath. How do I capture only 232,990 from the code below? <div class="price-advantages-container"> <div class="price-comparison">…
-
0 votes
1 answer
485 views
How can I use Scrapy in Anaconda
Hi, I’m having trouble creating a project with Scrapy. I’m studying data science in college and I have to use Scrapy. I’m using Anaconda. First through the Spyder IDE (Anaconda Navigator), now I’m…
-
0 votes
1 answer
91 views
Adjust CSV columns with Scrapy
I’m having a problem: by default, when Python generates the CSV file it separates the columns with commas, but I need the created items to become their respective columns, and I’m not able to do the…
-
0 votes
1 answer
270 views
Pass URL list to Scrapy function
I have a Python API that takes two arguments (a URL and a user-defined word) and reports, in a JSON file, how many times the specified word appears in the URL. However, I would like to pass a URL list. I…
-
0 votes
1 answer
400 views
Fix encoding problem while exporting to CSV from a Scrapy file
How can I fix an encoding problem when saving a file as CSV? The problem happens only when saving as CSV. from scrapy import * from projeto_iruan.items import * import csv class…
-
0 votes
2 answers
1265 views
How to convert CSV to XLSX with Python?
How do I convert a .csv file generated by Python to .xlsx? I have two problems: the first is that I couldn’t figure out how to do this conversion; the second is that even passing the command crawl…
-
0 votes
0 answers
54 views
Problem collecting website information
I am trying to collect the number of people helped on SOPT, i.e. my impact, to put in an API later, but it is not extracting the information. Spider: import scrapy class StackOverflow(scrapy.Spider):…