I have a CSV file with some URLs that need to be accessed:
http://www.icarros.com.br/Audi, Audi
http://www.icarros.com.br/Fiat, Fiat
http://www.icarros.com.br/Chevrolet, Chevrolet
I have a spider that makes all the requests:
    import scrapy
    import csv
    from scrapy.selector import Selector


    class ModelSpider(scrapy.Spider):
        name = "config_brands"
        start_urls = [
            'http://www.icarros.com/'
        ]

        def parse(self, response):
            # Read the brand URLs from the CSV file; the URL is the first column
            with open("files/brands.csv") as file:
                reader = csv.reader(file)
                for line in reader:
                    yield scrapy.Request(line[0], self.success_connect, self.error_connect)

        def success_connect(self, response):
            self.log('Entered the URL: %s' % response.url)

        def error_connect(self, response):
            self.log('Could not access %s' % response.url)
When I run the spider, it cannot connect to any of the URLs, although the same URLs open normally in a browser. My errback function is not called either.
Debug:
2016-09-09 10:17:00 [scrapy] DEBUG: Crawled (200) <GET http://www.icarros.com.br/principal/index.jsp> (referer: None)
2016-09-09 10:17:00 [scrapy] DEBUG: Retrying <<BOUND METHOD MODELSPIDER.ERROR_CONNECT OF <MODELSPIDER 'CONFIG_BRANDS' AT 0X7F7D18B45990>> http://www.icarros.com.br/Audi> (failed 1 times): 400 Bad Request
2016-09-09 10:17:07 [scrapy] DEBUG: Retrying <<BOUND METHOD MODELSPIDER.ERROR_CONNECT OF <MODELSPIDER 'CONFIG_BRANDS' AT 0X7F7D18B45990>> http://www.icarros.com.br/Audi> (failed 2 times): 400 Bad Request
2016-09-09 10:17:14 [scrapy] DEBUG: Gave up retrying <<BOUND METHOD MODELSPIDER.ERROR_CONNECT OF <MODELSPIDER 'CONFIG_BRANDS' AT 0X7F7D18B45990>> http://www.icarros.com.br/Audi> (failed 3 times): 400 Bad Request
2016-09-09 10:17:14 [scrapy] DEBUG: Crawled (400) <<BOUND METHOD MODELSPIDER.ERROR_CONNECT OF <MODELSPIDER 'CONFIG_BRANDS' AT 0X7F7D18B45990>> http://www.icarros.com.br/Audi> (referer: http://www.icarros.com.br/principal/index.jsp)