0
I intend to remove all "Municipios" from the tag starting on this page. https://www.anmp.pt/anmp/pro/mun1/mun101w3.php?cod=M2200
And then remove information such as: "name of the council", "mayor", etc. from the pages of each county on the list.
Using the shell with you line by line extract all the information. Problem is when to get the Spider working.
#Spider.py
# -*- coding: utf-8 -*-
import scrapy
import urlparse
from scrapy.http import FormRequest
from scrapy.loader import ItemLoader
from municipios.items import Municipio
import time
class GetmunSpider(scrapy.Spider):
name = 'getMun'
allowed_domains = ['anmp.pt']
start_urls = ['https://www.anmp.pt/anmp/pro/mun1/mun101w3.php?cod=M2200']
def municipio_attr(self, response):
municipios_url = response.xpath('//select/option/@value').extract()
for municipio in municipios_url:
full_url = ['https://www.anmp.pt/anmp/pro/mun1/{0}'.format(i) for i in municipios_url]
yield FormRequest.from_response(str(full_url), callback=self.parse)
def parse(self, response):
#dados do municipio
municipio = ItemLoader(item = Municipio(), response = response)
municipio.add_xpath('nome', '//div[@class="sel3"]/text()'.extract())
municipio.add_xpath('pres_camara', '//div[@class="f3"]/text()')[3].extract().split(",")[0]
municipio.add_xpath('pres_assembleia', '//div[@class="f3"]/text()')[4].extract().split(",")[0]
municipio.add_xpath('contacto', '//div[@class="sel2"]/text()').extract()
municipio.add_value('endereco', " ".join(contacto)[:-42])
municipio.add_value('telefone', contacto[2])
municipio.add_value('fax', contacto[3])
return Municipio.load_item()
the field contact was only created to be able to remove the phone and fax from the same html tag
#Item.py
import scrapy
from scrapy.item import Item, Field
class Municipio(scrapy.Item):
nome = scrapy.Field()
pres_camara = scrapy.Field()
pres_assembleia = scrapy.Field()
endereco = scrapy.Field()
telefone = scrapy.Field()
fax = scrapy.Field()
Although Scrapy does not present any error at the end, I can only pass the table with the various fields, which still appear in the wrong order.
I have searched the Scrapy documentation and various websites and tutorials but without success.
Can anyone point me in the right direction? Thank you! Carlos
Hello Carlos. Welcome to SOPT. Please ask your question in Portuguese here. For questions in English, post on [SO]. I recommend that you make the [tour] of our site to clarify any doubts.
– Diego Rafael Souza
ok. Thanks when I asked the question I thought I was posting on the site in English.
– Carlos Aboim