Hello, I am using Scrapy to build a crawler that collects questions from public exams (concursos) and similar tests from the site gabarite.com.br. I can get the question text and the correct alternative, but I cannot get the alternatives themselves: every time I run the spider in the terminal, the alternatives come back null. I do not know how to proceed. Could you help me?
# -*- coding: utf-8 -*-
import scrapy
import re
from enemGrabData.items import EnemgrabdataItem


class MultipleSpider(scrapy.Spider):
    name = 'multiple'
    allowed_domains = ['www.gabarite.com.br']
    start_urls = ['http://www.gabarite.com.br/questoes-de-concursos/disciplina/1-portugues']
    # html body div.site div#content

    def parse(self, response):
        conteudo = response.xpath('//*[@id="content_questoes_conteudo"]')
        item = EnemgrabdataItem()
        for questao in conteudo.xpath('ul'):
            item['pergunta'] = questao.xpath('li[4]/text()').extract()
            item['alternativa_A'] = questao.xpath('form/li/fieldset').extract()
            # //*[@id="content_questoes_conteudo"]/ul[10]/form/li/fieldset
            item['alternativa_B'] = questao.xpath('/form/li/fieldset/label[2]/text()').extract()
            item['alternativa_C'] = questao.xpath('/form/li/fieldset/label[3]/text()').extract()
            item['alternativa_D'] = questao.xpath('/form/li/fieldset/label[4]/text()').extract()
            item['alternativa_E'] = questao.xpath('/form/li/fieldset/label[5]/text()').extract()
            item['alternativaCorreta'] = questao.xpath('//form/button/@onclick').extract()
            inputString = str(item['alternativaCorreta']).split("checaquestao", 1)[1]
            item['alternativaCorreta'] = str(inputString).split(',')[1]
            # re.findall(r"\(u'(.*?)',\)", inputString)[0]
            yield item
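A side note on the correct-alternative step: turning the extracted list into a string and splitting on commas depends on how the list happens to be printed, and for the sample button it leaves the quote characters around the letter. A small regular expression may be more robust. The sketch below is only an illustration: `ONCLICK_RE` and `parse_onclick` are hypothetical names, and the pattern assumes the `onclick` value always looks like `checaquestao(5104,'b','mult-escolha')`.

```python
import re

# Assumed shape of the onclick value: checaquestao(5104,'b','mult-escolha')
ONCLICK_RE = re.compile(
    r"checaquestao\(\s*(\d+)\s*,\s*'([^']*)'\s*,\s*'([^']*)'\s*\)"
)


def parse_onclick(onclick):
    """Return (question_id, correct_letter, kind), or None if nothing matches."""
    match = ONCLICK_RE.search(onclick or "")
    if match is None:
        return None
    question_id, letter, kind = match.groups()
    return int(question_id), letter, kind
```

Inside the loop, the input could come from something like `questao.xpath('form/button/@onclick').extract_first()`. The path there is relative to `questao`; the `//form/button/@onclick` used in the spider searches the whole page, so every item would receive the first button on the page.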
<ul>
<li class="numero"><h3>Questão 5104. <a href="questoes-de-concursos/disciplina/1-portugues">Português</a> - Nível Superior - Polícia Militar GO - UEG - 2013</h3></li>
<li><a style="cursor:pointer; margin-top:10px; display:inline-block;" onclick="onoff('teste5104')"><img src="imagem/texto-questao.gif" alt="Texto anexado à questão" width="10" height="10"> Texto anexado à questão</a></li>
<li class="texto" style="display:none;margin-top:5px;" id="teste5104"> <img width="680" height="754" alt="Violência no Brasil, outro olhar" src="http://www.gabarite.com.br/midia/questoes/689-texto.fw.png"></li>
<li class="pergunta">A expressão “tais como” (linha 3) tem, no texto, a função de introduzir uma
</li>
<form name="form-questoes">
<li class="questao">
<fieldset>
<span><input id="a5104" type="radio" name="radio5104" value="a"></span><label for="a5104">a) concessão</label>
<span><input id="b5104" type="radio" name="radio5104" value="b"></span><label for="b5104">b) exemplificação</label>
<span><input id="c5104" type="radio" name="radio5104" value="c"></span><label for="c5104">c) conclusão</label>
<span><input id="d5104" type="radio" name="radio5104" value="d"></span><label for="d5104">d) exceção</label>
</fieldset>
</li>
<button title="Gabarito com resposta certa." name="botao5104" class="btn" type="button" onclick="checaquestao(5104,'b','mult-escolha')">Resolver questão</button>
<span id="msg5104"></span>
</form>
<li class="comentario">
<ul>
<li><a rel="nofollow" title="Explique esta questão!" href="javascript:abre_janela('http://www.gabarite.com.br/questoes-de-concursos/questao/5104',500,600)">Comentar questão</a></li>
<li><a rel="nofollow" title="Comentários sobre a questão!" href="javascript:abre_janela('http://www.gabarite.com.br/questoes-de-concursos/questao/5104',500,600)">0 comentários</a></li>
<li><a rel="nofollow" title="Gabarito errado, está desatualizada, foi anulada pela banca, problemas na formatação, etc..." href="javascript:abre_janela('http://www.gabarite.com.br/questoes_aviso.asp?id_questao=5104',500,600)">Notificar erro</a></li>
</ul>
</li>
</ul>
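A likely cause of the empty alternatives, judging from the markup above: the XPath expressions for alternatives B to E start with `/`, which in Scrapy makes them absolute (evaluated from the document root) even when called on `questao`, so they match nothing. Dropping the leading slash, as in `questao.xpath('form/li/fieldset/label/text()').extract()`, keeps the query relative to the current `<ul>`. The sketch below only illustrates the relative-path idea with the standard library's `xml.etree.ElementTree` on a simplified, well-formed copy of the markup; the sample data and the function name `alternatives_per_question` are made up for the illustration, and Scrapy's own HTML parser may treat the real page differently.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed stand-in for the real page (hypothetical data).
SAMPLE = """
<div id="content_questoes_conteudo">
  <ul>
    <li class="pergunta">Pergunta 1</li>
    <form name="form-questoes">
      <li class="questao">
        <fieldset>
          <label for="a1">a) concessão</label>
          <label for="b1">b) exemplificação</label>
        </fieldset>
      </li>
    </form>
  </ul>
  <ul>
    <li class="pergunta">Pergunta 2</li>
    <form name="form-questoes">
      <li class="questao">
        <fieldset>
          <label for="a2">a) sim</label>
          <label for="b2">b) não</label>
        </fieldset>
      </li>
    </form>
  </ul>
</div>
"""


def alternatives_per_question(markup):
    """Collect the label texts of each question, one list per <ul>."""
    root = ET.fromstring(markup.strip())
    result = []
    for ul in root.findall("ul"):
        # No leading slash: the path is resolved relative to this <ul>.
        labels = ul.findall("form/li/fieldset/label")
        result.append([label.text.strip() for label in labels])
    return result
```

Collecting all labels of a question in one list, instead of one field per letter, also copes with questions that have four alternatives (as in the sample) rather than five.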
How can I tell the algorithm to go to the next page? In Scrapy that takes only two lines. I tried it in the code you gave me, importing the libraries, but it has not worked so far.
– joao paulo santos almeida
I edited the answer to include pagination and parameter passing, such as the subject (disciplina). I hope it helps. You can also edit your Scrapy code to use the XPath queries from my example. It should work, perhaps with small adaptations.
– fsola
I managed to do it with Scrapy by following your example, thanks for the help. If you want to see how it turned out, here is the link: https://github.com/jpaulo789b/Crawler-Quest-es-Enem/blob/master/indo/spiders/spiderquestoes.py
– joao paulo santos almeida