I have a question about a spider I wrote with Scrapy to collect data and write it to a JSON file.
The file is not formatted the way I usually see JSON, which struck me as strange, and I am not sure whether that indicates a problem.
Below are the contents of the file, followed by the code:
[
{"uf": "AL", "area": "C\u00edvel", "juiz": "Henrique Gomes de Barros Teixeira\n", "partes": [{"nome": "Maria Edite dos Santos", "tipo": "Autora", "Advogado(s)": [{"nome": "Defensoria P\u00fablica do Estado de Alagoas", "tipo": "Defensor P"}]}, {"nome": "Hipercard Banco Multiplo S/A", "tipo": "R\u00e9u", "Advogado(s)": [{"nome": "Raoni Souza Drummond", "tipo": "Advogado"}, {"nome": "Eduardo Fraga", "tipo": "Advogado"}, {"nome": "Andrea Freire Tynan", "tipo": "Advogado"}]}, {"nome": "W. dos S. F.", "tipo": "Testemunha"}, {"nome": "P. V. R. de L.", "tipo": "Testemunha"}]}
]
CODE:
import scrapy


class TjalSpdrSpider(scrapy.Spider):
    name = 'tjal'
    allowed_domains = ['www2.tjal.jus.br/cpopg/']
    # url_path = www2.tjal.jus.br/cpopg/open.do
    start_urls = [
        'https://www2.tjal.jus.br/cpopg/show.do?processo.codigo=01000I1FT0000&processo.foro=1&processo.'
        'numero=0731425-82.2014.8.02.0001&uuidCaptcha=sajcaptcha_2976d855423340b4be91a23ff5add85d'
    ]

    def parse(self, response):
        table_partes = response.xpath('//table[@id="tableTodasPartes"]/tr[@class="fundoClaro"]')
        area = ''.join(response.xpath('//table[@class="secaoFormBody"]/tr[4]/td[2]/table/tr/td/text()').getall())
        juiz = response.xpath('//table[@class="secaoFormBody"]/tr[10]/td/span/text()').get()
        partes = []
        for dados in table_partes:
            tipo = dados.xpath('./td/span/text()').get().strip()[:-1]
            tipo_adv = dados.xpath('./td[2]/span[@class="mensagemExibindo"]/text()').get()
            nome = dados.xpath('./td[2]/text()').get().strip()
            advg = [{'nome': f'{adv}'.strip(), 'tipo': f'{tipo_adv}'.strip()[:-1]}
                    for adv in dados.xpath('./td[2]/text()[preceding-sibling::span]').getall() if adv.strip() != '']
            if nome != '':
                if tipo != 'Testemunha':
                    partes.append({
                        'nome': nome,
                        'tipo': tipo,
                        'Advogado(s)': advg
                    })
                else:
                    partes.append({
                        'nome': nome,
                        'tipo': tipo,
                    })
        yield {
            'uf': 'AL',
            'area': area.strip(),
            'juiz': juiz,
            'partes': partes
        }
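If the compact formatting of the exported file is the concern, note that Scrapy's feed exporter does the JSON serialization itself, so the spider keeps yielding plain dicts. A minimal sketch, assuming a Scrapy version that supports the `FEED_EXPORT_INDENT` and `FEED_EXPORT_ENCODING` settings and a standard project layout with a settings.py file (the values chosen here are only illustrative):

```python
# settings.py (project settings module; location assumed, not shown in the question)

# Pretty-print exported items with 4 spaces per nesting level
# instead of writing each item on one compact line.
FEED_EXPORT_INDENT = 4

# Write accented characters as-is ("Cível") rather than as \uXXXX escapes.
FEED_EXPORT_ENCODING = "utf-8"
```

With these in place, the same `scrapy crawl <spider> -o <filename>.json` command should produce an indented file. Either way the data is the same; only the whitespace and escaping differ.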
The point is that I have to output the data with yield and run the spider using 'scrapy crawl <spider> -o <filename>.json', and in that file all the content ends up on a single line. I tried yielding the result of json.dumps as you suggested, but I got the following error: ERROR: Spider must return request, item, or None, got 'str' in
– grutoogdrjgodpf
If the JSON is well formed, I see no problem with it being on a single line. Are any errors being generated?
– Paulo Marques
No, it was just that doubt. I only recently started studying web scraping, and when I tried to export the data to a JSON file it came out differently from what I expected. But if there is no problem, all right, thank you!
– grutoogdrjgodpf
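To check that a single-line file like this one is valid JSON, it can be parsed with Python's standard `json` module and printed again with indentation. The sketch below uses a shortened copy of the item from the question as an inline string (the variable names are illustrative); reading the real file with `json.load` would work the same way.

```python
import json

# Shortened copy of the single-line output from the question.
# A raw string keeps the \u00ed and \n escapes exactly as they appear in the file.
raw = r'[{"uf": "AL", "area": "C\u00edvel", "juiz": "Henrique Gomes de Barros Teixeira\n", "partes": [{"nome": "W. dos S. F.", "tipo": "Testemunha"}]}]'

# json.loads raises json.JSONDecodeError if the text is not valid JSON.
items = json.loads(raw)

# Re-serialize for reading: indent adds line breaks, and
# ensure_ascii=False prints accented characters instead of \uXXXX escapes.
pretty = json.dumps(items, indent=2, ensure_ascii=False)
print(pretty)
```

The parse succeeds, so the compact layout is only a matter of whitespace. The parsed `juiz` value still ends with a newline, which a `.strip()` on that field in the spider would likely remove.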