Splitting a specific field of a JSON response with Python

Is it possible to take a specific field of a JSON response, split it into an array, and then process it? Here is why I ask: I am developing a file-mining bot, and on the same site some pages return only one file in this field while other pages return several. For my request to be valid and for me to extract the PDF, I need to split the field when that happens.

Return in the JSON: [image]

Multiple: [image]

URL tested: http://www.bcb.gov.br/pre/normativos/busca/normativo.asp?tipo=Circ&ano=2009&numero=003467

Excerpt from the code:

# Assumed imports: the excerpt appears to be a Scrapy spider method
import json
from scrapy import Request

def parseHTML_JS(self, response):
    # Parse the JSON body once instead of once per field
    result = json.loads(response.body)['d']['results'][0]
    idBuscaAnexo = result['ID']
    contente = result['Texto']
    data = result['Data1']
    categorias = response.meta['Categoria']
    descricao = response.meta['Description']
    titulo = response.meta['Titulo']
    # Drops the last five characters of the field; this only yields a
    # usable file name when the field lists a single file
    pdfs = result['DocumentosAnexados'][:-5]
    url_pdfs = "http://www.bcb.gov.br/pre/normativos/busca/downloadNormativo.asp?arquivo=/Lists/Normativos/Attachments/"+str(idBuscaAnexo)+"/"+str(pdfs)
    req = Request(url=url_pdfs, callback=self.parsePdf)
    # Pass the metadata along to the PDF callback
    req.meta['Categoria'] = categorias
    req.meta['Description'] = descricao
    req.meta['Titulo'] = titulo
    req.meta['Content'] = contente
    req.meta['Data'] = data
    yield req
  • It is not clear what you need to do. In the second case, when more than one file arrives, do you want to return the names of all of them or just one?

  • I need the following return, for example: position 0: u'Circ_3467_v1_o.pdf', position 1: u'Circ_3467_v2_l.pdf', position 2: u'Circ_3467_v2_p.pdf'

  • I managed to get an array back, but it still contains the "trash", which is the numbers next to the #. I can isolate the # and the ;, but I don't know how to get the numbers out.

  • If there is somewhere I can show the whole code, tell me and I'll post it.

  • You can do that in the question itself: just [edit] it and add the code.

  • Stack Overflow does not accept very large blocks of code.

  • Then put together a [mcve].
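
The list requested in the comments above could be built along these lines. This is only a sketch, assuming the DocumentosAnexados field joins file names and numeric IDs with ";" and "#" characters; the sample string, the ID values, and the function name extract_pdf_names are hypothetical, since the actual field contents are not shown here.

```python
import re

def extract_pdf_names(raw):
    """Return the file names found in a delimited attachments field."""
    # Split on any run of ';' and '#' (assumed separators)
    parts = re.split(r'[;#]+', raw)
    # Keep only the pieces that look like PDF file names; this drops
    # the numeric IDs and any empty pieces
    return [p.strip() for p in parts if p.strip().lower().endswith('.pdf')]

# Hypothetical field value with three attachments
sample = ("Circ_3467_v1_o.pdf;#101;#Circ_3467_v2_l.pdf;#102;"
          "#Circ_3467_v2_p.pdf;#103")
names = extract_pdf_names(sample)
print(names)
# ['Circ_3467_v1_o.pdf', 'Circ_3467_v2_l.pdf', 'Circ_3467_v2_p.pdf']
```

Inside parseHTML_JS, one request could then be yielded per name in the list instead of a single request for the whole field.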


1 answer

1


On the resulting string, use result = result.split('Circ_')[1] to drop the prefix, and then result = result.split('_')[0] to get the number. The first call uses "Circ_" as the separator and keeps the second piece, which has the form "number_vOtherNumber_character". Store that piece and split it again, this time using the underscore as the separator; the first piece of that second split is the number.
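
The two splits described above can be sketched as follows. The helper name circular_number is hypothetical, and the sample file name is the one given in the comments on the question.

```python
def circular_number(file_name):
    """Extract the number that follows 'Circ_' in a file name."""
    # First split: drop everything up to and including 'Circ_'
    rest = file_name.split('Circ_')[1]   # e.g. '3467_v1_o.pdf'
    # Second split: keep what comes before the first underscore
    return rest.split('_')[0]

number = circular_number('Circ_3467_v1_o.pdf')
print(number)  # 3467
```

Note that split('Circ_')[1] raises an IndexError when the name does not contain "Circ_", so a guard may be needed for other document types.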
