Parse a .txt file with Pandas using external rules from a JSON file

I have a data set in .txt format that has its own layout, with the rules described in a separate JSON file.

Is there a direct way to tell Pandas to use this JSON as the basis for decoding the .txt file?

This is an excerpt of the .json file. It contains several objects of this type, each with different values for the keys.

[
    {
      "codigo": "V0101",
      "inicio": 1,
      "tamanho": 4,
      "descricao": "Ano de referência",
      "rotulo": "ano",
      "valores": "str"
    },
    {
      "codigo": "UF",
      "inicio": 5,
      "tamanho": 2,
      "descricao": "Unidade da Federação",
      "rotulo": "UF",
      "valores": {"11": "Rondônia", "12": "Acre", "13": "Amazonas", "14": "Roraima", "15": "Pará", "16": "Amapá", "17": "Tocantins", "21": "Maranhão", "22": "Piauí", "23": "Ceará", "24": "Rio Grande do Norte", "25": "Paraíba", "26": "Pernambuco", "27": "Alagoas", "28": "Sergipe", "29": "Bahia", "31": "Minas Gerais", "32": "Espírito Santo", "33": "Rio de Janeiro", "35": "São Paulo", "41": "Paraná", "42": "Santa Catarina", "43": "Rio Grande do Sul", "50": "Mato Grosso do Sul", "51": "Mato Grosso", "52": "Goiás", "53": "Distrito Federal"}
    },

This is a screenshot of part of the .txt file (the lines are not regular): [screenshot of the .txt file]

I’d really appreciate it if you could help!

1 answer

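def extrai_txt(arquivo, layout):
    """Yield one dict per line, slicing each field according to the layout."""
    for linha in arquivo:
        # 'inicio' is 1-based, so subtract 1 to get the Python slice start
        yield {c['codigo']: linha[c['inicio']-1:c['inicio']-1+c['tamanho']]
               for c in layout}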

Usage:

# seu_json is the layout list, already loaded from the JSON file (e.g. with json.load)
with open('seu_arquivo.txt') as f:
    for reg in extrai_txt(f, seu_json):
        print(reg)

The result is one dict per line, keyed by each field's "codigo":

{'V0101': '2015', 'UF': '11', ...}
{'V0101': '2016', 'UF': '13', ...}
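
Since the question asks about Pandas specifically, here is a minimal sketch of the same idea using `pandas.read_fwf`, which reads fixed-width files from a list of column positions. It assumes the file really is fixed-width, as the "inicio"/"tamanho" keys suggest. The two-field layout is a shortened copy of the excerpt from the question, and the sample lines are made up for illustration; they are not taken from the real file.

```python
import io
import json

import pandas as pd

# Shortened copy of the layout excerpt from the question (only three UF codes kept).
layout = json.loads("""
[
  {"codigo": "V0101", "inicio": 1, "tamanho": 4, "valores": "str"},
  {"codigo": "UF", "inicio": 5, "tamanho": 2,
   "valores": {"11": "Rondônia", "12": "Acre", "13": "Amazonas"}}
]
""")

# Made-up sample lines standing in for the real .txt file.
sample = io.StringIO("201511\n201613\n")

# "inicio" is 1-based; read_fwf expects 0-based, half-open (start, end) pairs.
colspecs = [(c["inicio"] - 1, c["inicio"] - 1 + c["tamanho"]) for c in layout]
names = [c["codigo"] for c in layout]

# dtype=str keeps codes such as "01" from being parsed as numbers.
df = pd.read_fwf(sample, colspecs=colspecs, names=names, header=None, dtype=str)

# Where "valores" is a dict, translate the raw codes into their labels.
for c in layout:
    if isinstance(c["valores"], dict):
        df[c["codigo"]] = df[c["codigo"]].map(c["valores"])

print(df)
```

For the real data, the `io.StringIO` object would be replaced by the path to the .txt file and the inline JSON by `json.load` on the layout file. Codes that are missing from a "valores" dict become NaN after `map`, so it may be worth checking for those before relying on the labels.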
