How to extract information from a 'cnab' file using python?

Question

Asked 5 years, 6 months ago


I need to extract information from a CNAB file (which is a plain text file). I can already read the file with Python's read(), but I do not know how to proceed from there.

Contents of the file:

39900000         2957742120001329999999999          00009900000000000990GRUPO NEXXERA                 HSBC                          NEXXERA   107062017095759000000102001600                                                                    
39900011C2001010 2957742120001329999999999          00009900000000000990GRUPO NEXXERA                                                         Rua Madalena Barbi            181                 Centro-Florianopolis88015190SC                  
3990001300001A00000039900001900000000001090EMPRESA FORNECEDOR 1          0000000001          07062017BRL000000000000000000000000000010                    00000000000000000000000                                                    0          
3990001300002A00070039900002900000000002090EMPRESA FORNECEDOR 2          0000000002          07062017BRL000000000000000000000000020020                    00000000000000000000000                                                    0          
3990001300003A00001839900003900000000003090EMPRESA FORNECEDOR 3          0000000003          07062017BRL000000000000000000000003030030                    00000000000000000000000                                                    0          
39900015         000005000000000003050060000000000000000000000000                                                                                                                                                                               
39999999         000001000007000000

Code:

def abre_aquivo():
    # "with" closes the file automatically, even if an error occurs
    with open("modelo_arquivo.txt", "r", encoding="utf-8") as conteudo:
        conteudo_formatado = conteudo.read()
    # print(type(conteudo_formatado))
    x = conteudo_formatado.split()
    print(x)

abre_aquivo()

Expected report:

------------------------------------------------------------------------------------------------------------------------------------------------------------
Nome da Empresa | Numero de Inscricao da Empresa | Nome do Banco | Nome da Rua        | Numero do Local | Nome da Cidade       | CEP       | Sigla do Estado
------------------------------------------------------------------------------------------------------------------------------------------------------------
EMPRESA XX      | 00.000.000/0000-00             | XXX           | Rua Madalena Barbi | 181             | Centro-Florianopolis | 00000-000 | SC
------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------
Nome do Favorecido   | Data de Pagamento | Valor do Pagamento | Numero do Documento Atribuido pela Empresa | Forma de Lancamento
--------------------------------------------------------------------------------------------------------------------------------------
EMPRESA FORNECEDOR 1 | 07/06/2017        | R$ 0,10            | 0000000001                                 | Credito em Conta Corrente
EMPRESA FORNECEDOR 2 | 07/06/2017        | R$ 200,20          | 0000000002                                 | Credito em Conta Corrente
EMPRESA FORNECEDOR 3 | 07/06/2017        | R$ 30.300,30       | 0000000003                                 | Credito em Conta Corrente
--------------------------------------------------------------------------------------------------------------------------------------

Could you post some code we can analyze, or any logs/errors to check, along with the 'cnab' file itself?

– Bart

2019/07/26 at 05:10

I put the contents of the file and the code I have in the question.

– lucazpinheiro

2019/07/26 at 13:29

What information do you need to extract? What is the format of this file? Is it organized in columns or something like that?

– Woss

2019/07/26 at 17:05

The FEBRABAN website defines these layouts. Did you get to read them?

– fernandosavio

2019/07/26 at 17:53

Yes, I read the documentation on the site, but it did not help with the extraction itself.

– lucazpinheiro

2019/07/26 at 18:08

I do not understand how it did not help... Those documents are exactly what specifies the structure of .cnab files. For example: the first 3 digits are the bank's FEBRABAN code, which in your case is 399, that is, HSBC. What is stopping you from extracting the information? Not knowing the file structure, or not knowing how to do it in Python?

– fernandosavio

2019/07/26 at 18:54

It is the file structure that still confuses me; the parsing in Python I can handle. It looks like I need to read more of the documentation.

– lucazpinheiro

2019/07/26 at 19:06


1 answer


by jsbueno • **30,668** points · Answer 1 · 2019-07-27T16:34:29+00:00

It is a fixed-width file format: every field occupies a fixed range of columns. You read a line and use Python's slice syntax to extract the value of each field.

For example, for the line:

39900000         2957742120001329999999999          00009900000000000990GRUPO NEXXERA                 HSBC                          NEXXERA   107062017095759000000102001600

If this line is stored in the variable "linha", we can see, either by counting characters on it or, preferably, by checking the file specification, that the field boundaries fall at these columns:

17, 52, 72, 102, 132, 142
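As a rough sketch of that idea, the snippet below cuts one line at exactly those boundaries. The helper name corta_campos is made up for this illustration, and the sample header is rebuilt with ljust so the field widths are explicit. Note that str.split() is not suitable here, because fields such as the company name contain spaces themselves.

```python
# Column boundaries listed above; 0 and None mark the start and the end of the line.
LIMITES = [0, 17, 52, 72, 102, 132, 142, None]


def corta_campos(linha, limites=LIMITES):
    """Cut a fixed-width line into the pieces between consecutive boundaries."""
    return [linha[ini:fim] for ini, fim in zip(limites, limites[1:])]


# The header line of the example, rebuilt field by field with space padding.
linha = (
    "39900000".ljust(17)
    + "2957742120001329999999999".ljust(35)
    + "00009900000000000990"
    + "GRUPO NEXXERA".ljust(30)
    + "HSBC".ljust(30)
    + "NEXXERA".ljust(10)
    + "107062017095759000000102001600"
)

campos = corta_campos(linha)
for campo in campos:
    # strip() drops the padding; repr() makes any remaining spaces visible
    print(repr(campo.strip()))
```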

Update, March 2020

When I wrote the original answer I must have been short on time: I posted the code for a good alternative to plain slicing, but without explaining what I was doing (that code is kept below, unchanged from the original answer).

However, it is important to understand what is being done. The simplest way to access these values is, of course, to use Python's "slice" syntax to pick out substrings. So, given a data line and the indices above, you can just write, in ordinary Python code inside any function:


campo1 = linha[0:17]
campo2 = linha[17:52]
campo3 = linha[52:72]
...

Note that Python's design decision of making ranges closed at the start and open at the end (that is, a slice includes the element at its first index but stops just before its last index) helps A LOT here: campo1 runs up to the character immediately before index 17, and campo2 starts exactly at index 17.
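A small check of that half-open behaviour, as a sketch: only the first two fields of the sample header are rebuilt here. The helper fatia is hypothetical, and it assumes the layout document numbers positions from 1 with both ends inclusive, which should be confirmed against the specification you are using.

```python
# First two fields of the sample header, padded to their fixed widths.
linha = "39900000".ljust(17) + "2957742120001329999999999".ljust(35)

campo1 = linha[0:17]   # indices 0..16
campo2 = linha[17:52]  # indices 17..51

# The width of a slice is simply end minus start.
assert len(campo1) == 17 - 0
assert len(campo2) == 52 - 17

# Adjacent slices share a boundary, so nothing is lost or duplicated.
assert campo1 + campo2 == linha[0:52]


def fatia(pos_inicial, pos_final):
    """Convert a 1-based, inclusive position range into a Python slice (assumed convention)."""
    return slice(pos_inicial - 1, pos_final)


print(repr(linha[fatia(1, 17)]))
print(repr(linha[fatia(18, 52)]))
```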

Doing it this way gets tedious, especially since we are going to have several different layouts, and having to copy these variables into a dictionary, or build an object from them afterwards, would be repetitive. It would not be wrong, though: the code would be straightforward and easy to understand.
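One way to cut down that repetition while staying with plain slices is to describe each layout as data and fill a dictionary in a loop. This is only a sketch: LAYOUT_HEADER and extrai are hypothetical names, and apart from the company and bank names (which are readable in the sample) the fields keep the neutral campoN labels rather than guessing their meaning.

```python
# Hypothetical layout table for the header line: (field name, start, end).
LAYOUT_HEADER = [
    ("campo1", 0, 17),
    ("campo2", 17, 52),
    ("campo3", 52, 72),
    ("nome_empresa", 72, 102),
    ("nome_banco", 102, 132),
    ("campo6", 132, 142),
    ("campo7", 142, None),
]


def extrai(linha, layout):
    """Return a dict mapping each field name to its text, with padding removed."""
    return {nome: linha[ini:fim].strip() for nome, ini, fim in layout}


# The header line of the example, rebuilt field by field with space padding.
linha = (
    "39900000".ljust(17)
    + "2957742120001329999999999".ljust(35)
    + "00009900000000000990"
    + "GRUPO NEXXERA".ljust(30)
    + "HSBC".ljust(30)
    + "NEXXERA".ljust(10)
    + "107062017095759000000102001600"
)

registro = extrai(linha, LAYOUT_HEADER)
print(registro["nome_empresa"])  # GRUPO NEXXERA
print(registro["nome_banco"])    # HSBC
```

Supporting another line type then means adding another table, not another block of slicing code.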

The approach I used relies on more advanced Python mechanisms, which let you create a "specialized class that works as an attribute of another object" (this kind of class is called a "descriptor"), so that the attribute's value is produced by code instead of being stored data. I then write code only to pick the data at the specific positions of the string holding the line, exactly as above, but triggered automatically by Python whenever someone accesses meuobjeto.campo1. The code for that is the original answer, which follows.

End of update

The ideal is to create a family of classes, that is, a "base" class plus one subclass for each type of record that appears in the file. That way you can specify the fields and their boundaries in a very readable way in the subclasses, and put in the base class the method that, given the record, extracts the information from the right positions.

You can even do this with descriptors: store the data as raw text inside the Python object, and let the descriptor's __get__ code fetch the right field.

Of course, you will then have nested objects: the lines at indices 2, 3 and 4 of the example are transactions, and the best approach is to create a separate class for the information about them (and give the parent object a "list of transactions" attribute). The logic for this gets a little more interesting, but I cannot go into it here.
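A rough sketch of that nesting, using plain dataclasses and slices instead of descriptors. Everything here is an assumption to be checked against the FEBRABAN layout: the names Transacao, Arquivo and monta_arquivo are invented, the record type is taken from index 7 because the sample shows "0" there on the header and "3" on the transaction lines, and the offsets 43, 73, 93 and 101 were obtained by counting characters on the sample.

```python
from dataclasses import dataclass, field


@dataclass
class Transacao:
    favorecido: str
    documento: str
    data_pagamento: str


@dataclass
class Arquivo:
    empresa: str
    banco: str
    transacoes: list = field(default_factory=list)


def monta_arquivo(linhas):
    """Build one Arquivo, dispatching each line on its record-type character."""
    arquivo = None
    for linha in linhas:
        tipo = linha[7:8]  # assumption: index 7 holds the record type
        if tipo == "0":  # file header
            arquivo = Arquivo(linha[72:102].strip(), linha[102:132].strip())
        elif tipo == "3" and arquivo is not None:  # one transaction
            arquivo.transacoes.append(
                Transacao(
                    favorecido=linha[43:73].strip(),
                    documento=linha[73:93].strip(),
                    data_pagamento=linha[93:101],
                )
            )
        # other record types (batch header, trailers) are ignored in this sketch
    return arquivo


# Shortened copies of the sample lines, padded to the widths used above.
linhas = [
    "39900000".ljust(17) + "2957742120001329999999999".ljust(35)
    + "00009900000000000990" + "GRUPO NEXXERA".ljust(30) + "HSBC".ljust(30),
    "3990001300001A00000039900001900000000001090"
    + "EMPRESA FORNECEDOR 1".ljust(30) + "0000000001".ljust(20) + "07062017BRL",
    "3990001300002A00070039900002900000000002090"
    + "EMPRESA FORNECEDOR 2".ljust(30) + "0000000002".ljust(20) + "07062017BRL",
    "3990001300003A00001839900003900000000003090"
    + "EMPRESA FORNECEDOR 3".ljust(30) + "0000000003".ljust(20) + "07062017BRL",
]

arquivo = monta_arquivo(linhas)
print(arquivo.empresa, len(arquivo.transacoes))
for transacao in arquivo.transacoes:
    print(transacao)
```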

So, handling only the first line:



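class Campo:
    """Descriptor that returns one fixed-width slice of the owner's raw line."""

    def __init__(self, inicio, final):
        self.inicio = inicio
        self.final = final

    def __set_name__(self, owner, nome):
        # Called automatically when the owner class is created; stores the attribute name
        self.nome = nome

    def __get__(self, instance, owner):
        if instance is None:
            # Accessed on the class itself (e.g. CNAB_LINHA1.campo1): return the descriptor
            return self
        return instance.dados_brutos[self.inicio:self.final]


class Base:
    def __init__(self, dados):
        self.dados_brutos = dados

    def __repr__(self):
        # Collect every Campo declared on the subclass, in declaration order
        campos = []
        for name, obj in self.__class__.__dict__.items():
            if isinstance(obj, Campo):
                campos.append((name, getattr(self, name)))
        return "\n".join(f"{campo}:{conteudo}" for campo, conteudo in campos)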

class CNAB_LINHA1(Base):
    campo1 = Campo(0, 17)
    campo2 = Campo(17, 52)
    campo3 = Campo(52, 72)
    campo4 = Campo(72, 102)
    campo5 = Campo(102, 132)
    campo6 = Campo(132, 142)
    campo7 = Campo(142, None)

And at the terminal:

In [59]: a = "39900000         2957742120001329999999999          00009900000000000990GRUPO NEXXERA                 HSBC                          NEXXERA   107062017095759000000102001600"

In [60]: b = CNAB_LINHA1(a)

In [61]: print(b)
campo1:39900000         
campo2:2957742120001329999999999          
campo3:00009900000000000990
campo4:GRUPO NEXXERA                 
campo5:HSBC                          
campo6:NEXXERA   
campo7:107062017095759000000102001600
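The fields come back as raw text. To get closer to the report in the question, the values still need formatting. The helpers below are a minimal sketch with invented names (formata_data, formata_valor). They assume dates are stored as DDMMYYYY and amounts as digits with two implied decimal places, which is what the sample values and the expected report suggest.

```python
from datetime import datetime


def formata_data(bruto):
    """Turn 'DDMMYYYY' into 'DD/MM/YYYY' (assumed storage format)."""
    return datetime.strptime(bruto, "%d%m%Y").strftime("%d/%m/%Y")


def formata_valor(bruto):
    """Turn digits with two implied decimals into Brazilian currency text."""
    reais, centavos = divmod(int(bruto), 100)
    # Python groups thousands with ","; the report uses "." for thousands
    milhares = f"{reais:,}".replace(",", ".")
    return f"R$ {milhares},{centavos:02d}"


print(formata_data("07062017"))          # 07/06/2017
print(formata_valor("000000003030030"))  # R$ 30.300,30
```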