Iterate through multiple XML files with Python

Hello, everybody.

Imagine the following situation:

I have a directory with several NF-e XML files and need to move them to another directory according to the issuer (CNPJ) and/or the issue date.

So far I have only managed to iterate through the elements of a single file and copy it from one directory to another when it meets the stipulated condition, but I don't know how to do this for all the files. If anyone can help, I'd appreciate it!

One of the XML files I analyzed has this structure:

<?xml version="1.0" encoding="UTF-8" ?>
<nfeProc versao="4.00" xmlns="http://www.portalfiscal.inf.br/nfe">
  <NFe xmlns="http://www.portalfiscal.inf.br/nfe">
    <infNFe Id="NFe16190119250367000191550030000618321111024017" versao="4.00">
      <emit>
        <CNPJ>19250367000191</CNPJ>
        <xNome>M B PRODUTOS ALIMENTICIOS LTDA</xNome>

Here is the code I wrote, which reads only a single file:

import shutil
import xml.etree.ElementTree as ET

tree = ET.parse(r'/Users/Jp/Desktop/teste.xml')
root = tree.getroot()
NFe = root.find('{http://www.portalfiscal.inf.br/nfe}NFe')
infNFe = NFe.find('{http://www.portalfiscal.inf.br/nfe}infNFe')
emit = infNFe.find('{http://www.portalfiscal.inf.br/nfe}emit')
cnpj = emit.find('{http://www.portalfiscal.inf.br/nfe}CNPJ')

if cnpj.text == '19250367000191':
  shutil.copy(r'/Users/Jp/Desktop/teste.xml', r'/Users/Jp/Desktop/Hiper')
  print(cnpj.text)
else:
  print('Não tem o CNPJ solicitado')

I'm using Python 3.7.

In this new schema, I need to copy the files whose Form element has the name attribute "AR_CENTRAL_PG1_V1".

Here is the schema:

<XMLDataFile>
  <Groups fileName="092449274_000059131.12097869.NFe16141204842563000420551000000591311003460046.SERVIDOR250.NDDigitaleFormsConnectorService7.xml" pjlHeaderData="" pjlFooterData="">
    <Formulary>
      <XMLHead>
        <Form name="AR_HIPER_PG1_V1">
          <PrinterName>HP LaserJet P1505</PrinterName>
          <RawData />
          <InsertInCold>1</InsertInCold>
          <IsDanfe>1</IsDanfe>
          <DocumentUser>FormsUser</DocumentUser>
          <DocumentTitle>No Name</DocumentTitle>
          <pjlHeaderData />
          <pjlFooterData />
          <AutomaticFields />
        </Form>
      </XMLHead>

2 answers

1

First, you can implement a function that extracts the issuer's CNPJ from the .xml file containing the data of an electronic invoice (NF-e).

The example below uses the xml.etree.ElementTree module, which supports a subset of XPath and greatly simplifies XML parsing:

import xml.etree.ElementTree as ET

def obter_cnpj_emissor( arquivoNFe ):
    nsNFe = { "ns" : "http://www.portalfiscal.inf.br/nfe" }
    root = ET.parse( arquivoNFe ).getroot()
    node = root.findall( "./ns:NFe/ns:infNFe/ns:emit/ns:CNPJ", nsNFe )
    if( len(node) != 1 ):
        return None
    return node[0].text
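
To make the namespace lookup easier to try out, here is a minimal sketch that runs the same kind of query on an in-memory string instead of a file. The names NS, SAMPLE and cnpj_from_string are hypothetical and not part of the code above, and SAMPLE is only a trimmed-down fragment modelled on the question, not a complete NF-e. It also assumes that searching at any depth (.//ns:emit/ns:CNPJ) is acceptable, which differs from the fixed path used above.

```python
import xml.etree.ElementTree as ET

# Namespace map: the "ns" prefix is arbitrary and only has meaning inside the XPath.
NS = {"ns": "http://www.portalfiscal.inf.br/nfe"}

# Hypothetical, trimmed-down sample modelled on the fragment in the question.
SAMPLE = """<nfeProc versao="4.00" xmlns="http://www.portalfiscal.inf.br/nfe">
  <NFe>
    <infNFe versao="4.00">
      <emit>
        <CNPJ>19250367000191</CNPJ>
        <xNome>M B PRODUTOS ALIMENTICIOS LTDA</xNome>
      </emit>
    </infNFe>
  </NFe>
</nfeProc>"""


def cnpj_from_string(xml_text):
    """Return the text of the first emit/CNPJ element, or None if there is none."""
    root = ET.fromstring(xml_text)
    # ".//" searches below the root at any depth, so the query does not depend
    # on the root element being nfeProc.
    node = root.find(".//ns:emit/ns:CNPJ", NS)
    return node.text if node is not None else None


print(cnpj_from_string(SAMPLE))
```

With the fixed path ./ns:NFe/ns:infNFe/ns:emit/ns:CNPJ, a file whose root element is not nfeProc would yield no match, so the any-depth form may be worth considering if your files vary in how they are wrapped.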

Then, using the os module, write a function that returns a list with the paths of all files with the .xml extension contained in a directory, non-recursively:

import os

def obter_arquivos_xml( diretorio ):
    ret = []
    for arq in os.listdir( diretorio ):
        if arq.endswith(".xml"):
            ret.append( os.path.join( diretorio, arq ) )
    return ret

Or simply:

import os

def obter_arquivos_xml( diretorio ):
    return [ os.path.join( diretorio, arq ) for arq in os.listdir( diretorio ) if arq.endswith(".xml") ]

And finally, write a function that checks the CNPJ of each invoice issuer and copies to a destination directory only the .xml files that satisfy the condition:

import shutil

def copiar_arquivos( origem, destino, cnpj ):
    for arquivo in obter_arquivos_xml( origem ):
        if( cnpj == obter_cnpj_emissor( arquivo ) ):
            shutil.copy( arquivo, destino )

Putting it all together:

import shutil
import os
import xml.etree.ElementTree as ET

def obter_cnpj_emissor( arquivoNFe ):
    nsNFe = { "ns" : "http://www.portalfiscal.inf.br/nfe" }
    root = ET.parse( arquivoNFe ).getroot()
    node = root.findall( "./ns:NFe/ns:infNFe/ns:emit/ns:CNPJ", nsNFe )
    if( len(node) != 1 ):
        return None
    return node[0].text

def obter_arquivos_xml( diretorio ):
    return [ os.path.join( diretorio, arq ) for arq in os.listdir( diretorio ) if arq.endswith(".xml") ]

def copiar_arquivos( origem, destino, cnpj ):
    for arquivo in obter_arquivos_xml( origem ):
        if( cnpj == obter_cnpj_emissor( arquivo ) ):
            print( "CNPJ '{}' encontrado no arquivo '{}'...".format(cnpj,arquivo) )
            print( "Copiando arquivo '{}' para diretorio '{}'".format(arquivo,destino) )
            shutil.copy( arquivo, destino )
        else:
            print("CNPJ '{}' NAO ENCONTRADO no arquivo '{}'.".format(cnpj,arquivo))

copiar_arquivos( "/Users/Jp/Desktop/", "/Users/Jp/Desktop/Hiper", "19250367000191" )
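
Since the question mentions moving rather than copying, here is a hedged variation of the same idea using shutil.move. The function names issuer_cnpj and move_by_cnpj are hypothetical, and the sketch assumes that files which cannot be parsed should simply be skipped and that the destination directory may be created if it does not exist.

```python
import os
import shutil
import xml.etree.ElementTree as ET

NS = {"ns": "http://www.portalfiscal.inf.br/nfe"}


def issuer_cnpj(path):
    """Return the issuer CNPJ, or None if the file is malformed or has no such element."""
    try:
        root = ET.parse(path).getroot()
    except ET.ParseError:
        return None  # assumption: malformed files are skipped, not reported
    node = root.find(".//ns:emit/ns:CNPJ", NS)
    return node.text if node is not None else None


def move_by_cnpj(source, destination, cnpj):
    """Move every .xml file directly inside source whose issuer CNPJ equals cnpj.

    Returns the list of file names that were moved.
    """
    os.makedirs(destination, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(source)):
        path = os.path.join(source, name)
        # Skip subdirectories (such as a destination nested inside source) and non-XML files.
        if not (os.path.isfile(path) and name.lower().endswith(".xml")):
            continue
        if issuer_cnpj(path) == cnpj:
            shutil.move(path, os.path.join(destination, name))
            moved.append(name)
    return moved
```

It would be called the same way as copiar_arquivos, for example move_by_cnpj("/Users/Jp/Desktop/", "/Users/Jp/Desktop/Hiper", "19250367000191").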

EDIT:

You can use an XPath expression to determine whether an .xml file contains a certain tag.

The function check_form_name() returns the number of times the <Form> tag appears in the .xml file with the attribute name set to the value you specify:

import xml.etree.ElementTree as ET

def check_form_name( arquivoNFe, nomeForm ):
    root = ET.parse( arquivoNFe ).getroot()
    node = root.findall( ".//Form[@name='{}']".format(nomeForm) )
    return len(node)  

In the example below, the number of times the <Form> tag appears in the file schema.xml with the attribute name set to AR_HIPER_PG1_V1 is printed to standard output:

print(check_form_name("schema.xml","AR_HIPER_PG1_V1"))
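
To apply this check to a whole directory, something along these lines might work. It is only a sketch: has_form and copy_by_form_name are hypothetical names, and it assumes (as in the schema shown in the question) that the Form element has no XML namespace. The attribute is compared in Python instead of being formatted into the XPath string, so a form name containing quotes would not break the expression.

```python
import os
import shutil
import xml.etree.ElementTree as ET


def has_form(path, form_name):
    """True if the file has at least one <Form> element whose name attribute equals form_name."""
    try:
        root = ET.parse(path).getroot()
    except ET.ParseError:
        return False  # assumption: malformed files count as "no match"
    return any(form.get("name") == form_name for form in root.iter("Form"))


def copy_by_form_name(source, destination, form_name):
    """Copy every .xml file directly inside source that contains the requested form.

    Returns the list of file names that were copied.
    """
    os.makedirs(destination, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(source)):
        path = os.path.join(source, name)
        if (os.path.isfile(path)
                and name.lower().endswith(".xml")
                and has_form(path, form_name)):
            shutil.copy(path, destination)
            copied.append(name)
    return copied
```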
  • Lacobus, good afternoon!

  • I’ll test your code, thanks for the help!

  • Hello, Lacobus. After testing the code, it returns the following message: line 9, in obter_cnpj_emissor, return node[0].text, IndexError: list index out of range. What did I do wrong?

  • This error indicates that the CNPJ was not found in the input XML file. Could you attach the file to your question?

  • I attached part of the XML!

  • @user139559: I added error handling for the case where the CNPJ cannot be extracted from the XML file properly. That should solve the problem you mentioned.

  • Good afternoon, Lacobus! Your code has solved much of the problem, thank you. But I have another XML schema that I can’t read with this code. Could you help me? I’ll attach the new schema to the question!

  • @user139559 Please post a new question with the schema you mentioned. Keeping the two questions separate would be better.

0

Iterating over the directory structure under a path in your file system:

import os
path='/home/sidon/etc' 
for root, dirs, files in os.walk(path):
    for name in files:
        print(name)

In the example above you are iterating over a generator, so once the iteration finishes the object is exhausted ("empty"). If you want to keep the results in a variable in memory (which can be very expensive for large directory trees), you can do, for example:

tupla1 = tuple(os.walk('/home/sidon')) 
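
A small sketch of what "exhausted" means in practice; the function name walk_twice is hypothetical and exists only to show the behaviour:

```python
import os


def walk_twice(path):
    """Consume the same os.walk generator twice and return both results."""
    walker = os.walk(path)       # generator: nothing has been read yet
    first_pass = list(walker)    # consumes the generator completely
    second_pass = list(walker)   # the generator is already exhausted, so this is []
    return first_pass, second_pass
```

Calling os.walk(path) again creates a fresh generator, so storing the result in a tuple or list is only needed if you want to reuse exactly the same snapshot.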

Using scandir

In Python 3 (3.5 or later) you can also do:

path='/home/sidon/etc'

for entry in os.scandir(path):
    if entry.is_file():
        print('Arquivo: ' + entry.path)
    elif entry.is_dir():
        print('Pasta: '+entry.path)

Output:

Arquivo: /home/sidon/etc/SHA256SUM.asc
Pasta: /home/sidon/etc/share
Pasta: /home/sidon/etc/mautic
Arquivo: /home/sidon/etc/QfrZL7R-debian-wallpaper.jpg
Arquivo: /home/sidon/etc/README.md
Arquivo: /home/sidon/etc/go1.11.2.linux-amd64.tar.gz

.......
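
Adapted to the question, a scandir-based filter for XML files could look roughly like this. The function name xml_files_scandir is hypothetical, and the sketch assumes only the files directly inside the directory matter (no subdirectories) and that the extension check should ignore case.

```python
import os


def xml_files_scandir(directory):
    """Return the sorted paths of the .xml files directly inside directory."""
    # Using os.scandir as a context manager (Python 3.6+) closes the iterator promptly.
    with os.scandir(directory) as entries:
        return sorted(
            entry.path
            for entry in entries
            if entry.is_file() and entry.name.lower().endswith(".xml")
        )
```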

Edit: To further clarify the use of walk, I edited the answer to add one more example, following an exchange of comments with the author of the question.

I created a test directory (its contents appear in the output further below).

Now see the code and its output; you just need to adapt it to your context:

import os

path = '/home/sidon/teste/'
for dir_name, subdirs, files in os.walk(path):
    # Note that the block below could instead build the full path
    # of each file and perform any operation on it.
    print('Diretorio: %s' % dir_name)
    for name in files:
        print('\t%s' % name)

Output from above code:

Diretorio: /home/sidon/teste/
        teste1.txt
Diretorio: /home/sidon/teste/subteste
        texto4.txt
        texto2.txt
        texto3.txt
Diretorio: /home/sidon/teste/subteste2
        txt-sub2.txt
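
As a sketch of "building the full path of each file", the loop can collect the XML files found at any depth. The name find_xml_files is hypothetical, and the case-insensitive extension check is an assumption about what counts as an XML file.

```python
import os


def find_xml_files(top):
    """Return the sorted full paths of all .xml files under top, including subdirectories."""
    found = []
    for dir_name, subdirs, files in os.walk(top):
        for name in files:
            if name.lower().endswith(".xml"):
                # dir_name is the directory currently being visited, so joining it
                # with the file name gives a path that can be opened or copied.
                found.append(os.path.join(dir_name, name))
    return sorted(found)
```

Each returned path can then be passed to whatever function checks the file's contents and decides whether to copy or move it.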

  • Sidon, thank you so much for the code.

  • With os.walk you can "browse" the files of a directory relatively simply. I am editing the answer; see if it helps.

  • Sidon, I expressed myself badly. I need to iterate through the lines of the files and move the files according to the stipulated condition. I believe I have to write a loop, but I don't even know how to start... Thank you for your help!
