Iterate through multiple XML files with Python

Question

Asked 6 years, 5 months ago

Hello, everybody.

Imagine the following situation:

I have a directory with several NF-e XML files and need to move them to another directory according to their issuer (CNPJ) and/or issue date.

So far I have managed to iterate through the elements of a single file and copy it from one directory to another when it meets the stipulated condition, but I don't know how to do this for all the files. If anyone can help, I'd appreciate it!

One of the XML files I analyzed has this structure:

<?xml version="1.0" encoding="UTF-8" ?>
<nfeProc versao="4.00" xmlns="http://www.portalfiscal.inf.br/nfe">
  <NFe xmlns="http://www.portalfiscal.inf.br/nfe">
    <infNFe Id="NFe16190119250367000191550030000618321111024017" versao="4.00">
      <emit>
        <CNPJ>19250367000191</CNPJ>
        <xNome>M B PRODUTOS ALIMENTICIOS LTDA</xNome>

Here is the code I wrote, which reads only a single file:

import shutil
import xml.etree.ElementTree as ET

tree = ET.parse(r'/Users/Jp/Desktop/teste.xml')
root = tree.getroot()
NFe = root.find('{http://www.portalfiscal.inf.br/nfe}NFe')
infNFe = NFe.find('{http://www.portalfiscal.inf.br/nfe}infNFe')
emit = infNFe.find('{http://www.portalfiscal.inf.br/nfe}emit')
cnpj = emit.find('{http://www.portalfiscal.inf.br/nfe}CNPJ')

if cnpj.text == '19250367000191':
  shutil.copy(r'/Users/Jp/Desktop/teste.xml', r'/Users/Jp/Desktop/Hiper')
  print(cnpj.text)
else:
  print('Não tem o CNPJ solicitado')

I use version 3.7 of Python!

In this new schema, I need to copy the files whose Form element has the name attribute set to "AR_CENTRAL_PG1_V1"!

Here is the schema:

<XMLDataFile>
  <Groups fileName="092449274_000059131.12097869.NFe16141204842563000420551000000591311003460046.SERVIDOR250.NDDigitaleFormsConnectorService7.xml" pjlHeaderData="" pjlFooterData="">
    <Formulary>
      <XMLHead>
        <Form name="AR_HIPER_PG1_V1">
          <PrinterName>HP LaserJet P1505</PrinterName>
          <RawData />
          <InsertInCold>1</InsertInCold>
          <IsDanfe>1</IsDanfe>
          <DocumentUser>FormsUser</DocumentUser>
          <DocumentTitle>No Name</DocumentTitle>
          <pjlHeaderData />
          <pjlFooterData />
          <AutomaticFields />
        </Form>
      </XMLHead>

2 answers

by Lacobus • **13,510** points · Answer 1 · 2019-02-18T14:44:37+00:00

First, you can implement a function that extracts the issuer's CNPJ from the .xml file containing the data of an electronic invoice (NF-e).

The example below uses the xml.etree.ElementTree module, which supports a subset of XPath and greatly simplifies parsing the XML:

import xml.etree.ElementTree as ET

def obter_cnpj_emissor( arquivoNFe ):
    nsNFe = { "ns" : "http://www.portalfiscal.inf.br/nfe" }
    root = ET.parse( arquivoNFe ).getroot()
    node = root.findall( "./ns:NFe/ns:infNFe/ns:emit/ns:CNPJ", nsNFe )
    if( len(node) != 1 ):
        return None
    return node[0].text

Then, using the os module, write a function that returns a list with the paths of all files with the .xml extension contained in a directory, non-recursively:

import os

def obter_arquivos_xml( diretorio ):
    ret = []
    for arq in os.listdir( diretorio ):
        if arq.endswith(".xml"):
            ret.append( os.path.join( diretorio, arq ) )
    return ret

Or simply:

import os

def obter_arquivos_xml( diretorio ):
    return [ os.path.join( diretorio, arq ) for arq in os.listdir( diretorio ) if arq.endswith(".xml") ]
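Note that arq.endswith(".xml") is case-sensitive, so a file named NOTA.XML would be skipped. The sketch below is one possible variant using pathlib that matches the extension regardless of case and ignores directories; the function name list_xml_files is hypothetical and not part of the answer's code.

```python
from pathlib import Path

def list_xml_files(directory):
    """Return the paths of the XML files directly inside `directory` (non-recursive)."""
    return sorted(
        str(entry)
        for entry in Path(directory).iterdir()
        # suffix.lower() makes the match case-insensitive (.xml, .XML, .Xml, ...);
        # is_file() skips directories that happen to end in ".xml".
        if entry.is_file() and entry.suffix.lower() == ".xml"
    )
```

It can be used in place of obter_arquivos_xml(), since it also returns a list of path strings.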

Finally, write a function that checks the CNPJ of each invoice's issuer and copies to a destination directory only the .xml files that satisfy the condition:

import shutil

def copiar_arquivos( origem, destino, cnpj ):
    for arquivo in obter_arquivos_xml( origem ):
        if( cnpj == obter_cnpj_emissor( arquivo ) ):
            shutil.copy( arquivo, destino )

Putting it all together:

import shutil
import os
import xml.etree.ElementTree as ET

def obter_cnpj_emissor( arquivoNFe ):
    nsNFe = { "ns" : "http://www.portalfiscal.inf.br/nfe" }
    root = ET.parse( arquivoNFe ).getroot()
    node = root.findall( "./ns:NFe/ns:infNFe/ns:emit/ns:CNPJ", nsNFe )
    if( len(node) != 1 ):
        return None
    return node[0].text

def obter_arquivos_xml( diretorio ):
    return [ os.path.join( diretorio, arq ) for arq in os.listdir( diretorio ) if arq.endswith(".xml") ]

def copiar_arquivos( origem, destino, cnpj ):
    for arquivo in obter_arquivos_xml( origem ):
        if( cnpj == obter_cnpj_emissor( arquivo ) ):
            print( "CNPJ '{}' encontrado no arquivo '{}'...".format(cnpj,arquivo) )
            print( "Copiando arquivo '{}' para diretorio '{}'".format(arquivo,destino) )
            shutil.copy( arquivo, destino )
        else:
            print("CNPJ '{}' NAO ENCONTRADO no arquivo '{}'.".format(cnpj,arquivo))

copiar_arquivos( "/Users/Jp/Desktop/", "/Users/Jp/Desktop/Hiper", "19250367000191" )
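The question also mentions filtering by issue date. The same approach could be extended as in the sketch below. It assumes the issue date is stored in the element ide/dhEmi under infNFe, as an ISO 8601 timestamp such as 2019-01-15T10:30:00-03:00; that path does not appear in the XML excerpt from the question, so check it against your own files. The names get_issue_date and issued_between are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date

NS = {"ns": "http://www.portalfiscal.inf.br/nfe"}

def get_issue_date(nfe_file):
    """Return the issue date of an NF-e file as a datetime.date, or None if absent."""
    root = ET.parse(nfe_file).getroot()
    # ASSUMPTION: the issue timestamp lives at NFe/infNFe/ide/dhEmi.
    text = root.findtext("./ns:NFe/ns:infNFe/ns:ide/ns:dhEmi", default=None, namespaces=NS)
    if not text or not text.strip():
        return None
    # The first 10 characters of an ISO 8601 timestamp are YYYY-MM-DD.
    return date.fromisoformat(text.strip()[:10])

def issued_between(nfe_file, start, end):
    """True if the file's issue date falls within [start, end], inclusive."""
    issued = get_issue_date(nfe_file)
    return issued is not None and start <= issued <= end
```

Inside copiar_arquivos(), the condition could then become something like cnpj == obter_cnpj_emissor(arquivo) and issued_between(arquivo, date(2019, 1, 1), date(2019, 1, 31)).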

EDIT:

You can use an XPath expression to determine whether an .xml file contains a certain tag.

The function check_form_name() returns the number of times the <Form> tag appears in the .xml file with the name attribute set to the value you specify:

import xml.etree.ElementTree as ET

def check_form_name( arquivoNFe, nomeForm ):
    root = ET.parse( arquivoNFe ).getroot()
    node = root.findall( ".//Form[@name='{}']".format(nomeForm) )
    return len(node)

The call below prints to standard output the number of times the <Form> tag appears in the file schema.xml with the name attribute set to AR_HIPER_PG1_V1:

print(check_form_name("schema.xml","AR_HIPER_PG1_V1"))
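To actually copy the matching files, the form check can be combined with the directory loop from the first part of the answer. The following is a minimal sketch, not a definitive implementation: has_form and copy_files_by_form are hypothetical names, the destination directory is assumed to exist already, and files that fail to parse are simply skipped.

```python
import os
import shutil
import xml.etree.ElementTree as ET

def has_form(xml_file, form_name):
    """True if the file has at least one <Form> element whose name equals form_name."""
    root = ET.parse(xml_file).getroot()
    # Comparing the attribute in Python avoids quoting issues inside an XPath string.
    return any(form.get("name") == form_name for form in root.iter("Form"))

def copy_files_by_form(source, destination, form_name):
    """Copy every .xml file in source that contains the given form; return their names."""
    copied = []
    for name in sorted(os.listdir(source)):
        path = os.path.join(source, name)
        if not (os.path.isfile(path) and name.lower().endswith(".xml")):
            continue
        try:
            if has_form(path, form_name):
                shutil.copy(path, destination)
                copied.append(name)
        except ET.ParseError:
            # Malformed XML: report it and move on to the next file.
            print("Skipping unreadable file:", path)
    return copied
```

For the case in the question, the call would look like copy_files_by_form("/Users/Jp/Desktop/", "/Users/Jp/Desktop/Hiper", "AR_CENTRAL_PG1_V1").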

by Sidon • **6,563** points · Answer 2 · 2019-02-18T14:07:45+00:00

Iterating over a directory tree in your file system:

import os
path='/home/sidon/etc' 
for root, dirs, files in os.walk(path):
    for name in files:
        print(name)

In the example above you are iterating over a generator, so once the iteration finishes the object is exhausted. If you want to keep the results in a variable in memory (which can be very expensive for large trees), you can do, for example:

tupla1 = tuple(os.walk('/home/sidon'))
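A small illustration of the exhaustion behaviour, using a temporary directory created only for this example: the second pass over the same generator yields nothing, whereas a stored list or tuple can be reused.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create a single empty file so the walk has something to report.
    open(os.path.join(tmp, "a.txt"), "w").close()

    walker = os.walk(tmp)
    first_pass = list(walker)   # consumes the generator
    second_pass = list(walker)  # nothing left to yield

print(len(first_pass))  # 1 (the temporary directory itself)
print(second_pass)      # []
```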

scandir

In Python 3 (3.5 or later) you can also use os.scandir:

path='/home/sidon/etc'

for entry in os.scandir(path):
    if entry.is_file():
        print('Arquivo: ' + entry.path)
    elif entry.is_dir():
        print('Pasta: '+entry.path)

Output:

Arquivo: /home/sidon/etc/SHA256SUM.asc
Pasta: /home/sidon/etc/share
Pasta: /home/sidon/etc/mautic
Arquivo: /home/sidon/etc/QfrZL7R-debian-wallpaper.jpg
Arquivo: /home/sidon/etc/README.md
Arquivo: /home/sidon/etc/go1.11.2.linux-amd64.tar.gz

.......

Edit: To further clarify the use of walk, I have added one more example, following the comments exchanged with the author of the question.

I created a directory structure like the one in the image below:

Now look at the code and its output below; you only need to adapt it to your context:

import os

path = '/home/sidon/teste/'  # top directory, as shown in the output below

for dir_name, subdirs, files in os.walk(path):
    # Note that the block below could build the full path of each file
    # (os.path.join(dir_name, name)) and perform any operation on it.
    print('Diretorio: %s' % dir_name)
    for name in files:
        print('\t%s' % name)

Output of the code above:

Diretorio: /home/sidon/teste/
        teste1.txt
Diretorio: /home/sidon/teste/subteste
        texto4.txt
        texto2.txt
        texto3.txt
Diretorio: /home/sidon/teste/subteste2
        txt-sub2.txt
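Adapting this to the question, the same os.walk loop can build the full path of every .xml file in the tree, including subdirectories. This is only a sketch, and find_xml_files is a hypothetical name:

```python
import os

def find_xml_files(top):
    """Yield the full path of every .xml file under `top`, recursively."""
    for dir_name, subdirs, files in os.walk(top):
        for name in files:
            # Case-insensitive check so that .XML files are included as well.
            if name.lower().endswith(".xml"):
                yield os.path.join(dir_name, name)
```

Each yielded path can then be parsed and copied as needed, for example with for xml_path in find_xml_files('/home/sidon/teste/'): ...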