How to generate several Zips dynamically in memory, with Python?


I am reading a pandas DataFrame, and for each row I need to save the column that contains an XML document to an in-memory file, so that I can zip these files later. So far, so good. The catch is that I need to generate the zips dynamically, based on a key present in each file. If the key is the same as a previous one, the XML should be saved in the same directory; otherwise, a different zip should be generated for that new key. All files with the same key must be together. That is, if there are 20 different keys, I need to generate 20 different zips.
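One way to get one archive per key, sketched under assumptions: each row is reduced to a `(key, xml_text)` pair, and `build_zips_per_key` is a hypothetical helper name. Rows are grouped by key first, then each key's files are written into its own `BytesIO` archive, which is opened only once. With pandas, the pairs could come from something like `zip(dataFrame['KEY'], dataFrame['XML'])`, where the `'XML'` column name is a placeholder.

```python
import zipfile
from collections import defaultdict
from datetime import datetime
from io import BytesIO


def build_zips_per_key(rows):
    """Return {zip_name: zip_bytes} with one archive per distinct key."""
    # Group first, so rows with the same key share an archive even when
    # they are not adjacent in the input.
    grouped = defaultdict(list)
    for line, (key, xml_text) in enumerate(rows):
        grouped[str(key)].append((line, xml_text))

    timestamp = datetime.today().strftime('%Y%m%d%H%M%S')
    zips = {}
    for key, files in grouped.items():
        buffer = BytesIO()
        # Each archive is opened exactly once and receives all files for its key.
        with zipfile.ZipFile(buffer, 'w', compression=zipfile.ZIP_DEFLATED, compresslevel=9) as zf:
            for line, xml_text in files:
                zf.writestr(f'folder1/folder2/key={key}/xml{line}.xml', xml_text)
        # The archive is complete once the with-block has closed it.
        zips[f'{key}_{timestamp}.zip'] = buffer.getvalue()
    return zips


for name, data in build_zips_per_key([('A', '<a/>'), ('B', '<b/>'), ('A', '<a2/>')]).items():
    # upload('storage', data, name)  # the question's upload() would go here
    print(name, zipfile.ZipFile(BytesIO(data)).namelist())
```

Because each archive is closed before its bytes are read, every value in the returned dict is a complete zip file that could be passed to `upload()` one at a time.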

Here is my code so far:

```python
import csv
import zipfile
from datetime import datetime
from io import BytesIO, StringIO
.
.
.
arrayKeys = []
memoryZip = BytesIO()
memoryXml = StringIO()
# Zero-padded timestamp (YYYYMMDDhhmmss) used in the zip name
timestamp = datetime.today().strftime('%Y%m%d%H%M%S')

for line in range(len(dataFrame)):
    .
    .
    .
    # Get the key:
    key = str(dataFrame['KEY'][line])
    directory = 'key=' + key

    # If the key is already in the list, the data should go into the same zip:
    if key in arrayKeys:
        with zipfile.ZipFile(memoryZip, 'w', compression=zipfile.ZIP_DEFLATED, compresslevel=9) as zf:

            # Build the folder structure (zip entry names use '/' as separator):
            path = 'folder1/folder2/' + directory + '/xml' + str(line) + '.xml'

            # Save the file in the required structure
            dataFrame.iloc[line, :].to_csv(memoryXml, header=False, index=False, escapechar='\\', quoting=csv.QUOTE_NONE)
            zf.writestr(path, memoryXml.getvalue())

            # Zip name:
            zipName = key + '_' + timestamp

    # Otherwise, it should go into another zip
    else:
        arrayKeys.append(key)

        # Generate another zip dynamically, one per key
        with zipfile.ZipFile(memoryZip, 'w', compression=zipfile.ZIP_DEFLATED, compresslevel=9) as newZf:

            # Build the folder structure in memory:
            path = 'diretorio/' + key + '/xml' + str(line) + '.xml'

            # Save the file in the required structure
            dataFrame.iloc[line, :].to_csv(memoryXml, header=False, index=False, escapechar='\\', quoting=csv.QUOTE_NONE)
            newZf.writestr(path, memoryXml.getvalue())

            # Zip name:
            zipName = key + '_' + timestamp

# Function that uploads the zips (its implementation is irrelevant here).
# At this point, I need all the in-memory zips to be uploaded.
upload('storage', memoryZip.getvalue(), f'{zipName}.zip')
```
  • The problem is described in the title of the question.

  • What error message do you get?

  • It times out. I don't know whether that's because it goes through all the tags of all the XMLs.

  • Wouldn't it be better to create a RAM disk, save the zips on that RAM disk, and pass the function a reference to the RAM disk holding the saved zip files?

1 answer


Problem solved: basically, I only needed to open a single ZipFile in memory (instead of two, as in the code above) and write the files into it according to the key. After the loop, the position of the memoryXml buffer was reset to 0 (memoryXml.seek(0)) and the buffer was then truncated (memoryXml.truncate(0)).
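A minimal sketch of the approach described in the answer, assuming `(key, xml_text)` pairs as input (`build_single_zip` is a hypothetical name): one `ZipFile` is opened over a single `BytesIO`, each file goes into a `key=<key>/` folder inside it, and the `StringIO` text buffer is reused. In this sketch the buffer is reset after every `writestr` call. `seek(0)` comes first because `truncate(0)` alone does not move the stream position, so later writes would otherwise start at the old offset.

```python
import zipfile
from io import BytesIO, StringIO


def build_single_zip(rows):
    """Write every (key, xml_text) pair into one in-memory zip, in a folder per key."""
    memory_zip = BytesIO()
    memory_xml = StringIO()  # reusable text buffer, like memoryXml in the question
    # Open the archive once, before the loop, instead of once per row.
    with zipfile.ZipFile(memory_zip, 'w', compression=zipfile.ZIP_DEFLATED, compresslevel=9) as zf:
        for line, (key, xml_text) in enumerate(rows):
            memory_xml.write(xml_text)
            zf.writestr(f'folder1/folder2/key={key}/xml{line}.xml', memory_xml.getvalue())
            # Rewind, then empty the buffer so the next file starts clean.
            memory_xml.seek(0)
            memory_xml.truncate(0)
    # Read the bytes only after the with-block has written the central directory.
    return memory_zip.getvalue()


archive = build_single_zip([('A', '<a/>'), ('B', '<b/>'), ('A', '<a2/>')])
print(zipfile.ZipFile(BytesIO(archive)).namelist())
```

Note that this produces one zip with a folder per key; producing one zip per key would instead need one `BytesIO`/`ZipFile` pair per key.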
