Concatenating CSV files with Python

Viewed 392 times

-2

I have three lists that together form the directory and the file names. I want to concatenate the files named "listas_tipos_anos.csv" (around 44 files) into "listas_tipos_all.csv" (producing 4 files), where "all" means every year in the anos list merged into one file:

  1. dfp_bpa_all.csv;
  2. dfp_bpp_all.csv;
  3. itr_bpa_all.csv;
  4. itr_bpp_all.csv.

Below is the code that checks whether the files exist; I only need the concatenation step.

import os

listas = ['dfp','itr']
tipos = ['bpa','bpp']
anos = range(2010, 2021)
path_csv = 'C:/Users/Saulo/Desktop/projeto analise/download/arquivos/dfs/%s'
arq_csv = '/%s_%s_%d.csv'

concatenar = []

for lts in listas:
    for ano in anos:
        for tps in tipos:
            url = (path_csv % lts + arq_csv % (lts,tps,ano))
            #print(url)
            if os.path.isfile(url):
                concatenar.append(url)
print(concatenar)
  • What is the content of listas_tipos_anos.csv?

  • The content has already been cleaned and the separate files all share the same structure, so I just want to concatenate them into one CSV file per group, distinguished only by the lists.

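  • Does it need to be in Python? Is it a recurring task? If not, just use the shell: cat a.csv b.csv c.csv > final.csv (on Windows: copy /b a.csv+b.csv+c.csv final.csv). Each file's header row, if it has one, gets copied as well.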

2 answers

1

If you really want to do it with Python, it would be something like

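# Read every file in the list and accumulate its text
txt = ''
for arquivo in concatenar:
    with open(arquivo, 'r') as f:
        txt += f.read()

# Write the combined text to a single output file
with open("arquivo_final.csv", "w") as saida:
    saida.write(txt)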

I hope it helps.
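
The snippet above writes everything into one file and repeats each file's header row. A minimal sketch of a variant that groups the paths by their "lista_tipo" prefix and keeps only the first header might look like this. The helper names group_by_prefix and concat_csv are hypothetical, and the sketch assumes that every file starts with a single-line header, that file names follow the lista_tipo_ano.csv pattern from the question, and that the files use the iso-8859-1 encoding.

```python
import os
from collections import defaultdict


def group_by_prefix(paths):
    """Group file paths by the 'lista_tipo' part of names like dfp_bpa_2010.csv."""
    groups = defaultdict(list)
    for p in paths:
        name = os.path.splitext(os.path.basename(p))[0]  # 'dfp_bpa_2010'
        prefix = name.rsplit('_', 1)[0]                  # 'dfp_bpa'
        groups[prefix].append(p)
    return dict(groups)


def concat_csv(paths, destination, encoding='iso-8859-1'):
    """Append all files into destination, keeping only the first header row.

    Assumes the header occupies exactly one line in every input file.
    """
    header_written = False
    with open(destination, 'w', encoding=encoding) as out:
        for p in paths:
            with open(p, 'r', encoding=encoding) as f:
                for n, line in enumerate(f):
                    if n == 0 and header_written:
                        continue  # skip the repeated header
                    # Make sure a file without a trailing newline
                    # does not glue its last row to the next file.
                    out.write(line if line.endswith('\n') else line + '\n')
                    header_written = True
```

With the concatenar list from the question, one possible usage is to loop over group_by_prefix(concatenar).items() and call concat_csv(sorted(paths), f'{prefix}_all.csv') for each group, which would yield the four "_all" files.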

0


Thank you, Paulo Marques. Below is the code that solved my problem; it was laborious, but it was worth it.

# Concatenar
import glob

import pandas as pd

listas = ['dfp','itr']
tipos = ['bpa','bpp']

path = '"DIRETÓRIO"/%s/%s'  # input folder for each lista/tipo pair
path1 = '"DIRETÓRIO"'       # output folder

for lts in listas:
    for tps in tipos:
        csv = path % (lts,tps)
        all_files = glob.glob(csv + '/*.csv')
        # Read every file found for this lista/tipo pair
        li = []
        for files in all_files:
            df = pd.read_csv(files, sep=';', encoding='iso-8859-1')
            li.append(df)
        # Stack the frames vertically and renumber the index
        frame = pd.concat(li, axis=0, ignore_index=True)
        print(frame)
        frame.to_csv(path1 + '/%s_%s_all.csv' % (lts,tps), sep=';', encoding='iso-8859-1', index=False)
  • Using pandas was a great solution. Next, look into the difference between concat and merge; both methods are really useful. Since you are on Python 3, try f-strings: a line like csv = path % (lts,tps) could become csv = f'"DIRETÓRIO"/{lts}/{tps}'. Another tip: make liberal use of os.path.join, which joins directories and file names. Instead of path1 + '/%s_%s_all.csv'% (lts,tps) you could write os.path.join(path1, f"{lts}_{tps}_all.csv").

  • thanks for the tips, I will test soon
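
As a rough illustration of the f-string and os.path.join tips in the comment above, the path building could be sketched as follows. The helper names input_dir and output_file and the base_dir value are hypothetical placeholders, not part of the original answer.

```python
import os

listas = ['dfp', 'itr']
tipos = ['bpa', 'bpp']
base_dir = 'DIRETORIO'  # hypothetical placeholder for the real folder


def input_dir(base, lts, tps):
    """Folder holding the yearly files for one lista/tipo pair."""
    return os.path.join(base, lts, tps)


def output_file(base, lts, tps):
    """Path of the concatenated file, e.g. <base>/dfp_bpa_all.csv."""
    return os.path.join(base, f'{lts}_{tps}_all.csv')


if __name__ == '__main__':
    for lts in listas:
        for tps in tipos:
            print(input_dir(base_dir, lts, tps), '->', output_file(base_dir, lts, tps))
```

os.path.join inserts the separator appropriate to the operating system, so the same code should work on Windows and on Unix-like systems without hand-written slashes.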
