Merge CSV with Python

I have a collection of dozens of CSV files. Most of them share the same fields, but some have unique fields. I want to merge them with Python into a single CSV file with a global header that includes every field from every file. I am using the csv library, but so far without success, because the data does not end up in the right columns.

  • Could you show the part of the code where you save the data?

1 answer


I had a similar problem some time ago and have adapted my solution a little to your needs. You may have to change a few things, in particular the delimiter.
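
On the delimiter: if you are not sure which one your files use, the standard library's csv.Sniffer can usually guess it from a sample of the file. This is only a minimal sketch, assuming each file uses one of a few common delimiters. detect_delimiter is a hypothetical helper, not part of the csv module, and the fallback to ';' is my assumption:

```python
import csv

def detect_delimiter(path, candidates=";,\t|", sample_size=4096):
    """Guess the delimiter of a CSV file from its first few kilobytes.

    Hypothetical helper: the candidate list and the ';' fallback are assumptions.
    """
    with open(path, newline="") as f:
        sample = f.read(sample_size)
    try:
        # sniff() returns a Dialect class; its delimiter attribute is the guess.
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # The sniffer could not decide (for example, an empty file).
        return ";"
```

You could then pass delimiter=detect_delimiter(file) to DictReader in the script below instead of hard-coding ';'. The sniffer is a heuristic, so it is worth checking its guess on a few of your files first.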

from glob import glob
import csv

"""
    This program must be run from the directory that contains the CSV files.
    The output goes to the file consolidated.csv.
"""
def create_global_header(files):
    """
        Build a header that contains every field name found in the CSV files.
    """
    consolidated_header = ['filename']
    for file in files:
        # newline='' is what the csv module recommends for file objects.
        with open(file, 'r', newline='') as icsv:
            reader = csv.DictReader(icsv, dialect='excel', delimiter=';')
            # fieldnames is None for an empty file, hence the "or []".
            for field in reader.fieldnames or []:
                if field not in consolidated_header:
                    consolidated_header.append(field)
    return consolidated_header

def global_csv(ifile, global_header, ofile):
    """
    Read the CSV file ifile and append its rows to the file ofile.
    Because DictReader and DictWriter are both used, and the writer is given
    the global header, each value goes to the column of its own field.
    """
    with open(ofile, 'a', newline='') as ocsv, open(ifile, 'r', newline='') as icsv:
        ireader = csv.DictReader(icsv, dialect='excel', delimiter=';')
        owriter = csv.DictWriter(ocsv, global_header, dialect='excel', delimiter=';')
        for row in ireader:
            # Record which source file the row came from.
            row['filename'] = ifile
            owriter.writerow(row)


if __name__ == '__main__':
    # Leave out the output file in case it is left over from a previous run.
    files = [f for f in glob('*.csv') if f != 'consolidated.csv']
    global_header = create_global_header(files)
    # Write the header once; the rows are appended afterwards by global_csv.
    with open("consolidated.csv", 'w', newline='') as mycsv:
        writer = csv.DictWriter(mycsv, global_header, dialect='excel', delimiter=';')
        writer.writeheader()
    for file in files:
        global_csv(file, global_header, 'consolidated.csv')
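
To see why the data lands in the right column, here is a small self-contained sketch of the same two-pass idea, using in-memory strings instead of files. The file names and sample rows are made up for illustration. DictWriter looks up each value by field name, and any field missing from a row is filled with restval (an empty string by default):

```python
import csv
import io

# Two in-memory "files" with partly different columns (made-up sample data).
sources = {
    "a.csv": "id;name\n1;Ana\n",
    "b.csv": "id;city\n2;Porto\n",
}

# Pass 1: build the union of all headers, keeping first-seen order.
header = ["filename"]
for text in sources.values():
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    for field in reader.fieldnames or []:
        if field not in header:
            header.append(field)

# Pass 2: write every row against the global header.
out = io.StringIO()
writer = csv.DictWriter(out, header, delimiter=";", restval="", lineterminator="\n")
writer.writeheader()
for name, text in sources.items():
    for row in csv.DictReader(io.StringIO(text), delimiter=";"):
        row["filename"] = name
        writer.writerow(row)

print(out.getvalue())
# filename;id;name;city
# a.csv;1;Ana;
# b.csv;2;;Porto
```

The row from a.csv has no city and the row from b.csv has no name, so those cells are simply left empty rather than shifting the other values.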
