Python: how to merge multiple CSV files

I have 4 folders, each filled with CSV files of 3 types (ap, peers, visits).

I’m a beginner in Python, but I wanted to write a script that merges the peers files into a single file containing the lines of every peers file found. I also want to add a column called "student" to the header and, for each line written to the final peers file, append the corresponding student at the end.

import glob
import sys

# First command-line argument is the main folder
mainfolder = sys.argv[1]
allfolders = glob.glob(mainfolder + '*\\')

with open(mainfolder + "finalpeers\\totalpeers.csv", "w") as finalPeersFile:

    newpheader = '"_id","ssid","bssid","dateTime","latitude","longitude","student"\n'
    finalPeersFile.write(newpheader)

    for folder in allfolders:
        student = folder.split('\\')[-2]
        filesTomerge = glob.glob(folder + '*.csv')

        for filename in filesTomerge:
            if (isPeers(filename)):
                with open(filename, 'r') as p:
                    for line in p:
                        finalPeersFile.write(line)

My code does that, but since every file has the same header, and some files contain only a header, the output ends up with many repeated header lines. I also can’t just take the header from the first file and append "student" to it, because there is a "hidden" newline at the end of the line; I think that is something particular to Python. And although I have the student to add at the end of each line, I can’t simply concatenate it to the string (line + student).

Final file:

[image: screenshot of the final file]

How can I remove the repeated headers, or merge the files so that the headers are not copied in the first place?

P.S.: Sorry if I am asking a question that has already been asked (I searched a lot, but nothing I found helped me solve the problem).

1 answer



The hidden newline can be removed from the end of a string with the rstrip() method.
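
As a small illustration of why (line + student) misbehaves: each line read from a file still ends with its newline character, so anything appended lands after the line break. A minimal sketch, where the sample line and the student name are made up for the example:

```python
# Hypothetical sample line, as it would come out of iterating over a file
line = '"1","home-wifi","aa:bb:cc","2017-01-01"\n'
student = "student01"  # made-up value, for illustration only

# Naive concatenation: the student ends up after the line break
broken = line + student

# Strip the trailing newline first, then append the new field and a fresh newline
fixed = line.rstrip("\n") + ',"' + student + '"\n'

print(repr(broken))
print(repr(fixed))
```

Calling rstrip() with no argument would also work here; it removes all trailing whitespace, not only the newline.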

The header of each input file can be skipped by calling next() on the open file before looping over the remaining lines.
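
A quick sketch of the header skip, using in-memory io.StringIO objects as stand-ins for real files (the helper name data_lines and the sample contents are assumptions made for the example). Passing None as the second argument to next() keeps it from raising StopIteration on a completely empty file:

```python
import io


def data_lines(handle):
    """Return the lines of an open CSV file, without its header line."""
    # next() consumes the first line; the None default avoids
    # StopIteration when the file has no lines at all
    next(handle, None)
    return list(handle)


# In-memory stand-ins for real files (contents are made up)
with_rows = io.StringIO('"_id","ssid"\n"1","home"\n"2","work"\n')
header_only = io.StringIO('"_id","ssid"\n')
empty = io.StringIO('')

print(data_lines(with_rows))
print(data_lines(header_only))
print(data_lines(empty))
```

Files that contain only a header simply contribute no lines, which is what removes the repeated headers from the merged output.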

Let’s see:

from os import listdir
from os.path import isfile, join

# Directory
diretorio = "/tmp"

# Get the list of CSV files in the directory
ficheiros = [f for f in listdir(diretorio) if isfile(join(diretorio, f)) and f.endswith('.csv')]

# Open the output file...
saida = open("saida.csv", "a")

# For each file...
for f in ficheiros:

    # Open the file (listdir returns bare names, so join with the directory)
    csv = open(join(diretorio, f))

    # Skip the CSV header (the None default avoids StopIteration on an empty file)
    next(csv, None)

    # Compute student...
    student = 1

    # For each of the remaining lines in the file...
    for linha in csv:
        # Comma separator, to match the header shown in the question
        linha = linha.rstrip() + ',' + str(student) + '\n'
        saida.write(linha)

    # Close the input CSV file
    csv.close()

# Close the output CSV file
saida.close()
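
For the folder layout described in the question, the same two ideas can be combined into one function. This is only a sketch under a few assumptions: it swaps the string concatenation for the standard csv module, the is_peers helper is a hypothetical stand-in for the asker's isPeers (here it just looks for "peers" in the file name), and the name of each subfolder is used as the student label.

```python
import csv
from pathlib import Path

HEADER = ["_id", "ssid", "bssid", "dateTime", "latitude", "longitude", "student"]


def is_peers(path):
    # Assumption: peers files can be recognised by "peers" in the file name
    return "peers" in path.name.lower()


def merge_peers(mainfolder, output_path):
    """Merge every peers CSV found one level below mainfolder into output_path."""
    mainfolder = Path(mainfolder)
    output_path = Path(output_path)
    with open(output_path, "w", newline="") as out:
        writer = csv.writer(out, quoting=csv.QUOTE_ALL)
        writer.writerow(HEADER)  # the header is written exactly once
        for folder in sorted(p for p in mainfolder.iterdir() if p.is_dir()):
            student = folder.name  # assumption: folder name identifies the student
            for path in sorted(folder.glob("*.csv")):
                # Skip other CSV types and never read the output file itself
                if not is_peers(path) or path.resolve() == output_path.resolve():
                    continue
                with open(path, newline="") as src:
                    reader = csv.reader(src)
                    next(reader, None)  # skip the header; safe on empty files
                    for row in reader:
                        if row:  # ignore blank lines
                            writer.writerow(row + [student])
```

It could be called as merge_peers(sys.argv[1], "totalpeers.csv"). Letting csv.writer handle the quoting means a student name containing a comma or a quote would still produce a valid row.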
  • Thanks for the quick reply! I had to make some changes, but it is already working as I wanted thanks to your pointers. José
