Suggested improvement to code that opens several CSV files in different directories

Good evening, friends. A friend gave me some code he was developing so that I could study it, but I would like to see a better way of opening several .csv files that are stored in different folders.

The code is very long, and I think a better approach is possible.

I have only basic knowledge of Python.

  • The data is split across 4 top-level folders named L1, L2, L3 and L4.

  • Inside each of these 4 folders there are subfolders named after a date, e.g. 03_02_20.

  • The .csv files are inside these date folders.

  • Each file name contains a different code, a number in the 500s, e.g. L1_503_03_02_20.csv, L1_505_03_02_20.csv, ...,
    L1_508_03_02_20.csv
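Because every file name already encodes the line, the 500-series code and the date folder, one option is to parse those parts out of the name instead of hard-coding each path. A minimal sketch, assuming all names follow the L<n>_5xx_<date>.csv pattern shown above; `FILE_RE` and `parse_csv_name` are hypothetical names, and the date is kept as a plain string because the question does not say whether it is day_month_year or month_day_year.

```python
import re

# Hypothetical pattern for names such as "L1_503_03_02_20.csv":
# line folder (L1..L4), 500-series code, date folder name.
FILE_RE = re.compile(r"^(L[1-4])_(5\d\d)_(\d{2}_\d{2}_\d{2})\.csv$")


def parse_csv_name(name):
    """Return (line, code, date) for a matching file name, or None otherwise."""
    m = FILE_RE.match(name)
    if m is None:
        return None
    line, code, date = m.groups()
    return line, int(code), date


print(parse_csv_name("L1_503_03_02_20.csv"))  # ('L1', 503, '03_02_20')
print(parse_csv_name("notes.txt"))            # None
```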

Here is the original code:

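import pandas as pd

# DiretorioBase (folder of one line), AnalysisDate (list of date folder names),
# Line ('L1'..'L4') and Mes (month number) are assumed to be defined earlier in the script.

# Import the CSVs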
df = pd.DataFrame()
df1 = pd.DataFrame()
df2 = pd.DataFrame()
df3 = pd.DataFrame()
df4 = pd.DataFrame()
df5 = pd.DataFrame()

for DateCSV in AnalysisDate:
    if Line == 'L4':
        df3 = pd.read_csv(
            DiretorioBase + "\\" + DateCSV + "\\" + Line + "_504_" + DateCSV + ".csv", sep=";")
        df3['workcenter'] = 3
        df1 = pd.read_csv(
            DiretorioBase + "\\" + DateCSV + "\\" + Line + "_502_" + DateCSV + ".csv", sep=";")
        df1['workcenter'] = 1
        df2 = pd.read_csv(
            DiretorioBase + "\\" + DateCSV + "\\" + Line + "_503_" + DateCSV + ".csv", sep=";")
        df2['workcenter'] = 2
        df5 = pd.read_csv(
            DiretorioBase + "\\" + DateCSV + "\\" + Line + "_506_" + DateCSV + ".csv", sep=";")
        df5['workcenter'] = 5
        df4 = pd.read_csv(
            DiretorioBase + "\\" + DateCSV + "\\" + Line + "_505_" + DateCSV + ".csv", sep=";")
        df4['workcenter'] = 4
    else:
        df1 = pd.read_csv(DiretorioBase + "\\" + DateCSV + "\\" + Line + "_505_" + DateCSV + ".csv", sep=";")
        df1['workcenter'] = 1
        df2 = pd.read_csv(DiretorioBase + "\\" + DateCSV + "\\" + Line + "_506_" + DateCSV + ".csv", sep=";")
        df2['workcenter'] = 2
        if not(Line == 'L2' and Mes == 2):
            df3 = pd.read_csv(DiretorioBase + "\\" + DateCSV + "\\" + Line + "_503_" + DateCSV + ".csv", sep=";")
            df3['workcenter'] = 3
        df4 = pd.read_csv(DiretorioBase + "\\" + DateCSV + "\\" + Line + "_507_" + DateCSV + ".csv", sep=";")
        df4['workcenter'] = 4
        df5 = pd.read_csv(DiretorioBase + "\\" + DateCSV + "\\" + Line + "_508_" + DateCSV + ".csv", sep=";")
        df5['workcenter'] = 5
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    df = pd.concat([df, df1, df2, df3, df4, df5])

df['Line'] = Line
df = df.drop_duplicates(keep='first')
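The two branches of the loop differ only in which 500-series code maps to which workcenter number, so the repetition can be replaced by a lookup table. A minimal sketch, assuming `DiretorioBase` points at one line's folder (e.g. ...\L1), as the string concatenation suggests; `load_line` and `WORKCENTERS` are hypothetical names, and the table is copied from the branches above. `os.path.join` builds the path with the right separator, and all frames are concatenated once at the end.

```python
import os

import pandas as pd

# Hypothetical lookup table: 500-series file code -> workcenter number,
# copied from the two branches of the original loop.
WORKCENTERS = {
    "L4": {502: 1, 503: 2, 504: 3, 505: 4, 506: 5},
    "default": {505: 1, 506: 2, 503: 3, 507: 4, 508: 5},
}


def load_line(base_dir, line, dates, month):
    """Read every CSV of one line for the given date folders."""
    mapping = WORKCENTERS["L4"] if line == "L4" else WORKCENTERS["default"]
    frames = []
    for date in dates:
        for code, workcenter in mapping.items():
            # The original loop skips file 503 for line L2 in month 2.
            if line == "L2" and month == 2 and code == 503:
                continue
            path = os.path.join(base_dir, date, f"{line}_{code}_{date}.csv")
            frame = pd.read_csv(path, sep=";")
            frame["workcenter"] = workcenter
            frames.append(frame)
    if not frames:
        return pd.DataFrame()
    df = pd.concat(frames, ignore_index=True)
    df["Line"] = line
    return df.drop_duplicates(keep="first")
```

Usage would be something like `df = load_line(DiretorioBase, Line, AnalysisDate, Mes)`. One difference from the original: `ignore_index=True` renumbers the rows, which does not change which rows `drop_duplicates` keeps.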

1 answer

Hello, welcome to Stack Overflow.

I'm not sure I understood your problem correctly, but here is a possible solution:

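import os
import pandas as pd


def getListOfFiles(dirName):
    """Recursively collect the paths of all files under dirName."""
    listOfFile = os.listdir(dirName)
    allFiles = list()
    for entry in listOfFile:
        fullPath = os.path.join(dirName, entry)
        if os.path.isdir(fullPath):
            # Descend into subfolders (L1..L4, then the date folders)
            allFiles = allFiles + getListOfFiles(fullPath)
        else:
            allFiles.append(fullPath)

    return allFiles


diretorio = r"C:\your\base\directory\here"
files = getListOfFiles(diretorio)

frames = []
for key, file in enumerate(files):
    if file.endswith(".csv"):
        df_x = pd.read_csv(file, sep=";")
        # key is only the file's position in the list, not the workcenter number from the question
        df_x['workcenter'] = key
        frames.append(df_x)

# DataFrame.append returned a new frame (and was removed in pandas 2.0), so concatenate once at the end
df = pd.concat(frames, ignore_index=True)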

What this solution does is:

  1. The function walks a directory and all of its subfolders recursively and collects the path of every file into a list.
  2. For each path in that list that ends with .csv, it reads the file into a temporary DataFrame df_x.
  3. Finally, it combines these temporary DataFrames into a single final DataFrame df.
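The same three steps can also be written with pathlib, whose `Path.rglob` already walks subfolders recursively, so the hand-written `getListOfFiles` is not needed. This is a sketch, assuming the base/Lx/date/Lx_code_date.csv layout described in the question; `load_all_csvs` and the extra `Line`, `date` and `code` columns are my own additions. Instead of the enumerate index, it records the 500-series code of each file, so the question's code-to-workcenter mapping can still be applied afterwards.

```python
from pathlib import Path

import pandas as pd


def load_all_csvs(base_dir):
    """Read every .csv below base_dir into one DataFrame."""
    frames = []
    # rglob walks all subfolders (L1..L4, then the date folders) recursively.
    for path in sorted(Path(base_dir).rglob("*.csv")):
        frame = pd.read_csv(path, sep=";")
        # Record where the rows came from, taken from the folder and file names
        # (assumes base/Lx/date/Lx_code_date.csv).
        frame["Line"] = path.parent.parent.name
        frame["date"] = path.parent.name
        frame["code"] = int(path.stem.split("_")[1])
        frames.append(frame)
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```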

I'm not sure whether the solution is clear or whether it meets your expectations.

If not, leave a comment here and we can keep refining it until we find the best solution for you.
