Turn row into column

Asked

Viewed 467 times

2

Good night, I’m having a problem of putting together more than 500 csv files, where I need a very important data that is in a row (A8), but it needs to turn column (like this in the second image), I’m trying to use the function iloc, but it’s not working, to join all the spreadsheets I am using this code, but before I need the dataframe to be like in the second photo, since I thank you for the help of all and for the suggestions:

path = r'C:/PJS_PY/HMR_DTS/DTS_BSE'

all_files = glob.glob(path + "/*.csv")

li = []

for filename in all_files:
    df = pd.read_csv(filename, sep=';', skiprows=range(0,8))
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)

inserir a descrição da imagem aqui

inserir a descrição da imagem aqui

  • I would suggest changing the title of the question slightly since, as it stands, it suggests that your problem is much more general than it really is. By the way, if someone comes here to know how to transpose a pandas dataframe, just do df.T

1 answer

2


A possible solution is to open the file again and select only the information you want. With this information, you create a new column.

Since you didn’t present a replicable example, I created a dummy database that replicates your problem. So, suppose we have a csv file with that face:

k,19090
med,35000
gui, 23199
jkjd, 98082
jksjkjs, 890001
jksjkjs, 890001
gui, 23199
medidor, 18900005
name, idade, cor, sexo, income
Maria, 29, white, woman, 3000
João, 32, black, man, 2500

An implementation of this solution that I suggested would be:

import pandas as pd
import re

all_files= ["data.csv"]
li=[]

for filename in all_files:
        df = pd.read_csv(filename, sep=';', skiprows=range(0,8))
        f = open(filename, "r")
        for index, line in enumerate(f.readlines()):
            # pega especificamente a informação da linha 8:
            if index==7:
                # Como você quer só os números, 
                # aplique um regex para pegar só o código numérico
                numbers = re.findall(r"[\d+]", line)
                medidor_number = ''.join(numbers)
                medidor_var = [medidor_number]*len(df)
                df = pd.DataFrame({"Medidor": medidor_var}).join(df)
                #no seu caso, que tem vários arquivos,
                #faça o apensamento apenas aqui
                #li.append(df)
        f.close()

print(df)

Ouput:

    Medidor name, idade, cor, sexo, income
0  18900005  Maria, 29, white, woman, 3000
1  18900005     João, 32, black, man, 2500
  • Good afternoon! Wonder Luke, that’s right but in reality I would need to get all the information contained after "Meter (Serial): ", what would be the change in the script for this?

  • removes the regex. Strip re.findall(r"[\d+]", line) and puts line

Browser other questions tagged

You are not signed in. Login or sign up in order to post.