Dataframe row and column organization

Hello,

I have the following situation:

import pandas as pd
import numpy as np


l=[]

l.append(('Mod1',0,70))
l.append(('Mod1',1,88))
l.append(('Mod1',2,97))
l.append(('Mod2',0,44))
l.append(('Mod2',1,93))
l.append(('Mod2',2,100))
l.append(('Mod3',0,99))
l.append(('Mod3',1,71))
l.append(('Mod3',2,33))

I would like the Dataframe to be as follows:

   Mod1    Mod2    Mod3
0   70      44      99
1   88      93      71
2   97     100      33

where 0, 1, 2 would be the index. But the way I did it, Mod1, Mod2 and Mod3 end up in different columns.

This is what I did:

df = pd.DataFrame(data=l)
df=df.transpose()

and it comes out like this:

[image: screenshot of the resulting DataFrame]

  • I don’t understand what’s wrong

  • Hello, I edited the question to show how it’s getting the way I did.

2 answers

1


The code below has been tested. The pivot_table function [Pandas-Docs] converts the DataFrame into the desired format:

df.columns = ['col1', 'col2', 'col3']  # add titles to the columns
df = df.pivot_table(index=['col2'], columns='col1', values='col3').reset_index()
df.drop(df.columns[0], axis=1, inplace=True)
df = df.rename_axis(None, axis=1)
df

Out[11]:
   Mod1  Mod2  Mod3
0    70    44    99
1    88    93    71
2    97   100    33
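
For comparison, here is a self-contained sketch of the same reshaping with pivot instead of pivot_table. The column names 'mod', 'idx' and 'valor' are made up for illustration. Since pivot_table aggregates (mean by default), pivot may be the simpler choice when each (index, Mod) pair appears only once:

```python
import pandas as pd

# Same sample data as in the question
l = [
    ('Mod1', 0, 70), ('Mod1', 1, 88), ('Mod1', 2, 97),
    ('Mod2', 0, 44), ('Mod2', 1, 93), ('Mod2', 2, 100),
    ('Mod3', 0, 99), ('Mod3', 1, 71), ('Mod3', 2, 33),
]

# Column names 'mod', 'idx' and 'valor' are illustrative, not from the question
df = pd.DataFrame(l, columns=['mod', 'idx', 'valor'])

# pivot only reshapes, without aggregating the values
resultado = df.pivot(index='idx', columns='mod', values='valor')

# Remove the axis labels ('idx' and 'mod') left behind by pivot
resultado.index.name = None
resultado.columns.name = None

print(resultado)
```

Note that pivot raises an error if the same (index, Mod) pair appears more than once; in that case pivot_table, which aggregates duplicates, is the safer option.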

1

Third Edit: I believe the code can now process the data as expected, that is, it generates the lists automatically and includes each "Mod" correctly, regardless of how each string is named. As for the irregular amount of data per "Mod", I filled each empty field with 0 (check whether this harms the data processing in your case).

Fourth Edit: There was an error in the logic of the last version I posted, so I am resending the version that I believe works properly. To keep the answer organized, I am replacing the previous versions with the code below:

import pandas as pd

dados = {}
l = []
lista_mods = []
lista_valores = []
maximo = 0

l.append(('Mod1',0,70))
l.append(('Mod1',1,88))
l.append(('Mod1',2,97))
l.append(('Mod1',3,44)) # row added to test the behavior with differing numbers of indices
l.append(('Mod2',0,44))
l.append(('Mod2',1,93))
l.append(('Mod2',2,100))
l.append(('Mod3',0,99))
l.append(('Mod3',1,71))
l.append(('Mod3',2,33))

# Collect the distinct "Mod" names in order of first appearance
for nome, _, _ in l:
    if nome not in lista_mods:
        lista_mods.append(nome)

# Build exactly one list of values per "Mod"
# (the earlier loop appended an extra empty list on every match)
for nome in lista_mods:
    lista_valores.append([valor for n, _, valor in l if n == nome])

# Length of the longest list of values
maximo = max((len(c) for c in lista_valores), default=0)

# Pad the shorter lists with 0 so every column has the same length
for c in lista_valores:
    c.extend([0] * (maximo - len(c)))

dados = dict(zip(lista_mods, lista_valores))
df = pd.DataFrame(dados)
print(df)
df.to_html('temp.html')

  • Hello, very good solution. I'm also just starting to learn pandas. To keep the question short I only showed Mod1 to Mod3, but I actually have up to Mod98 in sequence (Mod1..Mod98), so creating 98 lists by hand gets a little complicated.

  • Understood. But does it always follow this pattern? Does each ModN always have three values?

  • The names change: Mod1, Modulo30, Md15... There is no naming pattern, and each of them has up to 30 values.

  • I was trying to think of a way to automate the creation of the lists, but since there is no specific pattern (in the naming or in the amount of data per "Mod"), I can't think of a simple solution. I believe the right path really is through the functions of the pandas library; if I find something I will add it here.

  • I changed the code to address the problems you described; check whether the data is now processed correctly. Again, given my limited knowledge of the pandas library, I tried to solve the issues with native Python functions and methods. It is very likely that the library itself offers a better solution.

  • I made a new correction because the logic of the previous version was wrong.
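
As a rough sketch of the pandas route mentioned in the comments above: pivot can handle arbitrary names and a different number of values per name, with the gaps filled by 0. The sample data and the column names 'nome', 'idx' and 'valor' are hypothetical, chosen only to mimic the irregular case described, and it assumes each (index, name) pair appears at most once:

```python
import pandas as pd

# Hypothetical data: arbitrary names and a different number of values per name
l = [
    ('Mod1', 0, 70), ('Mod1', 1, 88), ('Mod1', 2, 97), ('Mod1', 3, 44),
    ('Modulo30', 0, 44), ('Modulo30', 1, 93),
    ('Md15', 0, 99), ('Md15', 1, 71), ('Md15', 2, 33),
]

# Column names are illustrative, not from the question
df = pd.DataFrame(l, columns=['nome', 'idx', 'valor'])

tabela = (
    df.pivot(index='idx', columns='nome', values='valor')
      .reindex(columns=df['nome'].unique())  # keep names in order of first appearance
      .fillna(0)     # missing (idx, nome) combinations become 0
      .astype(int)   # the NaN gaps forced floats; convert back to integers
)

# Remove the axis labels left behind by pivot
tabela.index.name = None
tabela.columns.name = None

print(tabela)
```

Unlike the list-based version, this aligns the values by their index number rather than by their position, so a missing index in the middle of a "Mod" leaves the gap in the right row. Whether 0 is an acceptable filler still depends on how the data is used afterwards.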
