Table grouping

Asked

Viewed 120 times

3

I have this table and would like to put together a list or another table

     users_ID_User                                           Material  Rating
0            A            "Hey Pogo!...from citation to fulltext"       5
1            B            "Hey Pogo!...from citation to fulltext"       4
2            C            "Hey Pogo!...from citation to fulltext"       5
3            D            "Hey Pogo!...from citation to fulltext"       4
4            D              "Julius Caesar" Speech Analysis StAIR       4
5            A              "Julius Caesar" Speech Analysis StAIR       4
6            B              "Julius Caesar" Speech Analysis StAIR       4
7            E              "Julius Caesar" Speech Analysis StAIR       4
8            E  "Virgin Birth" method promises ethical stem cells       5
9             A  "Virgin Birth" method promises ethical stem cells       3
10             D                   .NET Overview, VB.NET and C#.NET       5
11            C                   .NET Overview, VB.NET and C#.NET       4
12            D  1 Project Management and Software Engineering ...       4
13             A  1 Project Management and Software Engineering ...       4
14             S  1 Project Management and Software Engineering ...       4
15             B  1 Project Management and Software Engineering ...       4
16            A  1 Project Management and Software Engineering ...       4
17           C                       10 Best Foods 10 Worst Foods       1
18             D                       10 Best Foods 10 Worst Foods       2
19           E                       10 Best Foods 10 Worst Foods       4
20           E                           100 preguntas de química       0
21           F                           100 preguntas de química       0
22           G                           100 preguntas de química       0
23           A                           100 preguntas de química       2
24           B                           100 preguntas de química       0
25           C                           100 preguntas de química       4
26           E                           100 preguntas de química       5
27           A                           100 preguntas de química       4
28           F                           1000 Quick Writing Ideas       2
29           A                           1000 Quick Writing Ideas       0

I’m using pandas, I tried:

df_3 = pd.read_csv("tabela_final_1.csv",delimiter=",")
del df_3['Unnamed: 0']
df_3 = df_3.drop_duplicates(['users_ID_User','Material'])

df_3 = df_3.pivot('users_ID_User','Material','Rating')

didn’t work out the way I’d like, I’m trying to ride.

Usuario   Hey Pogo !...       Julius Caesar ...  Virgin Birth ... 
A                     5           4
B                     4           4
C                     6           NaN
D                     4           NaN
E                     NaN          5                    5

and in the end sort by the line that has the most the rating...

in the case shown would be first A, B, E ......

I would like a logic to put together this table, I’m a little confused!

2 answers

3


A simple strategy is to create a count column of ratings greater than 0, sort the table using this new column as a criterion, and then dispense with it.

For this, the code below solves

import pandas as pd

# do exemplo do autor:
# df = pd.read_csv("tabela_final_1.csv")
# del df['Unnamed: 0']
# df = df.drop_duplicates(['users_ID_User','Material'])
# df = df.pivot('users_ID_User','Material','Rating')

# Criando a coluna de quantidade de avaliações:
# - `df > 0` resulta em Trues para valores > 0
# - `.sum(axis=1)` soma os Trues de cada linha
df["num_ratings"] = (df > 0).sum(axis=1)

# Reordenando a tabela.
df = df.sort_values("num_ratings", ascending=False)

# Excluindo a coluna auxiliar das contagens.
df = df.drop("num_ratings", axis=1)

2

One solution I found was: take the unique values of the columns 'users_ID_User' and 'Material', generate a new data by means of a dictionary and insert this data with an append in another dataframe. (In my example I used your dataframe up to index 9)

Take the unique values:

lista_valores_unicos_ID = list(set(df['users_ID_User'].values))
>>> print(lista_valores_unicos_ID)
['B', 'A', 'C', 'E', 'D']

lista_valores_unicos_material = list(set(df['Material'].values))
>>> print(lista_valores_unicos_material)
['"Julius Caesar" Speech Analysis StAIR', '"Virgin Birth" method promises ethical stem cells', 'Hey Pogo!...from citation to fulltext']

Now create an empty dataframe with the desired output template:

data_resposta = {'users_ID_User':[], 'Rating':[]}
for elemento in lista_valores_unicos_material:
    data_resposta.update({elemento:[]})

df_resultado = pd.DataFrame(data=data_resposta)

To make it easier, I created a dataframe with the calculated 'Ratings', so I can only take these values:

df_groupby = df.groupby('users_ID_User').agg({'Rating':'sum',})
>>> print(df_groupby)
               Rating
users_ID_User        
A                  12
B                   8
C                   5
D                   8
E                   9

#Pego eles assim
>>> print(df_groupby.loc['A'].values[0])
12

I can pick up the occurrence of each phrase in the 'Material' column of a specific user like this:

df_unico_A = df.loc[df['users_ID_User'] == 'A']['Material'].value_counts().to_frame()
>>> print(df_unico_A)
                                                   Material
"Julius Caesar" Speech Analysis StAIR                     1
"Virgin Birth" method promises ethical stem cells         1
Hey Pogo!...from citation to fulltext                     1

#Iterando por linha
>>> for index, row in df_unico_A.iterrows():
        print(index, ':', row['Material'])

"Julius Caesar" Speech Analysis StAIR : 1
"Virgin Birth" method promises ethical stem cells : 1
Hey Pogo!...from citation to fulltext : 1

Now itero for each user of the list lista_valores_unicos_ID taking the occurrences with the above logic, placing in a dictionary, and insert the data in the df_resultado with an append:

for id in lista_valores_unicos_ID:

    nova_data = {'users_ID_User':id, 'Rating':df_groupby.loc[id].values[0]}
    for elemento in lista_valores_unicos_material:
        #Cada frase é inicializada com uma contagem 0
        nova_data.update({elemento:0})

    df_unicos = df.loc[df['users_ID_User'] == id]['Material'].value_counts().to_frame()

    for index, row in df_unicos.iterrows():
        nova_data[index] = row['Material']

    #for i in (0,len(df_unicos)):
    #    nova_data[df_unicos.index[i]] = df_unicos.values[i]

    df_resultado = df_resultado.append(nova_data, ignore_index=True)

Now the df_resultado is the expected output, but still to order:

df_resultado = df_resultado.sort_values('Rating', ascending=False )

Now its output should be something like: (I cut the sentences manually and put '...', but I did not change the values obtained)

>>> print(df_resultado )
   "Julius...   "Virgin...   Hey Pogo!...   Rating   users_ID_User  
1         1.0          1.0            1.0     12.0               A  
3         1.0          1.0            0.0      9.0               E  
0         1.0          0.0            1.0      8.0               B  
4         1.0          0.0            1.0      8.0               D  
2         0.0          0.0            1.0      5.0               C  

Browser other questions tagged

You are not signed in. Login or sign up in order to post.