Grouping table groupby pandas

Asked

Viewed 1,217 times

3

I have a table in this format

User   Material   Nota
1       youtube    5
2       facebook   4
3       facebook   3
4       facebook   5
6       youtube    2
7       orkut      3
8       twitter    4

and would like to group so that in the final table shows only websites evaluated by more than 1 different user

Material   User    Nota
youtube     1      5
            6      2
facebook    2      4 
            3      3
            4      5

my code and my logic

import pandas as pd

df = pd.read_csv("tabela1.csv",delimiter=",")


tabela2= pd.DataFrame({'count' : df.groupby(["Material","User","Nota"]).size()})


del tabela2['count']

tabela2.to_csv('tabela_2.csv')
tabela2 = pd.read_csv("tabela_2.csv",delimiter=",")

but so it lists the ones that were evaluated by 1 user, I wonder if there is something in the group that lists only the different ones to group?

What I thought do the two for’s sit 1 fixed in the Material column and in case counting how many times each material appears after deleting the materials appearing less q twice, to treat materials with less than 1 evaluation, however I believe that for a very large base it will become very expensive in time

1 answer

1


You can use the value_counts to find out which of the column values Material have more than one input, and then filter your Dataframe by these values:

import pandas as pd
import io

dados = io.StringIO("""
User,Material,Nota
1,youtube,5
2,facebook,4
3,facebook,3
4,facebook,5
6,youtube,2
7,orkut,3
8,twitter,4""")

df = pd.read_csv(dados, index_col='User')

# Criamos uma série com a contagem de 
# ocorrências pra cada valor da coluna Material
counts = df['Material'].value_counts()
print(counts)  # facebook 3, youtube 2, orkut 1...

# Filtramos para que a série só inclua os 
# materiais com mais de uma contagem
counts = counts[counts > 1]
print(counts.index)  # Index(['facebook', 'youtube'], dtype='object')

# Filtramos o dataframe original para que 
# contenha somente entradas que estejam na série
df = df[df['Material'].isin(counts.index)]
print(df)

# Se quisermos, podemos ordená-los também, como
# no exemplo dado na questão
df = df.sort_values('Material')
print(df)

Final result:

User  Material     Nota
2     facebook     4
3     facebook     3
4     facebook     5
1      youtube     5
6      youtube     2

Browser other questions tagged

You are not signed in. Login or sign up in order to post.