Getting the maximum value of each group with pandas groupby

Viewed 343 times

2

Hello,

I have the DataFrame below, which I would like to group by country ('pais') and get the row with the maximum population ('populacao') for each one:

import pandas as pd

df = pd.DataFrame({'pais': ['Brasil', 'Brasil', 'EUA', 'EUA'],
                   'cidade': ['Santos', 'São Paulo', 'Orlando', 'Nova York'],
                   'populacao': [100000, 500000, 200000, 550000],
                   'idade': [430, 440, 200, 150]})
df

The result I want:

pais    cidade     populacao
Brasil  São Paulo  500000
EUA     Nova York  550000

What I’ve already done:

df.groupby(['pais','cidade']).loc[df.populacao == df.populacao.max()]

It raises: "AttributeError: Cannot access callable attribute 'loc' of 'DataFrameGroupBy' objects, try using the 'apply' method"

I understand that I have to use a function and apply it, but I don’t know exactly how. Can anyone help me?

2 answers

2

Jessica, a different approach makes this easier.

Try this:

df = pd.DataFrame({'pais': ['Brasil', 'Brasil' , 'EUA', 'EUA'],
                  'cidade': ['Santos', 'São Paulo', 'Orlando', 'Nova York'],
                  'populacao': [100000, 500000, 200000, 550000],
                  'idade':[430,440,200,150]})
df.sort_values('populacao', ascending=False).drop_duplicates(['pais'])


Basically, the idea is to sort by population in descending order and then drop the duplicate countries, keeping only the first occurrence of each (which will be the city with the largest population).
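To make the two steps explicit, here is a minimal sketch of the same idea. It spells out keep='first' (the default for drop_duplicates), and the final sort by 'pais' and the reset_index are only there for a tidy display; the variable name largest is just illustrative.

```python
import pandas as pd

df = pd.DataFrame({'pais': ['Brasil', 'Brasil', 'EUA', 'EUA'],
                   'cidade': ['Santos', 'São Paulo', 'Orlando', 'Nova York'],
                   'populacao': [100000, 500000, 200000, 550000],
                   'idade': [430, 440, 200, 150]})

# Step 1: sort so the most populous city of each country comes first.
# Step 2: keep only the first row seen for each country.
largest = (
    df.sort_values('populacao', ascending=False)
      .drop_duplicates(subset='pais', keep='first')
      .sort_values('pais')          # optional: order the result by country
      .reset_index(drop=True)       # optional: renumber the rows
)

print(largest[['pais', 'cidade', 'populacao']])
```

Note that if two cities in the same country tie for the largest population, only one of them is kept.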

2


You can do this using groupby with idxmax. The idea is to get the index label of the row with the largest population in each country and then select those rows.

df.loc[df.groupby('pais')['populacao'].idxmax()]
# output
     pais     cidade  populacao  idade
1  Brasil  São Paulo     500000    440
3     EUA  Nova York     550000    150
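One detail worth knowing: idxmax returns index labels, not integer positions, so label-based selection with .loc is the safer pairing whenever the DataFrame does not have the default 0, 1, 2, ... index. A small sketch, assuming hypothetical string labels 'a' to 'd' that are not part of the original question:

```python
import pandas as pd

# Same data as in the question, but with hypothetical string index labels
# to show that idxmax works with labels rather than positions.
df = pd.DataFrame({'pais': ['Brasil', 'Brasil', 'EUA', 'EUA'],
                   'cidade': ['Santos', 'São Paulo', 'Orlando', 'Nova York'],
                   'populacao': [100000, 500000, 200000, 550000],
                   'idade': [430, 440, 200, 150]},
                  index=['a', 'b', 'c', 'd'])

# One index label per country: the row holding that country's largest population.
idx = df.groupby('pais')['populacao'].idxmax()

# Select those rows by label.
result = df.loc[idx]
print(result)
```

With the default integer index the labels and positions happen to coincide, which is why positional selection also appears to work on the original example.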
