How to bring more fields in the Pandas groupby, without necessarily having to use them in the grouping?

Question

How to bring more fields in the Pandas groupby, without necessarily having to use them in the grouping?

Asked 4 years, 8 months ago

Viewed 285 times

0

Good afternoon, guys. I’m messing with the Pandas library in Python and I’d like to ask you a question, if possible. I have a list of values and need to reduce this list to just one record per date. The criterion is that the third field (counter) is the largest of its date. To do this, I used the following code:

import pandas as pd

lista = [("2020-11-16", "803", 4),
    ("2020-11-16", "801", 18),
    ("2020-11-16", "802", 20),
    ("2020-11-17", "801", 12),
    ("2020-11-17", "802", 6)]

df = pd.DataFrame(lista)
df.columns=['data', 'codigo', 'contador']
df2 = df.groupby('data')['contador'].max()
for i in df2.items():
    data = i[0]
    contador = i[1]
    print(data, contador)

#O resultado obtido foi o seguinte e está correto: 
['2020-11-16', 20]
['2020-11-17', 12]

My doubt would be if it is possible to bring together in this result the column 'code', without it passing by grouping. If I put it in the groupby I will have multiple records per date and it will be wrong, because I can only have one record per date. Would there be any way to do this, whether for pandas or otherwise? Thanks from now on.

2 answers

Browser other questions tagged python group-by

You are not signed in. Login or sign up in order to post.

by lmonferrari • **3,550** points · Answer 1 · 2020-11-17T22:12:55+00:00

2

One solution would be to create a 'filter':

filtro = df.groupby('data')['contador'].max()

And then use the isin of the pandas:

df[df['contador'].isin(filtro)].reset_index(drop = True)

Exit:

    data        codigo  contador
0   2020-11-16    802        20
1   2020-11-17    801        12

1

@imonferrari, beautiful solution! I’m not in the habit of using the method isin(), because I usually work with the whole base. It will surely make me think differently from now on.

– Paulo Marques

2020/11/18 at 00:13
Oops! Thank you, Paul! isin is a very fast method, once you start using becomes an addiction kkkk. Pranks part... Great Hug!

– lmonferrari

2020/11/18 at 00:20
1

Perfect. I didn’t know this possibility of filter with isin(). Thank you very much!

– jhenrique

2020/11/18 at 12:33

by Paulo Marques • **3,739** points · Answer 2 · 2020-11-17T22:53:28+00:00

I believe what you’re looking for is the aggregate

To get the result, just:

Create a desired function structure for each field

f = {'contador': 'max', 'codigo': 'first'}

Do the groupby with agg passing by f as a parameter

df2 = series = df.groupby('data', as_index=False)[['contador', 'codigo']].agg(f)

The result will be:

>>> df2
         data  contador codigo
0  2020-11-16        20    803
1  2020-11-17        12    801

Note that the code chosen by Aggregate was the first, which in this case is the 803. However, in case you want the same that is on the line of contador=20, that is to say, codigo=802 do exactly what you were doing and then conclude with a merge()

df3 = pd.merge(df2, df, on=['data', 'contador'])

The result will be:

>>> df3
         data  contador codigo
0  2020-11-16        20    802
1  2020-11-17        12    801

I hope I’ve helped