groupby - sum according to a criterion


I want to sum all of an employee's earnings by his CPF, but only the entries where he earns 100, disregarding any other amounts. I'm using this code, but it isn't working:

aip.groupby(['CPF']).sum() where(aip['Valor']==100)

Apart from the sum column, which adds up everything instead of only the 100.00 occurrences, the spreadsheet I'm working with looks like this:

https://i.stack.Imgur.com/psofy.jpg

But I don't want to sum everything, only the rows where the commission is the maximum, i.e. 100.

1 answer



Question

Hello Roger, this is simple to solve: we just need to think through the order in which the pandas operations are chained. If you want to total, per CPF, the values greater than 100, then first filter the values, then group by CPF and finally sum.
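A minimal sketch of that order on a tiny hand-made dataframe (the values are made up, not taken from the question), so the totals can be checked by hand. Note that a CPF with no value above 100 (CPF 3 here) simply does not appear in the result, because all of its rows are removed by the filter before grouping.

```python
import pandas as pd

# Small made-up dataframe so the result can be verified by hand
df = pd.DataFrame({
    'CPF':   [1, 1, 2, 2, 3],
    'Dados': [150, 50, 200, 300, 80],
})

# 1. filter the rows, 2. group by CPF, 3. sum the remaining values
total = df[df['Dados'] > 100].groupby('CPF')['Dados'].sum()
print(total)
```

Here CPF 1 keeps only 150 and CPF 2 keeps 200 + 300 = 500.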

Example

First, we import the libraries and create the dataframe.

We generate 300 random integers: values from 1 to 500 for Dados and from 1 to 4 for CPF.

# import the libraries
import pandas as pd
import numpy as np

# create the example dataframe
data = pd.Series(np.random.randint(1, 501, size=300), name='Dados')
cpf = pd.Series(np.random.randint(1, 5, size=300), name='CPF')
df = pd.concat([data, cpf], axis=1)

Output of df:

    Dados   CPF
0   424     4
1   416     1
2   231     1
3   423     1
4   36      1
5   14      4
6   317     1
7   4       4
8   34      3
9   98      1
10  464     4
...

Solving the problem

As mentioned above, three steps are enough: 1. filter the data; 2. group it by the desired column; 3. sum. The last step can use other aggregations instead, such as count or mean.
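A minimal sketch, on a small made-up dataframe, of keeping the same filter and grouping while changing only the last step; agg with a list of names computes sum, count and mean for each CPF in one call.

```python
import pandas as pd

# Made-up data, only for illustration
df = pd.DataFrame({
    'CPF':   [1, 1, 2, 2, 3],
    'Dados': [150, 50, 200, 300, 80],
})

filtered = df[df['Dados'] > 100]

# Same filter and grouping; only the aggregation step changes
stats = filtered.groupby('CPF')['Dados'].agg(['sum', 'count', 'mean'])
print(stats)
```

Each requested aggregation becomes one column of the result, indexed by CPF.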

df[df['Dados'] > 100].groupby(['CPF']).sum()

Output:

CPF Dados
1   19023
2   17130
3   16998
4   16309
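The question asks for values equal to 100 rather than greater than 100. A minimal sketch under that assumption, using the question's names aip, CPF and Valor with hypothetical data. The first variant only changes the filter condition. The second uses Series.where, which seems close to what the original attempt was aiming for: values other than 100 are replaced by 0, so CPFs that never earned 100 still appear with a total of 0.

```python
import pandas as pd

# Hypothetical data using the column names from the question
aip = pd.DataFrame({
    'CPF':   ['111', '111', '222', '222', '333'],
    'Valor': [100.0, 80.0, 100.0, 100.0, 50.0],
})

# Variant 1: keep only the rows where Valor equals 100, then group and sum
only_100 = aip[aip['Valor'] == 100].groupby('CPF')['Valor'].sum()

# Variant 2: replace values other than 100 with 0, then group by CPF and sum;
# every CPF stays in the result (CPF '333' gets 0.0)
with_zeros = aip['Valor'].where(aip['Valor'] == 100, 0).groupby(aip['CPF']).sum()

print(only_100)
print(with_zeros)
```

If Valor comes from calculations rather than being typed in, exact equality with 100 can fail because of floating-point rounding; rounding the column first (for example with round(2)) is one option.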

To solve other pandas problems in the future, think about which pipeline (chain) of logical operations best fits the task. It takes practice, but you get the hang of it. Cheers, and happy studying.
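As an illustration of that chaining style, here is one way to write the filter, group and sum steps as a single pipeline, sketched on made-up data. DataFrame.loc accepts a callable, so the filter can sit inside the chain; df.query('Dados > 100') would be an equivalent way to write the filter.

```python
import pandas as pd

# Made-up data, only for illustration
df = pd.DataFrame({
    'CPF':   [1, 1, 2, 2, 3],
    'Dados': [150, 50, 200, 300, 80],
})

# The same three steps written as one chained expression
result = (
    df
    .loc[lambda d: d['Dados'] > 100]   # 1. filter
    .groupby('CPF')['Dados']           # 2. group
    .sum()                             # 3. aggregate
)
print(result)
```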
