Pandas Conditional Sum

Question


Asked 8 years, 8 months ago

Hello. I have the following situation:

import pandas as pd

df1 = pd.DataFrame({'Key':['a','b','c','a','c','a','b','c'],'Value':[9.2,8.6,7.2,8.3,8.5,2.1,7.4,1.1]})
df2 = pd.DataFrame({'Key':['a','b','c']})

and would like the following result:

In [0]: df2
Out[0]: 
  Key  soma
0   a  19.6
1   b  16.0
2   c  16.8

The only way I know is this:

for ind, row in df2.iterrows():
    # .loc creates the 'soma' column on first use and avoids chained assignment
    df2.loc[ind, 'soma'] = df1.loc[df1.Key == row.Key, 'Value'].sum()

But it takes so long that running it is impractical, because the dataset is very large.

Regards to all.

2 answers

Based on an answer on SOen (the English-language Stack Overflow), another way to get the sum column is to eliminate the loop and use a groupby (aggregation) to create the new column:

import numpy as np

df2['soma'] = df1.groupby('Key')["Value"].transform(np.sum)

After execution:

In [35]: df2
Out[35]:
  Key  soma
0   a  19.6
1   b  16.0
2   c  16.8

If you are not using the numpy library (recommended), replace np.sum with sum.
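One caveat worth checking, and a plausible reason the approach can misbehave on larger data: `transform` returns one value per row of df1, indexed like df1, and assigning that Series to df2 aligns on index labels, not on the Key column. In the example above the output only looks right because df1's first three rows happen to have keys a, b, c in the same order as df2. The sketch below reorders df2 (a hypothetical change, for illustration only) to expose this, and then uses a swapped-in technique, `Series.map` over the per-key totals from `groupby(...).sum()`, which looks each key up explicitly. It also uses the string alias `'sum'` in place of `np.sum`.

```python
import pandas as pd

df1 = pd.DataFrame({'Key': ['a', 'b', 'c', 'a', 'c', 'a', 'b', 'c'],
                    'Value': [9.2, 8.6, 7.2, 8.3, 8.5, 2.1, 7.4, 1.1]})
# Hypothetical reordering of df2 to show the index-alignment problem
df2 = pd.DataFrame({'Key': ['c', 'a', 'b']})

# transform('sum') gives one value per row of df1, indexed 0..7 like df1
per_row = df1.groupby('Key')['Value'].transform('sum')

# Assignment aligns on index labels 0, 1, 2 (df1's rows with keys a, b, c),
# not on df2's Key column, so these sums land on the wrong keys
df2['soma_transform'] = per_row

# One total per key, indexed by Key, looked up through df2's Key column
totals = df1.groupby('Key')['Value'].sum()
df2['soma'] = df2['Key'].map(totals)
print(df2)
```

Keys in df2 that never appear in df1 get NaN with `map`, which makes missing data visible instead of silently misaligned.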

Okay, but I don't know why it isn't working for me. It's the same structure, but with more data (3 million records) and 70 classes. Maybe it can't handle that volume.

– Mueladavc

2016/12/01 at 10:55
@Mueladavc When you run it, does an error message appear? Or does it take as long as the loop in the question?

– Gomiero

2016/12/01 at 12:54
No message appears. What I managed to do was run this command and then delete the duplicate Key values, which was a substantial improvement. But as written, it works very well with this example dataframe and not with my real data. Thank you, Gomiero.

– Mueladavc

2016/12/02 at 10:55
@Mueladavc You’re welcome!

– Gomiero

2016/12/02 at 12:42
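The workaround Mueladavc mentions (run the command, then delete duplicate Key values) is not shown as code in the comments, so the following is only one possible reading of it, under that assumption: compute the per-key total on df1 itself, where the output of `transform` lines up with df1's own index, and then keep the first row for each key with `drop_duplicates`. The variable name `result` is hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({'Key': ['a', 'b', 'c', 'a', 'c', 'a', 'b', 'c'],
                    'Value': [9.2, 8.6, 7.2, 8.3, 8.5, 2.1, 7.4, 1.1]})

# Broadcast each key's total onto every row of df1 (same index as df1, so no misalignment)
df1['soma'] = df1.groupby('Key')['Value'].transform('sum')

# Keep the first occurrence of each key and only the columns of interest
result = df1.drop_duplicates(subset='Key')[['Key', 'soma']].reset_index(drop=True)
print(result)
```

This adds a full-length column to df1 before discarding most of it, so on millions of rows `groupby('Key')['Value'].sum()` is likely the lighter option.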

by britodfbr • **688** points · Answer 1 · 2020-04-01T13:10:45+00:00

df1.groupby(by=['Key']).sum()

    Value
Key 
a   19.6
b   16.0
c   16.8
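This returns a frame indexed by Key with a Value column, not df2 with a soma column. A minimal sketch of attaching those totals to df2, assuming a left merge on Key fits the use case (the `as_index=False` option and the rename to soma are additions for illustration, not part of the answer):

```python
import pandas as pd

df1 = pd.DataFrame({'Key': ['a', 'b', 'c', 'a', 'c', 'a', 'b', 'c'],
                    'Value': [9.2, 8.6, 7.2, 8.3, 8.5, 2.1, 7.4, 1.1]})
df2 = pd.DataFrame({'Key': ['a', 'b', 'c']})

# One row per key; as_index=False keeps Key as a regular column
sums = (df1.groupby('Key', as_index=False)['Value']
           .sum()
           .rename(columns={'Value': 'soma'}))

# A left join keeps every row of df2 in its original order;
# keys absent from df1 get NaN in 'soma'
df2 = df2.merge(sums, on='Key', how='left')
print(df2)
```

Because the join is on the Key column rather than on the index, the order of rows in df1 and df2 does not matter.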