Pandas Conditional Sum

Hello. I have the following situation:

import pandas as pd

df1 = pd.DataFrame({'Key':['a','b','c','a','c','a','b','c'],'Value':[9.2,8.6,7.2,8.3,8.5,2.1,7.4,1.1]})
df2 = pd.DataFrame({'Key':['a','b','c']})

and I would like to get the following result:

In [0]: df2
Out[0]: 
  Key  soma
0   a  19.6
1   b  16.0
2   c  16.8

The only way I know is this:

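# Row by row: sum df1's Value for each Key of df2.
# df2.loc[ind, 'soma'] creates the column on first use; df2.soma[ind] would fail
# because the 'soma' column does not exist yet.
for ind, row in df2.iterrows():
    df2.loc[ind, 'soma'] = df1.loc[df1.Key == row.Key, 'Value'].sum()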

But it takes so long that my run becomes infeasible, because the amount of data is very large.

Best regards to all.

2 answers

Based on an answer on SOen (the English-language Stack Overflow), another possible way to get the sum column is to eliminate the loop and use a groupby (aggregation) to create the new column:

df2['soma'] = df1.groupby('Key')["Value"].transform(np.sum)

After execution:

In [35]: df2
Out[35]:
  Key  soma
0   a  19.6
1   b  16.0
2   c  16.8

If you are not importing the numpy library (using it is recommended), replace np.sum with sum.
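A caveat that may explain the behavior reported in the comments below: `transform` returns one value per row of `df1`, indexed like `df1`, and assigning it to `df2` aligns on index labels, not on `Key`. In this example it only comes out right because `df1`'s first three rows happen to be `a`, `b`, `c`, in the same order as `df2`. A minimal sketch of a key-based alternative, using a hypothetically reordered `df2` to expose the problem: aggregate once with `groupby(...).sum()` and look each key up with `Series.map`. The string `'sum'` stands in for `np.sum` so no numpy import is needed.

```python
import pandas as pd

df1 = pd.DataFrame({'Key': ['a', 'b', 'c', 'a', 'c', 'a', 'b', 'c'],
                    'Value': [9.2, 8.6, 7.2, 8.3, 8.5, 2.1, 7.4, 1.1]})
# Hypothetical: same keys as the question, but in a different order
df2 = pd.DataFrame({'Key': ['c', 'a', 'b']})

# transform gives one total per row of df1 (8 values), indexed 0..7 like df1
per_row = df1.groupby('Key')['Value'].transform('sum')
# Assignment aligns on index labels 0..2 (df1 rows 'a', 'b', 'c'),
# so these totals end up next to the wrong keys in the reordered df2
df2['soma_wrong'] = per_row

# Key-based version: one total per key, then a lookup by df2's Key column
totals = df1.groupby('Key')['Value'].sum()
df2['soma'] = df2['Key'].map(totals)
print(df2)
```

With `map`, a key present in `df2` but absent from `df1` gets `NaN` instead of a misaligned value.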

  • Okay, but I don't know why it isn't working for me. It's the same structure, but with more data (3 million) and 70 classes. Maybe it can't handle that volume.

  • @Mueladavc When you run it, does an error message appear? Or does it take as long as in the question?

  • No message appears. What I managed to do was use this command and then delete the duplicate Key values, which was a substantial improvement. But written exactly as it is there, it works very well with this example DataFrame, but not with my real data. Thank you, Gomiero.

  • @Mueladavc You’re welcome!
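The workaround described in the comments (run the command, then delete duplicate `Key` values) was not posted as code, so the following is only a guess at what it looks like: assign the `transform` result to `df1` itself, where the index matches, then keep one row per key. Because it still computes one value per row of `df1` before discarding duplicates, aggregating directly with `groupby('Key')['Value'].sum()` should do less work on large inputs.

```python
import pandas as pd

df1 = pd.DataFrame({'Key': ['a', 'b', 'c', 'a', 'c', 'a', 'b', 'c'],
                    'Value': [9.2, 8.6, 7.2, 8.3, 8.5, 2.1, 7.4, 1.1]})

# Per-row group total, aligned with df1's own index, so no misalignment
df1['soma'] = df1.groupby('Key')['Value'].transform('sum')

# Keep the first row of each key, then renumber the index 0..n-1
result = (df1.drop_duplicates(subset='Key')[['Key', 'soma']]
             .reset_index(drop=True))
print(result)
```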

df1.groupby(by=['Key']).sum()

    Value
Key 
a   19.6
b   16.0
c   16.8
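This returns the totals indexed by `Key`, not as a `soma` column of `df2`. A sketch of one way to attach them to `df2`, assuming the column name `soma` from the question: aggregate with `as_index=False` so `Key` stays a regular column, rename `Value`, and left-merge.

```python
import pandas as pd

df1 = pd.DataFrame({'Key': ['a', 'b', 'c', 'a', 'c', 'a', 'b', 'c'],
                    'Value': [9.2, 8.6, 7.2, 8.3, 8.5, 2.1, 7.4, 1.1]})
df2 = pd.DataFrame({'Key': ['a', 'b', 'c']})

# One row per key with the summed Value; Key kept as a column, not the index
totals = (df1.groupby('Key', as_index=False)['Value'].sum()
             .rename(columns={'Value': 'soma'}))

# how='left' keeps every row of df2, in df2's original order
df2 = df2.merge(totals, on='Key', how='left')
print(df2)
```

A key in `df2` with no rows in `df1` ends up with `NaN` in `soma`.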
