Sum a DataFrame column for rows where two other columns are duplicated

Question

Asked 4 years, 2 months ago

I have a DataFrame with four columns. I need to sum the values of column 3 whenever the pair of values in columns 1 and 2 is repeated, keeping a single row per pair. Example:

import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1, 1, 2, 2],
                   "B": [4, 4, 5, 5, 6, 6],
                   "C": [1, 2, 3, 4, 5, 6]})

The values of column C should be summed whenever the pair of values in A and B is duplicated, giving:

df_result:
       A  B  C
       1  4  3
       1  5  7
       2  6  11

1 answer

by Lucas • **3,858** points · Answer 1 · 2021-05-26T02:02:33+00:00

If your data always has this structure, you can simply use groupby:

df.groupby(['A','B']).agg({'C':'sum'}).reset_index()
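
For reference, here is a minimal runnable sketch of that line applied to the sample data from the question. The variable name result is only for illustration; the expected values are the ones listed in the question.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"A": [1, 1, 1, 1, 2, 2],
                   "B": [4, 4, 5, 5, 6, 6],
                   "C": [1, 2, 3, 4, 5, 6]})

# Group on the (A, B) pair, sum C within each group,
# then turn the group keys back into ordinary columns
result = df.groupby(["A", "B"]).agg({"C": "sum"}).reset_index()

print(result)
#    A  B   C
# 0  1  4   3
# 1  1  5   7
# 2  2  6  11
```

Each distinct (A, B) pair ends up as one row, so the duplicates are dropped as a side effect of grouping.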

What if there is a fourth column whose maximum value I need to keep in the resulting df? Would it work if I put this inside the aggregate: agg({'C':'sum','D':'max'})?

– user118799

2021/05/26 at 14:49
Yes, that works. You can also pass a NumPy function such as np.max.

– Lucas

2021/05/26 at 15:00
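
A small sketch of the two-column aggregation discussed in these comments. The question does not show the fourth column, so the values of D below are made up purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1, 1, 2, 2],
                   "B": [4, 4, 5, 5, 6, 6],
                   "C": [1, 2, 3, 4, 5, 6],
                   # Hypothetical fourth column, not from the question
                   "D": [10, 20, 40, 30, 50, 60]})

# Sum C and keep the maximum of D for each (A, B) pair
result = df.groupby(["A", "B"]).agg({"C": "sum", "D": "max"}).reset_index()

print(result)
```

With this made-up data, C comes out as 3, 7, 11 and D as 20, 40, 60 for the three pairs.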
With the structure shown in the question, df.groupby(['A','B']).sum().reset_index() would be enough, and it is faster than the agg version.

– Paulo Marques

2021/05/26 at 16:28
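
A quick sketch to check that the shorter form gives the same result on the three-column sample from the question. One thing to keep in mind: a plain sum() sums every non-key column, so it is only a drop-in replacement when all remaining columns should be summed. The names via_agg and via_sum are just for illustration, and no timing is measured here.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1, 1, 2, 2],
                   "B": [4, 4, 5, 5, 6, 6],
                   "C": [1, 2, 3, 4, 5, 6]})

# Explicit per-column aggregation
via_agg = df.groupby(["A", "B"]).agg({"C": "sum"}).reset_index()

# Shorter form: sums all non-key columns (here only C)
via_sum = df.groupby(["A", "B"]).sum().reset_index()

# Both approaches produce the same frame for this data
same = via_agg.equals(via_sum)
print(same)
```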