Sum values within a DataFrame when two columns have duplicate values

I have a DataFrame with four columns. I need to sum the values of column 3 whenever the values of columns 1 and 2 are duplicated, and discard the duplicate rows. Example:

import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1, 1, 2, 2],
                   "B": [4, 4, 5, 5, 6, 6],
                   "C": [1, 2, 3, 4, 5, 6]})

The values of column C should be summed whenever the pair of values in A and B is duplicated, giving:

Resulting df:
       A  B  C
       1  4  3
       1  5  7
       2  6  11

1 answer

If you always have this structure, you can just use groupby:

df.groupby(['A','B']).agg({'C':'sum'}).reset_index()
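
As a minimal sketch, here is that call applied to the sample data from the question (the variable name `result` is only for illustration):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1, 1, 2, 2],
                   "B": [4, 4, 5, 5, 6, 6],
                   "C": [1, 2, 3, 4, 5, 6]})

# Sum C within each (A, B) pair; reset_index turns the group keys back into columns
result = df.groupby(["A", "B"]).agg({"C": "sum"}).reset_index()
print(result)
```

This should give one row per (A, B) pair with C equal to 3, 7 and 11, matching the result asked for in the question.
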
  • What if there's a fourth column whose maximum values I need to keep in the resulting df? Does it work if I put this inside the aggregation: agg({'C':'sum','D':'max'})?

  • Yes, it works. You can also pass a NumPy function such as np.max.

  • For the earlier structure, df.groupby(['A','B']).sum().reset_index() would suffice, and it is faster than agg.
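
The four-column case discussed in the comments can be sketched as follows. Column D and its values are made up for illustration, since the question does not show them:

```python
import pandas as pd

# Column D and its values are hypothetical; the question only shows A, B and C
df = pd.DataFrame({"A": [1, 1, 1, 1, 2, 2],
                   "B": [4, 4, 5, 5, 6, 6],
                   "C": [1, 2, 3, 4, 5, 6],
                   "D": [10, 20, 40, 30, 50, 60]})

# For each (A, B) pair, sum C and keep the maximum of D
result = df.groupby(["A", "B"]).agg({"C": "sum", "D": "max"}).reset_index()
print(result)
```

Note that the shorter .sum() form would sum D as well, so it only fits the case where every remaining column should be summed.
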
