I have a DataFrame with four columns. I need to sum the values of column 3 whenever the values in columns 1 and 2 are duplicated, and discard the duplicate rows. Example:
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1, 1, 2, 2],
                   "B": [4, 4, 5, 5, 6, 6],
                   "C": [1, 2, 3, 4, 5, 6]})
I need to sum the values of column C whenever the pair of values in A and B is duplicated, giving:
df_result:
A B C
1 4 3
1 5 7
2 6 11
What if there is a fourth column, D, for which I need to keep the maximum value in the resulting df? Would it work to pass agg({'C': 'sum', 'D': 'max'})?
– user118799
Yes, that works. You can also pass a NumPy function such as np.max.
– Lucas
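A minimal sketch of the dictionary form of agg discussed in the comments above. Column D and its values are hypothetical, since the question only lists A, B and C; as_index=False is used here so that A and B stay as ordinary columns.

```python
import pandas as pd

# Column D and its values are made up for illustration only.
df = pd.DataFrame({
    "A": [1, 1, 1, 1, 2, 2],
    "B": [4, 4, 5, 5, 6, 6],
    "C": [1, 2, 3, 4, 5, 6],
    "D": [10, 20, 30, 5, 7, 8],
})

# One row per (A, B) pair: C is summed, D keeps its maximum.
result = df.groupby(["A", "B"], as_index=False).agg({"C": "sum", "D": "max"})
print(result)
```

Each key of the dictionary is a column name and each value is the aggregation applied to that column within the group.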
With the structure shown in the question, this would be enough:
df.groupby(['A','B']).sum().reset_index()
which is faster than using agg.
– Paulo Marques
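For reference, a short sketch of the groupby/sum approach from the comment above, run against the three-column example from the question. The name df_result is only a label for the output here.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 1, 1, 1, 2, 2],
    "B": [4, 4, 5, 5, 6, 6],
    "C": [1, 2, 3, 4, 5, 6],
})

# Sum every remaining column within each (A, B) pair,
# then turn the group keys back into regular columns.
df_result = df.groupby(["A", "B"]).sum().reset_index()
print(df_result)
```

Note that sum() applies to every non-key column, so this form only fits when all remaining columns should be summed.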