Remove duplicate dates by summing the values


I need to remove the duplicate dates from a DataFrame and sum the values that correspond to those dates.

I found an answer on Stack Overflow that comes close to what I need, but I couldn't adapt it to my case:

df.groupby('data', group_keys=False).apply(lambda x: x.loc[x.valor.idxmax()])

However, instead of grouping by date and keeping only the highest value, I need to keep the sum of all the values for each date.
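For comparison, here is a minimal sketch of the two behaviours side by side. It assumes a DataFrame with the `data` and `valor` columns from the question, and the sample rows are made up for illustration. Keeping the row with the highest `valor` per date can be done with `idxmax` on the grouped column, while the requested behaviour is simply a grouped `sum()`:

```python
import pandas as pd

# Hypothetical sample data using the column names from the question
df = pd.DataFrame({
    "data": ["2020-01-01", "2020-01-01", "2020-01-02"],
    "valor": [10.0, 5.0, 7.0],
})

# Keep the single row with the highest valor for each date
highest = df.loc[df.groupby("data")["valor"].idxmax()]

# One row per date, with all valor entries for that date added together
summed = df.groupby("data", as_index=False)["valor"].sum()
print(summed)
```

With this sample, `highest` keeps 10.0 and 7.0, while `summed` gives 15.0 for 2020-01-01 and 7.0 for 2020-01-02.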

1 answer



I managed to solve the problem, so I'm posting the answer to help anyone who runs into the same problem in the future.

Here is the explanation, step by step, with the code:

Building the DataFrame from an existing dictionary:

import pandas as pd

swap_df = pd.DataFrame(swap_montado, columns=['Portfolio', 'Data posicao', 'Valor'])
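The structure of `swap_montado` isn't shown in the answer. The sketch below assumes it is a dict mapping column names to lists, and the values (including the `FUND_A` portfolio name) are hypothetical. It also shows how to check which rows share a date before collapsing them:

```python
import pandas as pd

# Hypothetical stand-in for swap_montado: column name -> list of values
swap_montado = {
    "Portfolio": ["FUND_A", "FUND_A", "FUND_A"],
    "Data posicao": ["2020-01-02", "2020-01-02", "2020-01-03"],
    "Valor": [100.0, 50.0, 80.0],
}

# With a dict input, `columns` selects and orders the keys to use
swap_df = pd.DataFrame(swap_montado, columns=["Portfolio", "Data posicao", "Valor"])

# keep=False marks every row whose date appears more than once
dup = swap_df["Data posicao"].duplicated(keep=False)
print(swap_df[dup])
```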

Grouping by date and summing the Valor values that share the same date (for Portfolio, the first value in each group is kept):

swap_df = swap_df.groupby('Data posicao').agg({
    'Portfolio': 'first',  # keep the first portfolio seen for each date
    'Valor': 'sum'         # add up all values that share the same date
})
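A self-contained sketch of this step with hypothetical sample data. Passing the string `'sum'` instead of the built-in `sum` lets pandas use its own grouped sum directly. Note that `'first'` assumes all rows on a given date belong to the same portfolio. If that isn't guaranteed, grouping by both columns (the second variant below, an alternative not in the original answer) avoids silently discarding other portfolios:

```python
import pandas as pd

# Hypothetical sample data with a duplicated date
swap_df = pd.DataFrame({
    "Portfolio": ["FUND_A", "FUND_A", "FUND_A"],
    "Data posicao": ["2020-01-02", "2020-01-02", "2020-01-03"],
    "Valor": [100.0, 50.0, 80.0],
})

# One row per date: first Portfolio, summed Valor
result = swap_df.groupby("Data posicao").agg({"Portfolio": "first", "Valor": "sum"})
print(result)

# Alternative: one row per (date, portfolio) pair, summing Valor within each pair
by_portfolio = swap_df.groupby(["Data posicao", "Portfolio"], as_index=False)["Valor"].sum()
print(by_portfolio)
```

Here the two rows dated 2020-01-02 collapse into one with a Valor of 150.0.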

Reordering the DataFrame columns:

swap_df = swap_df[['Valor', 'Portfolio']]
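After the `groupby`, `Data posicao` is no longer a column but the index of the result, so selecting `[['Valor', 'Portfolio']]` only reorders the two remaining columns. If the date is needed back as a regular column, `reset_index()` restores it. A minimal sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical sample data with a duplicated date
swap_df = pd.DataFrame({
    "Portfolio": ["FUND_A", "FUND_A", "FUND_A"],
    "Data posicao": ["2020-01-02", "2020-01-02", "2020-01-03"],
    "Valor": [100.0, 50.0, 80.0],
})

swap_df = swap_df.groupby("Data posicao").agg({"Portfolio": "first", "Valor": "sum"})

# Reorder the remaining columns (the date is now the index)
swap_df = swap_df[["Valor", "Portfolio"]]

# Move the date index back into an ordinary column
flat = swap_df.reset_index()
print(flat)
```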

Solution adapted from: https://stackoverflow.com/questions/35403752/pandas-sum-over-duplicated-indices-with-sum
