Remove duplicate dates by summing the values

Question

Asked 8 years, 1 month ago

I need to remove duplicate dates from the dataframe and sum the values that correspond to each of those dates.

I found an answer on Stack Overflow that comes close to what I need, but I couldn't adapt it to my case:

df.groupby('data', group_keys=False).apply(lambda x: x.loc[x.valor.idxmax()])

However, instead of grouping by date and keeping the highest value, I need to keep the sum of the values for each date.

1 answer

by Matheus • **501** points · Answer 1 · 2017-06-21T20:58:32+00:00

I managed to solve the problem, so I am posting the answer to help anyone who runs into the same issue in the future.

Here is the explanation along with the code:

Generating the dataframe from an existing dictionary:

import pandas as pd
swap_df = pd.DataFrame(swap_montado, columns=['Portfolio', 'Data posicao', 'Valor'])

Grouping the data by date and summing the values of the Valor column that correspond to duplicated dates:

swap_df = swap_df.groupby('Data posicao').agg({
    'Portfolio': 'first',  # keep the first Portfolio seen for each date
    'Valor': 'sum',        # add up Valor across rows that share a date
})

Reordering the dataframe columns (after the groupby, Data posicao is the index, so it is not listed here):

swap_df = swap_df[['Valor', 'Portfolio']]
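
For anyone who wants to try the steps above end to end, here is a minimal sketch. The contents of swap_montado below are made up purely for illustration (the original data is not shown in this post), so treat the dates and values as assumptions.

```python
import pandas as pd

# Hypothetical sample data standing in for swap_montado (not from the original post).
swap_montado = {
    'Portfolio': ['A', 'A', 'B', 'B'],
    'Data posicao': ['2017-06-19', '2017-06-19', '2017-06-20', '2017-06-21'],
    'Valor': [10.0, 5.0, 7.0, 3.0],
}

swap_df = pd.DataFrame(swap_montado, columns=['Portfolio', 'Data posicao', 'Valor'])

# One row per date: keep the first Portfolio seen and sum Valor.
swap_df = swap_df.groupby('Data posicao').agg({
    'Portfolio': 'first',
    'Valor': 'sum',
})

# 'Data posicao' is now the index, so only the remaining columns are reordered.
swap_df = swap_df[['Valor', 'Portfolio']]

print(swap_df)
```

If the date should stay a regular column rather than the index, calling swap_df.reset_index() afterwards is one option.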