Python: consolidate duplicate lines by adding values

Question

Asked 4 years ago

I have a spreadsheet in Excel format (.xlsx) with the following columns: "matricula", "name", "value", as shown below.

I would like to consolidate the repeated rows by adding up their values.

The final result should be another Excel spreadsheet with only 6 (six) rows, matching the second image below.

Below is the spreadsheet with the desired result.

import pandas as pd


planilha = pd.read_excel(r"C:\Users\wjrs1\Downloads\nova.xlsx", engine='openpyxl')

arquivo = pd.ExcelWriter(r"C:\Users\wjrs1\Downloads\teste.xlsx")

arquivo.to_excel(planilha, 'sheet1',index=False) # something is going wrong here and I don't know what it is

planilha.save()

Could you help me? Thank you in advance.

groupby(['Registration', 'Name', 'Value'], as_index=False)['Value'].sum()

– Alexandre Simões

2021/07/17 at 01:21

Use DataFrame.drop_duplicates(). Example: planilha.drop_duplicates()

– Augusto Vasques

2021/07/17 at 05:36

1 answer

by Paulo Marques • **3,739** points · Answer 1 · 2021-07-17T07:38:05+00:00

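Use groupby to sum the values for each matricula/name pair, then write the result with to_excel, as below. Note that to_excel is a method of the DataFrame, not of the ExcelWriter.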

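import pandas as pd

# Read the original spreadsheet.
planilha = pd.read_excel(r"C:\Users\wjrs1\Downloads\nova.xlsx", engine='openpyxl')

# Group by the identifying columns and sum 'Valor' within each group.
novo_df = planilha.groupby(['Matrícula', 'Nome'])['Valor'].sum().reset_index()

# Raw string (r"...") so the backslashes in the Windows path are not read as escape sequences.
novo_df.to_excel(r"C:\Users\wjrs1\Downloads\teste.xlsx", sheet_name="Sheet 1", index=False)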
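
To see what the groupby step does without needing the Excel files, here is a minimal sketch on an in-memory DataFrame. The sample rows are invented for illustration (they are not the data from the question); only the column names 'Matrícula', 'Nome' and 'Valor' come from the code above.

```python
import pandas as pd

# Hypothetical sample data standing in for nova.xlsx (values invented for illustration).
planilha = pd.DataFrame({
    "Matrícula": [101, 101, 102, 103, 103],
    "Nome": ["Ana", "Ana", "Bruno", "Carla", "Carla"],
    "Valor": [10.0, 5.0, 7.5, 1.0, 2.0],
})

# One row per (Matrícula, Nome) pair, with Valor summed across the repeated rows.
# as_index=False keeps the grouping keys as ordinary columns, like reset_index() does.
novo_df = planilha.groupby(["Matrícula", "Nome"], as_index=False)["Valor"].sum()

print(novo_df)
```

In this sample the five input rows collapse to three, one per person. drop_duplicates() would not give this result, because it only removes rows that are identical in every column and does not add anything up.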