Python: consolidate duplicate rows by summing values

I have a spreadsheet in Excel format (.xlsx) with the columns "Matrícula" (registration number), "Nome" (name) and "Valor" (value), as shown below.

I would like to remove the duplicate rows, adding up their values.

The final result should be a new Excel spreadsheet with only six rows, like the second image below.

Initial spreadsheet

Below is the spreadsheet with the desired result.

Desired result

import pandas as pd


planilha = pd.read_excel(r"C:\Users\wjrs1\Downloads\nova.xlsx", engine='openpyxl')

arquivo = pd.ExcelWriter(r"C:\Users\wjrs1\Downloads\teste.xlsx")

arquivo.to_excel(planilha, 'sheet1',index=False) # something is going wrong here and I don't know what

planilha.save()

Could you help me? Thank you in advance.

1 answer

Use groupby to sum the "Valor" of rows that share the same "Matrícula" and "Nome", and then to_excel to save the result, as below:

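import pandas as pd

# Read the original spreadsheet
planilha = pd.read_excel(r"C:\Users\wjrs1\Downloads\nova.xlsx", engine='openpyxl')

# Group rows with the same "Matrícula" and "Nome" and sum their "Valor"
novo_df = planilha.groupby(['Matrícula', 'Nome'])['Valor'].sum().reset_index()

# The r prefix keeps "\U" in the Windows path from being parsed as an escape sequence
novo_df.to_excel(r"C:\Users\wjrs1\Downloads\teste.xlsx", sheet_name="Sheet 1", index=False)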
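To see what the groupby step does without needing the original file, here is a minimal sketch that runs on a small in-memory DataFrame. The sample rows are hypothetical (the spreadsheet images are not available), and the helper names consolidate_rows and save_xlsx are illustrative only. It also shows how pd.ExcelWriter, which the question tried to use, is meant to be used: the writer is passed to DataFrame.to_excel, and the with block saves the file on exit.

```python
import pandas as pd


def consolidate_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Sum 'Valor' for all rows sharing the same 'Matrícula' and 'Nome'."""
    # groupby collapses each (Matrícula, Nome) pair into one group,
    # sum() adds up the 'Valor' entries of each group, and reset_index()
    # turns the group keys back into ordinary columns.
    return df.groupby(["Matrícula", "Nome"])["Valor"].sum().reset_index()


def save_xlsx(df: pd.DataFrame, path: str) -> None:
    """Write df to an .xlsx file (requires an Excel engine such as openpyxl)."""
    # The writer is given to DataFrame.to_excel (not the other way around);
    # leaving the with block saves and closes the file, so no save() call is needed.
    with pd.ExcelWriter(path) as writer:
        df.to_excel(writer, sheet_name="sheet1", index=False)


# Hypothetical sample data standing in for the original spreadsheet
data = pd.DataFrame(
    {
        "Matrícula": [101, 102, 101, 103, 102],
        "Nome": ["Ana", "Bruno", "Ana", "Carla", "Bruno"],
        "Valor": [10.0, 5.0, 2.5, 7.0, 1.0],
    }
)

result = consolidate_rows(data)
print(result)
```

With this sample, the five input rows collapse into three: (101, Ana, 12.5), (102, Bruno, 6.0) and (103, Carla, 7.0). Calling save_xlsx(result, "teste.xlsx") would then write them to a new workbook.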
