How to merge several repeated rows of an Excel file with Python?

Question

Asked 3 years, 11 months ago

I have this Excel spreadsheet, which my system generates from a pandas DataFrame:

I can generate the file, but it comes out in the layout shown above. What I need is for rows that share the same values in the nome and sobrenome columns to be merged into a single row, so that all of that person's data ends up on the same line.

The end result needs to look like this:

My code generates the spreadsheet, but it does not merge the rows. How can I do this?

This is the part of the code I'm stuck on:

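import pandas as pd

# dados: the rows coming from the system (not shown in the question)
resultado = []
mydict = {}
for row in dados:
    # Save the dict built for the previous row before starting a new one
    if mydict != {}:
        resultado.append(mydict)
    # A new dict for every row, which is why each row of dados
    # ends up on its own line instead of being merged by name
    mydict = {}
    mydict['nome'] = row['nome']
    mydict['sobrenome'] = row['sobrenome']

# The loop only appends the previous row's dict, so add the last one here
if mydict != {}:
    resultado.append(mydict)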

If I take that if out of the for loop and filter for one specific name, everything prints on a single line; but when it goes back into the for loop, it prints all the nomes and sobrenomes with their information on separate lines.
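A minimal sketch of how the loop itself could merge the rows, assuming dados is an iterable of dict-like rows and that missing cells are None or NaN. Instead of starting a new dict for every row, it keeps one dict per (nome, sobrenome) key and only copies over the values that are present. The sample rows and the Idade_agosto_* column names are hypothetical, borrowed from the final table in the answer.

```python
import pandas as pd

# Hypothetical input: one dict per row, each row carrying only one age value.
dados = [
    {"nome": "John", "sobrenome": "Peter", "Idade_agosto_2019": 30.0, "Idade_agosto_2020": None},
    {"nome": "John", "sobrenome": "Peter", "Idade_agosto_2019": None, "Idade_agosto_2020": 31.0},
    {"nome": "Paul", "sobrenome": "Joseph", "Idade_agosto_2019": 21.0, "Idade_agosto_2020": None},
    {"nome": "Paul", "sobrenome": "Joseph", "Idade_agosto_2019": None, "Idade_agosto_2020": 22.0},
]

# One merged dict per (nome, sobrenome) key; dicts keep insertion order.
merged = {}
for row in dados:
    key = (row["nome"], row["sobrenome"])
    entry = merged.setdefault(key, {"nome": row["nome"], "sobrenome": row["sobrenome"]})
    # Copy only present values (pd.notna is False for None and NaN),
    # so an empty cell never overwrites an age filled in by another row.
    for col, value in row.items():
        if pd.notna(value):
            entry[col] = value

resultado = list(merged.values())
df = pd.DataFrame(resultado)
print(df)
```

The resulting df has one row per person, ready to be written to the spreadsheet.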

1 answer

by Felipe Gambini • **139** points · Answer 1 · 2021-08-25T23:59:23+00:00

First, create a DataFrame containing only the repeated data, that is, nome and sobrenome (which will act as the primary key), and remove the duplicates:

df = dados[['nome', 'sobrenome']].drop_duplicates()
display(df)  # display() is available in Jupyter notebooks; in a plain script use print(df)

Result:

nome	sobrenome
John	Peter
Paul	Joseph

This table will serve as the base table onto which the other information is joined.
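A self-contained sketch of this first step on a small hypothetical dados (names and values borrowed from the tables in this answer). It uses print instead of display so it also runs outside a notebook. Note that drop_duplicates keeps the first occurrence of each pair together with its original index label.

```python
import pandas as pd

# Hypothetical input: the same person repeated on several rows,
# each row holding at most one age value.
dados = pd.DataFrame({
    "nome": ["John", "John", "Paul", "Paul"],
    "sobrenome": ["Peter", "Peter", "Joseph", "Joseph"],
    "Idade_agosto_2019": [30.0, None, 21.0, None],
})

# Keep only the key columns, then drop repeated (nome, sobrenome) pairs.
df = dados[["nome", "sobrenome"]].drop_duplicates()
print(df)
```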

Next, for each age column, create a temporary table containing the main table's primary key and that column's ages, and drop the rows with null values using .dropna(axis=0):

for i in range(2, 5):  # columns 2, 3 and 4 hold the ages (range stops before 5)
    temp = dados.iloc[:, [0, 1, i]].dropna(axis=0)  # key columns nome and sobrenome plus one age column
    df = pd.merge(df, temp)

On the line

df = pd.merge(df, temp)

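the main table is merged with the temporary table holding that age column's values. Because no on= argument is given, pd.merge performs an inner join on the columns the two tables have in common, so only names that have a value in that age column are kept.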
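Putting both steps together, here is a runnable sketch on hypothetical data shaped like the question's sheet (key columns at positions 0 and 1, ages at positions 2 to 4, one age per row; the values are taken from the final table of this answer). Each temporary table includes both key columns, so the merge matches on the full nome/sobrenome pair rather than on nome alone.

```python
import pandas as pd

# Hypothetical input: columns 0-1 are the key, columns 2-4 are ages,
# and each row fills in only one of the age columns.
dados = pd.DataFrame({
    "nome": ["John", "John", "John", "Paul", "Paul", "Paul"],
    "sobrenome": ["Peter", "Peter", "Peter", "Joseph", "Joseph", "Joseph"],
    "Idade_agosto_2019": [30.0, None, None, 21.0, None, None],
    "Idade_agosto_2020": [None, 31.0, None, None, 22.0, None],
    "Idade_agosto_2021": [None, None, 32.0, None, None, 23.0],
})

# Base table: one row per (nome, sobrenome) pair.
df = dados[["nome", "sobrenome"]].drop_duplicates()

# For each age column: keep the key columns plus that column, drop rows where
# the age is missing, and inner-join the result onto the base table.
for i in range(2, 5):
    temp = dados.iloc[:, [0, 1, i]].dropna(axis=0)
    df = pd.merge(df, temp)  # joins on the shared columns nome and sobrenome

print(df)
```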

The final result is your table without the null and duplicate values:

nome	sobrenome	Idade_agosto_2019	Idade_agosto_2020	Idade_agosto_2021
John	Peter	30.0	31.0	32.0
Paul	Joseph	21.0	22.0	23.0
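A different technique, not the one used in this answer, reaches the same shape in one step: GroupBy.first() returns the first non-null value of each column within each group. A minimal sketch on hypothetical data (names and values borrowed from the table above):

```python
import pandas as pd

# Hypothetical input: the same person on several rows, one age per row.
dados = pd.DataFrame({
    "nome": ["John", "John", "Paul", "Paul"],
    "sobrenome": ["Peter", "Peter", "Joseph", "Joseph"],
    "Idade_agosto_2019": [30.0, None, 21.0, None],
    "Idade_agosto_2020": [None, 31.0, None, 22.0],
})

# Per (nome, sobrenome) group, take each column's first non-null value;
# as_index=False keeps the keys as columns, sort=False keeps input order.
resultado = dados.groupby(["nome", "sobrenome"], as_index=False, sort=False).first()
print(resultado)
```

One difference in behavior: the merge loop uses inner joins, so a person with no value at all in some age column disappears from the result, whereas groupby keeps that person and leaves the cell as NaN. Either result can be written back to Excel with DataFrame.to_excel, which needs an Excel writer engine such as openpyxl installed.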