How to single lines a Dataframe Pandas that vary in just a few fields?

Asked

Viewed 36 times

1

I have a Dataframe with the following format:

formato do dataframe original

And I would like to have the following result:

formato do dataframe que preciso obter

How do I get it?

Thank you.

1 answer

0

Try it with that code:

df = pd.read_excel('companies.xlsx')  # seu arquivo de input

cnpjs = df['cnpj'].unique()
aggregator = []

for cnpj in cnpjs: 
    company_df = df.loc[df['cnpj'] == cnpj] 
    for index, socio in enumerate(company_df['socio'], 1): 
        company_df.loc[:, 'socio_' + str(index)] = socio 
        company_df = company_df.drop(columns='socio') 
   aggregator.append(company_df)

result = pd.concat(aggregator, sort=False)
result = result.drop_duplicates(subset=['cnpj', 'empresa', 'endereco', 'telefone'])
result.reset_index(inplace=True, drop=True)
print(result)

Input:

inserir a descrição da imagem aqui

Output:

inserir a descrição da imagem aqui

Note that the line company_df.loc[:, 'socio_' + str(index)] = socio results in a SettingWithCopyWarning, but I believe this can be ignored without major consequences, since we are using the operator df.loc to access and modify subsices of Dataframes, as recommended by Pandas.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.