How to single lines a Dataframe Pandas that vary in just a few fields?

Question

How to single lines a Dataframe Pandas that vary in just a few fields?

Asked 5 years, 10 months ago

Viewed 36 times

1

I have a Dataframe with the following format:

And I would like to have the following result:

How do I get it?

Thank you.

1 answer

Browser other questions tagged python pandas

You are not signed in. Login or sign up in order to post.

by jfaccioni • **1,283** points · Answer 1 · 2019-09-10T12:25:19+00:00

Try it with that code:

df = pd.read_excel('companies.xlsx')  # seu arquivo de input

cnpjs = df['cnpj'].unique()
aggregator = []

for cnpj in cnpjs: 
    company_df = df.loc[df['cnpj'] == cnpj] 
    for index, socio in enumerate(company_df['socio'], 1): 
        company_df.loc[:, 'socio_' + str(index)] = socio 
        company_df = company_df.drop(columns='socio') 
   aggregator.append(company_df)

result = pd.concat(aggregator, sort=False)
result = result.drop_duplicates(subset=['cnpj', 'empresa', 'endereco', 'telefone'])
result.reset_index(inplace=True, drop=True)
print(result)

Input:

Output:

Note that the line company_df.loc[:, 'socio_' + str(index)] = socio results in a SettingWithCopyWarning, but I believe this can be ignored without major consequences, since we are using the operator df.loc to access and modify subsices of Dataframes, as recommended by Pandas.