1
Try it with that code:
df = pd.read_excel('companies.xlsx') # seu arquivo de input
cnpjs = df['cnpj'].unique()
aggregator = []
for cnpj in cnpjs:
company_df = df.loc[df['cnpj'] == cnpj]
for index, socio in enumerate(company_df['socio'], 1):
company_df.loc[:, 'socio_' + str(index)] = socio
company_df = company_df.drop(columns='socio')
aggregator.append(company_df)
result = pd.concat(aggregator, sort=False)
result = result.drop_duplicates(subset=['cnpj', 'empresa', 'endereco', 'telefone'])
result.reset_index(inplace=True, drop=True)
print(result)
Input:
Output:
Note that the line company_df.loc[:, 'socio_' + str(index)] = socio
results in a SettingWithCopyWarning
, but I believe this can be ignored without major consequences, since we are using the operator df.loc
to access and modify subsices of Dataframes, as recommended by Pandas.