I have a DataFrame from which I need to remove duplicates, and then I need to sum a specific column taken from the original DataFrame. I actually have 5 DataFrames: this already works for 4 of them, but one contains duplicate values, and after the groupby it raises an index error. Can anyone help me? I've tried many approaches.
DF1['TOTAL KBYTES VAN'] = DFORIGINAL.groupby(['CONVENIO', 'CNPJ', 'PRODUTO', 'RATEIO VAN'])['TOTAL KBYTES VAN'].transform(np.sum)
This code works for 4 of the DataFrames, but in one of them the TOTAL KBYTES VAN value repeats several times. These repeats are real data, so I need the duplicates to be included in the sum as well.
I have tried several approaches without success.
Examples of attempts:
DF1['TOTAL KBYTES VAN'] = DFORIGINAL.groupby(['CONVENIO', 'CNPJ', 'PRODUTO', 'RATEIO VAN'])['TOTAL KBYTES VAN'].sum()
DF1['TOTAL KBYTES VAN'] = DFORIGINAL.groupby(['CONVENIO', 'CNPJ', 'PRODUTO', 'RATEIO VAN'])['TOTAL KBYTES VAN'].cumsum()
DF1['TOTAL KBYTES VAN'] = DFORIGINAL.groupby(['CONVENIO', 'CNPJ', 'PRODUTO', 'RATEIO VAN'], as_index=False)['TOTAL KBYTES VAN'].sum()
DF1['TOTAL KBYTES VAN'] = DFORIGINAL.groupby(['CONVENIO', 'CNPJ', 'PRODUTO', 'RATEIO VAN'], as_index=False)['TOTAL KBYTES VAN'].transform(np.sum)
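For context on why these attempts behave differently, here is a minimal sketch on a hypothetical toy frame (the values are not from the data in this question). It only illustrates how the two results are indexed: `.sum()` returns one value per group, indexed by the group key, while `.transform("sum")` returns one value per original row, indexed like the source frame. Assigning either result to a column of another frame aligns on index labels, which is one possible source of index problems here.

```python
import pandas as pd

# Hypothetical toy data, only to show how the results are indexed.
df = pd.DataFrame(
    {
        "PRODUTO": ["OUTROS", "PAGAMENTO", "OUTROS"],
        "TOTAL KBYTES VAN": [1.0, 2.0, 3.0],
    }
)

grouped = df.groupby("PRODUTO")["TOTAL KBYTES VAN"]

# One value per group; the index holds the group keys, not the row labels.
per_group = grouped.sum()

# One value per original row; the index matches df.index.
per_row = grouped.transform("sum")

print(per_group)
print(per_row)
```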
DFORIGINAL
VAN;CNPJ;CLIENTE;PRODUTO;RATEIO VAN;TOTAL KBYTES VAN;CONVENIO;% MARGEM
EMPRESA;0123456789777;EMPRESA;OUTROS;100;2,63671875;220000000;1
EMPRESA;0123456789777;EMPRESA;PAGAMENTO;100;2,63671875;220000000;1
EMPRESA;0123456789777;EMPRESA;OUTROS;100;2,63671875;220000000;1
EMPRESA;0123456789777;EMPRESA;OUTROS;100;2,63671875;220000000;1
EMPRESA;0123456789777;EMPRESA;PAGAMENTO;100;2,63671875;220000000;1
EMPRESA;0123456789777;EMPRESA;OUTROS;100;2,63671875;220000000;1
VAN | CNPJ | CLIENT | PRODUCT | APPORTIONMENT VAN | TOTAL KBYTES VAN | CONVENTION | % MARGIN |
---|---|---|---|---|---|---|---|
COMPANY | 0123456789777 | COMPANY | OTHERS | 100 | 2,63671875 | 220000000 | 1 |
COMPANY | 0123456789777 | COMPANY | PAYMENT | 100 | 2,63671875 | 220000000 | 1 |
COMPANY | 0123456789777 | COMPANY | OTHERS | 100 | 2,63671875 | 220000000 | 1 |
COMPANY | 0123456789777 | COMPANY | OTHERS | 100 | 2,63671875 | 220000000 | 1 |
COMPANY | 0123456789777 | COMPANY | PAYMENT | 100 | 2,63671875 | 220000000 | 1 |
COMPANY | 0123456789777 | COMPANY | OTHERS | 100 | 2,63671875 | 220000000 | 1 |
This is the only way I managed to show it; see the expected result below. Note: before computing the sum, I run drop_duplicates on DF1 using the same columns as the groupby, so DF1 keeps only the 2 unique rows, and for those rows I expect to get the sum of TOTAL KBYTES VAN.
EXPECTED RESULT DF1
VAN;CNPJ;CLIENTE;PRODUTO;RATEIO VAN;TOTAL KBYTES VAN;CONVENIO;% MARGEM
EMPRESA;0123456789777;EMPRESA;OUTROS;100;10,546875;220000000;1
EMPRESA;0123456789777;EMPRESA;PAGAMENTO;100;5,2734375;220000000;1
VAN | CNPJ | CLIENT | PRODUCT | APPORTIONMENT VAN | TOTAL KBYTES VAN | CONVENTION | % MARGIN |
---|---|---|---|---|---|---|---|
COMPANY | 0123456789777 | COMPANY | OTHERS | 100 | 10,546875 | 220000000 | 1 |
COMPANY | 0123456789777 | COMPANY | PAYMENT | 100 | 5,2734375 | 220000000 | 1 |
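One possible way to get this result in a single step is sketched below. It is not the asker's original approach: instead of `drop_duplicates` followed by a column assignment, it aggregates directly with `groupby(..., as_index=False)`. The sample rows are the ones shown above; reading them with `decimal=","`, keeping CNPJ as text, and taking the first value of the non-key columns are assumptions made for this sketch.

```python
import io

import pandas as pd

# Sample rows copied from the question (semicolon-separated, decimal comma).
csv_text = """VAN;CNPJ;CLIENTE;PRODUTO;RATEIO VAN;TOTAL KBYTES VAN;CONVENIO;% MARGEM
EMPRESA;0123456789777;EMPRESA;OUTROS;100;2,63671875;220000000;1
EMPRESA;0123456789777;EMPRESA;PAGAMENTO;100;2,63671875;220000000;1
EMPRESA;0123456789777;EMPRESA;OUTROS;100;2,63671875;220000000;1
EMPRESA;0123456789777;EMPRESA;OUTROS;100;2,63671875;220000000;1
EMPRESA;0123456789777;EMPRESA;PAGAMENTO;100;2,63671875;220000000;1
EMPRESA;0123456789777;EMPRESA;OUTROS;100;2,63671875;220000000;1
"""

# Assumption: CNPJ is kept as text so the leading zero is preserved.
DFORIGINAL = pd.read_csv(
    io.StringIO(csv_text), sep=";", decimal=",", dtype={"CNPJ": str}
)

keys = ["CONVENIO", "CNPJ", "PRODUTO", "RATEIO VAN"]

# Sum the target column per group; keep the first value of the other columns.
DF1 = DFORIGINAL.groupby(keys, as_index=False, sort=False).agg(
    {
        "VAN": "first",
        "CLIENTE": "first",
        "TOTAL KBYTES VAN": "sum",
        "% MARGEM": "first",
    }
)

# Restore the original column order.
DF1 = DF1[list(DFORIGINAL.columns)]
print(DF1)
```

With `sort=False` the groups keep their order of first appearance, so OUTROS comes before PAGAMENTO as in the expected result.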
Great, that's exactly it. But how did you manage to format it this way here?
– Eduardo Garcia de Oliveira
Like this: https://answall.com/editing-help#Tables
– Augusto Vasques