Merge lists based on key column value

Asked

Viewed 44 times

1

I am converting 2 dataframes into 2 lists. And I will group them by the CNPJ_UF_MES column (always the 1st column of each dataframe. In the real example, I will join more than 2 dataframes.

After grouping these lists, I will have lists of lists. and need to unite them based on the key column.

import pandas as pd

df1 = pd.DataFrame({
    'CNPJ_UF_MES': ['1-MG', '1-MG', '2-MG'],
    'CLIENTE_PAR_COMEX': ['1_2_3', '2_3_4', '3_4_5'],
    'REG': ['1110', '1110', '1110'],
    'COD_IP_PAR': ['200', '200', '200'],
    'COD_CLIENTE': ['xxr', 'xxv', 'xxw'],
    'IND_COMEX': ['wer', 'cad', 'sder'],
    'IND_EXTEMP': ['key_1', 'key_2', 'key_3'],
    'DT_INI': ['01032021', '01032021', '01032021'],
    'DT_FIN': ['31032021', '31032021', '31032021'],
    'VALOR': ['wer', 'cad', 'sder'],
    'PIPE_FIN': ['', '', '']
})


df2 = pd.DataFrame({
    'CNPJ_UF_MES': ['1-MG', '1-MG', '1-MG', '1-MG', '2-MG', '2-MG'],
    'CLIENTE_PAR_COMEX': ['1_2_3', '2_3_4', '3_4_5','1_2_3', '2_3_4', '3_4_5'],
    'REG': ['1110', '1110', '1110', '1110', '1110', '1110'],
    'COD_MCAPT': ['C-200', 'C-200', 'C-200', 'C-200', 'C-200', 'C-200'],
    'DT_OP': ['07032021', '07032021', '07032021', '08032021', '08032021', '08032021'],
    'VALOR': ['1', '2', '3', '1', '2', '3'],
    'QTD': ['2', '2', '2', '2', '2', '2'],
    'CNPJ_ADQUI': ['123', '123', '123', '123', '123', '123'],
    'PIPE_FIN': ['', '', '', '', '', '']

})



list1 = df1.values.tolist()
#print(list1)

values = set(map(lambda x:x[0], list1))
newlist1 = [[y for y in list1 if y[0]==x] for x in values]

print(newlist1)

list2 = df2.values.tolist()
#print(list2)


values = set(map(lambda x:x[0], list2))
newlist2 = [[y for y in list2 if y[0]==x] for x in values]

print(newlist2)

##Estou tendo problema aqui:
list3 = newlist1.append(newlist2)
print(list3)

I would like to have lists of lists grouped by the key column.

Any idea?

This is the desired output:

[[
  ['2-MG', '3_4_5', '1110', '200', 'xxw', 'sder', 'key_3', '01032021', '31032021', 'sder', ''],
  ['2-MG', '2_3_4', '1110', 'C-200', '08032021', '2', '2', '123', ''],
  ['2-MG', '2_3_4', '1110', 'C-200', '08032021', '2', '2', '123', '']
  ],
 [
  ['1-MG', '1_2_3', '1110', '200', 'xxr', 'wer', 'key_1', '01032021', '31032021', 'wer', ''],
  ['1-MG', '2_3_4', '1110', '200', 'xxv', 'cad', 'key_2', '01032021', '31032021', 'cad', ''],
  ['1-MG', '1_2_3', '1110', 'C-200', '07032021', '1', '2', '123', ''],
  ['1-MG', '2_3_4', '1110', 'C-200', '07032021', '2', '2', '123', ''],
  ['1-MG', '3_4_5', '1110', 'C-200', '07032021', '3', '2', '123', ''],
  ['1-MG', '1_2_3', '1110', 'C-200', '08032021', '1', '2', '123', '']
  ]]

Later I will manipulate these lists and create CSV files, each file grouped by the key column "CNPF_UF_MES".

  • 1

    If they are the same fields, why don’t you concatenate the dataframes and then generate the lists?

  • Thanks for the feedback. But the dfs columns are very different, and I need to convert to lists first.

1 answer

2


You can group a Dataframe by the data in a column with the method DataFrame.groupby() which divides the objects into groups defined by a criterion, in which case the data in the column will be specified CNPJ_UF_MES defining the groups in each Dataframe.

From these groups it is possible to build a dictionary result whose keys will be the column data CNPJ_UF_MES and values will be a list of lists obtained with the concatenation of values of each group:

result = {}                                         #Inicializa o dicionário que irá manter o resultado.
#Itera em df por ambos os Dataframes...
for df in [df1, df2]:
  #...agrupa df onde n é o dado em CNPJ_UF_MES e g é o respectivo grupo...
  for n, g in df.groupby("CNPJ_UF_MES"): 
      #...testa se já existi a chave n em result...
      if n in result:
        result[n] += g.values.tolist()              #...se sim, concatena a lista em result com a lista obtida do grupo. 
      else:
        result[n] = g.values.tolist()               #...se não, cria a chave em result e adiciona a lista obtida do grupo como valor.

pprint.pprint(list(result.values()), compact=True)  #Imprime a lista com os valores de result.
[[
  ['1-MG', '1_2_3', '1110', '200', 'xxr', 'wer', 'key_1', '01032021','31032021', 'wer', ''],
  ['1-MG', '2_3_4', '1110', '200', 'xxv', 'cad', 'key_2', '01032021','31032021', 'cad', ''],
  ['1-MG', '1_2_3', '1110', 'C-200', '07032021', '1', '2', '123', ''],
  ['1-MG', '2_3_4', '1110', 'C-200', '07032021', '2', '2', '123', ''],
  ['1-MG', '3_4_5', '1110', 'C-200', '07032021', '3', '2', '123', ''],
  ['1-MG', '1_2_3', '1110', 'C-200', '08032021', '1', '2', '123', '']
 ],
 [
  ['2-MG', '3_4_5', '1110', '200', 'xxw', 'sder', 'key_3', '01032021', '31032021', 'sder', ''],
  ['2-MG', '2_3_4', '1110', 'C-200', '08032021', '2', '2', '123', ''],
  ['2-MG', '3_4_5', '1110', 'C-200', '08032021', '3', '2', '123', '']
 ]]

Test the code on Repl.it

Browser other questions tagged

You are not signed in. Login or sign up in order to post.