In pandas, how to merge two dataframes, but only one section of two columns?

Asked

Viewed 3,305 times

0

In Python 3 and pandas I have two dataframes

eleitos_d_doadores_d.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 16090 entries, 16 to 26412
Data columns (total 23 columns):
uf_x                           16090 non-null object
partido_eleicao_x              16090 non-null object
cargo_x                        16090 non-null object
nome_completo_x                16090 non-null object
cpf                            16090 non-null object
cpf_cnpj_doador                16090 non-null object
nome_doador                    16090 non-null object
valor                          16090 non-null object
tipo_receita                   16090 non-null object
fonte_recurso                  16090 non-null object
especie_recurso                16090 non-null object
cpf_cnpj_doador_originario     16090 non-null object
nome_doador_originario         16090 non-null object
tipo_doador_originario         16090 non-null object
Unnamed: 0                     16090 non-null int64
uf_y                           16090 non-null object
cargo_y                        16090 non-null object
nome_completo_y                16090 non-null object
nome_urna                      16090 non-null object
partido_eleicao_y              16090 non-null object
situacao                       16090 non-null object
dtypes: int64(1), object(22)
memory usage: 2.9+ MB

segura.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 35581 entries, 0 to 35580
Data columns (total 3 columns):
cnpj            35581 non-null object
nome_empresa    35581 non-null object
estado          35581 non-null object
dtypes: object(3)
memory usage: 834.0+ KB

I need a merge of this type to eliminate duplications

segura_doadores_diretos = pd.merge(eleitos_d_doadores_d, segura.drop_duplicates('cnpj'), left_on='cpf_cnpj_doador', right_on='cnpj')

The columns "cnpj" and "cpf_cnpj_donor" are codes. But I need to compare only the first eight digits of each column. Please, is it possible to do this? Without having to create new columns with snippets

1 answer

2


Looking at the Pandas documentation I do not believe it is possible without creating a temporary column because the merge method only takes the name of the label as key.

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html

I think you have to create the temporary table anyway. Then just delete.

segura['CNPJ_8_digitos'] = segura.cnpj.str[:8]
eleitos_d_doadores_d['cpf_cnpj_doador_8_digitos'] = segura.cpf_cnpj_doador.str[:8]

segura_doadores_diretos = pd.merge(eleitos_d_doadores_d, segura.drop_duplicates('cnpj'), left_on='cpf_cnpj_doador_8_digitos', right_on='CNPJ_8_digitos')

Browser other questions tagged

You are not signed in. Login or sign up in order to post.