Compare fields in two datasets

I have two datasets read from *.CSV files with pandas. Each one has a single field, CPF Favorecido, with millions of records, and each dataset corresponds to one month. I need to find out which records (CPF numbers) are in one dataset but not in the other.

This is how I read the files:

import pandas
atual = pandas.read_csv(arquivo_atual, header=0, delimiter='\t', quotechar='"', usecols=['CPF Favorecido'])
seguinte = pandas.read_csv(arquivo_seguinte, header=0, delimiter='\t', quotechar='"', usecols=['CPF Favorecido'])

I just need the count of the numbers that appear in the file atual but not in the file seguinte, and vice versa.

Is there a function that counts these records, or do I need to write a loop and compare them one by one?

1 answer

One way I know of to do this with pandas:

atual.where(~atual['CPF Favorecido'].isin(seguinte['CPF Favorecido'])).count()
seguinte.where(~seguinte['CPF Favorecido'].isin(atual['CPF Favorecido'])).count()
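
To try the same isin idea without the real files, here is a minimal sketch on toy data. The CPF values are made up for illustration, and the variable names other than atual and seguinte are hypothetical. It also shows a set-based variant, since the isin count is per row and a CPF repeated within a month is counted once for each occurrence.

```python
import pandas as pd

# Toy stand-ins for the two monthly files (made-up values, not real CPFs).
atual = pd.DataFrame({"CPF Favorecido": ["111", "222", "333", "333"]})
seguinte = pd.DataFrame({"CPF Favorecido": ["222", "444"]})

col = "CPF Favorecido"

# Row counts: summing a boolean mask counts its True values.
# A CPF that repeats within a month is counted once per row here.
rows_only_atual = int((~atual[col].isin(seguinte[col])).sum())
rows_only_seguinte = int((~seguinte[col].isin(atual[col])).sum())

# Distinct CPFs: compare the sets of unique values instead.
cpfs_only_atual = set(atual[col].unique()) - set(seguinte[col].unique())
cpfs_only_seguinte = set(seguinte[col].unique()) - set(atual[col].unique())

print(rows_only_atual, rows_only_seguinte)
print(len(cpfs_only_atual), len(cpfs_only_seguinte))
```

Which of the two counts is the right one depends on whether the same CPF can appear more than once in a single month's file.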
  • 1

    Dude, I just couldn't get the syntax of this command right! I tried using "isin" several times and kept getting an error. Very grateful, problem solved!
