Compare fields in two datasets

Question

Asked 9 years, 1 month ago

Viewed 578 times

2

Consider two datasets read from CSV files with pandas. Each dataset has a single field, CPF Favorecido, with millions of records, and each one corresponds to one month. I need to find out which records (CPF numbers) are in one dataset but not in the other.

The code looks like this:

import pandas

# Read only the 'CPF Favorecido' column from each tab-delimited monthly file
atual = pandas.read_csv(arquivo_atual, header=0, delimiter='\t', quotechar='"', usecols=['CPF Favorecido'])
seguinte = pandas.read_csv(arquivo_seguinte, header=0, delimiter='\t', quotechar='"', usecols=['CPF Favorecido'])

I just need the count of the numbers that appear in the file atual but not in the file seguinte, and vice versa.

Is there a function that counts these records, or do I need to write a loop and compare them one by one?

1 answer

by Fabiano • **555** points · Answer 1 · 2016-05-24T19:46:18+00:00

The way I know to do it with pandas looks like this:

atual.where(~atual['CPF Favorecido'].isin(seguinte['CPF Favorecido'])).count()
seguinte.where(~seguinte['CPF Favorecido'].isin(atual['CPF Favorecido'])).count()
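The expressions above count rows, so a CPF that repeats within a month is counted once per occurrence. As a minimal sketch, assuming you may also want the number of distinct CPFs, the example below shows both counts side by side. The two small DataFrames and the variable names are made up for illustration and stand in for the real monthly files.

```python
import pandas as pd

# Hypothetical small stand-ins for the two monthly files.
atual = pd.DataFrame({"CPF Favorecido": ["111", "222", "333", "333"]})
seguinte = pd.DataFrame({"CPF Favorecido": ["222", "444"]})

col = "CPF Favorecido"

# Row-based count: sum the boolean mask directly (True counts as 1).
# A CPF repeated in the file is counted once per row.
rows_only_atual = int((~atual[col].isin(seguinte[col])).sum())
rows_only_seguinte = int((~seguinte[col].isin(atual[col])).sum())

# Distinct count: compare the sets of unique, non-missing values.
cpfs_atual = set(atual[col].dropna().unique())
cpfs_seguinte = set(seguinte[col].dropna().unique())
only_atual = cpfs_atual - cpfs_seguinte
only_seguinte = cpfs_seguinte - cpfs_atual

print(rows_only_atual, rows_only_seguinte)   # row counts
print(len(only_atual), len(only_seguinte))   # distinct CPF counts
```

Which count is the right one depends on whether the same CPF can appear more than once in a monthly file. If it cannot, both approaches give the same numbers.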