PANDAS - PYTHON - FIND DIFFERENT VALUES

Question

Asked 4 years ago

Viewed 44 times

-3

Good afternoon, everyone! I've just started studying Python and, to put it into practice, I'm trying to optimize some routines at work.

In our department we routinely cross-reference a lot of data, mainly clients' phone numbers. We sometimes use Excel's PROCX or PROCV (XLOOKUP or VLOOKUP) for this, but I decided to optimize these routines.

I'm using the pandas library, and I have two DataFrames with the same column.

DF1 = numbers already used
DF2 = numbers I will use

My goal is to cross-reference them and generate a third DataFrame: the numbers I will use are checked against the ones already used, and only the values that do not repeat are returned.

This third DataFrame would be my final file, containing only the numbers that have not been dialed yet.

import pandas as pd

# Reading the new file
Novo_Mailing_df = pd.read_csv('Novo_Mailing.csv')

display(Novo_Mailing_df)

0   21900000001
1   21900000002
2   21900000003
3   21900000004
4   21900000005
5   21900000006
6   21900000007
7   21900000008
8   21900000009
9   21900000010
10  21900000011
11  21900000012
12  21900000013
13  21900000014
14  21900000015
15  21900000016
16  21900000017
17  21900000018
18  21900000019
19  21900000020
20  2122771300
21  2122771301
22  2122771302
23  2122771303
24  2122771304
25  2122771305
26  2122771306
27  2122771307
28  2122771308
29  2122771309

# Loading the numbers that have already been worked
Discados_Callflex_df = pd.read_csv('Discados_Callflex.csv')

display(Discados_Callflex_df)

0   21900000001
1   21900000002
2   21900000003
3   21900000004
4   21900000005
5   21900000006
6   21900000007
7   21900000008
8   21900000009
9   21900000010
10  21900000011
11  21900000012
12  21900000013
13  21900000014
14  21900000015
15  21900000016
16  21900000017
17  21900000018
18  21900000019
19  21900000020

# Using joins to return the difference

cruzamento_df = pd.merge(Novo_Mailing_df, Discados_Callflex_df, how='outer', on='telefone', indicator=True)

display(cruzamento_df)

0   21900000001 both
1   21900000002 both
2   21900000003 both
3   21900000004 both
4   21900000005 both
5   21900000006 both
6   21900000007 both
7   21900000008 both
8   21900000009 both
9   21900000010 both
10  21900000011 both
11  21900000012 both
12  21900000013 both
13  21900000014 both
14  21900000015 both
15  21900000016 both
16  21900000017 both
17  21900000018 both
18  21900000019 both
19  21900000020 both
20  2122771300  left_only
21  2122771301  left_only
22  2122771302  left_only
23  2122771303  left_only
24  2122771304  left_only
25  2122771305  left_only
26  2122771306  left_only
27  2122771307  left_only
28  2122771308  left_only
29  2122771309  left_only

That's what I've done so far, but I have only found ways to return what the two have in common, whereas my goal is to return what differs between them.

My ultimate goal is to find out whether what I want to work on today has already been worked on some other day, and to return in the new DataFrame only what I haven't worked on yet.

1 answer

by lmonferrari • **3,550** points · Answer 1 · 2021-07-06T15:29:57+00:00

You can use the isin method from pandas.

Importing pandas:

import pandas as pd

Creating the dataframes:

Novo_Mailing_df = pd.read_csv('../DADOS/Novo_Mailing.csv', sep = ';', names=['Coluna1'])
Discados_Callflex_df = pd.read_csv('../DADOS/Discados_Callflex.csv', sep = ';', names=['Coluna1'])

Here is the part where we check which values of one DataFrame are present in the other. The ~ negates the result, so only the rows that are not present are kept:

Novo_Mailing_df[~Novo_Mailing_df['Coluna1'].isin(Discados_Callflex_df['Coluna1'])]

Output:

	Coluna1
20	2122771300
21	2122771301
22	2122771302
23	2122771303
24	2122771304
25	2122771305
26	2122771306
27	2122771307
28	2122771308
29	2122771309
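
For anyone who wants to try this without the CSV files, here is a minimal sketch that runs on its own. The small in-memory DataFrames are hypothetical stand-ins for the two files, and the column name telefone is taken from the merge call in the question. It also shows that the merge with indicator=True from the question can give the same result, as long as you keep only the rows marked left_only:

```python
import pandas as pd

# Hypothetical in-memory stand-ins for Novo_Mailing.csv and Discados_Callflex.csv.
Novo_Mailing_df = pd.DataFrame(
    {"telefone": [21900000001, 21900000002, 2122771300, 2122771301]}
)
Discados_Callflex_df = pd.DataFrame({"telefone": [21900000001, 21900000002]})

# Option 1: boolean mask with isin, negated with ~ to keep the unmatched rows.
mask = Novo_Mailing_df["telefone"].isin(Discados_Callflex_df["telefone"])
nao_discados_isin = Novo_Mailing_df[~mask]

# Option 2: left merge with indicator=True, then keep only the 'left_only' rows.
# drop_duplicates() on the right side keeps a number that was dialed more than
# once from multiplying rows in the merge result.
cruzamento_df = Novo_Mailing_df.merge(
    Discados_Callflex_df.drop_duplicates(),
    how="left",
    on="telefone",
    indicator=True,
)
nao_discados_merge = cruzamento_df.loc[
    cruzamento_df["_merge"] == "left_only"
].drop(columns="_merge")

print(nao_discados_isin["telefone"].tolist())   # [2122771300, 2122771301]
print(nao_discados_merge["telefone"].tolist())  # [2122771300, 2122771301]
```

Both options assume the phone column has the same dtype in the two DataFrames. If one file is read as text and the other as integers, the values will not match, so it may be worth checking the dtypes after read_csv.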