Good afternoon, everyone! I'm starting to study Python, and to put it into practice I'm trying to optimize some routines at work.
In our department we routinely cross-reference a lot of data, mainly clients' phone numbers. Sometimes we use Excel's PROCX or PROCV (XLOOKUP or VLOOKUP) for this, but I decided to optimize these routines.
I'm using the pandas library, and I have two DataFrames with an identical column.
DF1 = NUMBERS ALREADY USED; DF2 = NUMBERS I WILL USE
My goal is to cross the two and generate a third DataFrame: the numbers I will use are checked against the ones already used, and only the values that do not repeat are returned.
This third DF would be my final file, containing only the numbers that have not been dialed yet.
import pandas as pd
### READING THE NEW FILE
Novo_Mailing_df = pd.read_csv('Novo_Mailing.csv')
display(Novo_Mailing_df)
0 21900000001
1 21900000002
2 21900000003
3 21900000004
4 21900000005
5 21900000006
6 21900000007
7 21900000008
8 21900000009
9 21900000010
10 21900000011
11 21900000012
12 21900000013
13 21900000014
14 21900000015
15 21900000016
16 21900000017
17 21900000018
18 21900000019
19 21900000020
20 2122771300
21 2122771301
22 2122771302
23 2122771303
24 2122771304
25 2122771305
26 2122771306
27 2122771307
28 2122771308
29 2122771309
### LOADING THE NUMBERS ALREADY WORKED
Discados_Callflex_df = pd.read_csv('Discados_Callflex.csv')
display(Discados_Callflex_df)
0 21900000001
1 21900000002
2 21900000003
3 21900000004
4 21900000005
5 21900000006
6 21900000007
7 21900000008
8 21900000009
9 21900000010
10 21900000011
11 21900000012
12 21900000013
13 21900000014
14 21900000015
15 21900000016
16 21900000017
17 21900000018
18 21900000019
19 21900000020
### USING JOINS TO RETURN THE DIFFERENCE
cruzamento_df = pd.merge(Novo_Mailing_df, Discados_Callflex_df, how='outer', on='telefone', indicator=True)
display(cruzamento_df)
0 21900000001 both
1 21900000002 both
2 21900000003 both
3 21900000004 both
4 21900000005 both
5 21900000006 both
6 21900000007 both
7 21900000008 both
8 21900000009 both
9 21900000010 both
10 21900000011 both
11 21900000012 both
12 21900000013 both
13 21900000014 both
14 21900000015 both
15 21900000016 both
16 21900000017 both
17 21900000018 both
18 21900000019 both
19 21900000020 both
20 2122771300 left_only
21 2122771301 left_only
22 2122771302 left_only
23 2122771303 left_only
24 2122771304 left_only
25 2122771305 left_only
26 2122771306 left_only
27 2122771307 left_only
28 2122771308 left_only
29 2122771309 left_only
That's what I've done so far. I have only found ways to return what the two have in common, but my goal is to return what differs between them.
My ultimate goal is to know whether what I want to work on today was already worked on some other day, and to return in the new DF only what I haven't worked on yet.
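One possible way to get there from the merge above is to keep only the rows that the `_merge` indicator column marks as `left_only`. The sketch below is a minimal illustration, not a definitive solution: it assumes both files have a single column named `telefone` (as in the `on='telefone'` argument above) and replaces the CSV files with a small hypothetical sample of the numbers shown in the question.

```python
import pandas as pd

# Hypothetical in-memory sample standing in for the two CSV files.
novo_mailing_df = pd.DataFrame(
    {"telefone": [21900000001, 21900000002, 2122771300, 2122771301]}
)
discados_df = pd.DataFrame({"telefone": [21900000001, 21900000002]})

# Left merge with indicator=True adds a "_merge" column that says whether
# each row was found in both frames or only in the left one.
# drop_duplicates() avoids repeating rows if a number was dialed twice.
cruzamento_df = novo_mailing_df.merge(
    discados_df.drop_duplicates(), how="left", on="telefone", indicator=True
)

# Keep only the numbers that appear in the new mailing but were never dialed.
nao_discados_df = (
    cruzamento_df[cruzamento_df["_merge"] == "left_only"]
    .drop(columns="_merge")
    .reset_index(drop=True)
)

print(nao_discados_df)
```

A left merge is used here instead of the outer merge because numbers that exist only in the dialed list are not needed in the final file.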
Thanks for the feedback! I will try to apply your explanation to my project as soon as I have some time at work. I will also read up on isin to learn more about it.
– Diogenes Costa
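For reference, the `isin` mentioned in the comment above is a pandas `Series` method that returns a boolean mask. A minimal sketch of how it could be used for this problem follows; it is not necessarily the exact code from the answer, and it assumes a column named `telefone` and a small hypothetical sample in place of the CSV files.

```python
import pandas as pd

# Hypothetical in-memory sample standing in for the two CSV files.
novo_mailing_df = pd.DataFrame(
    {"telefone": [21900000001, 21900000002, 2122771300, 2122771301]}
)
discados_df = pd.DataFrame({"telefone": [21900000001, 21900000002]})

# True for every number in the new mailing that was already dialed.
ja_discado = novo_mailing_df["telefone"].isin(discados_df["telefone"])

# Negate the mask with ~ to keep only the numbers not dialed yet.
final_df = novo_mailing_df[~ja_discado].reset_index(drop=True)

print(final_df)
```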
Thank you for your help! Your solution really worked, and while researching what you told me I learned some other things about pandas.
– Diogenes Costa
Hey @Diogenescosta, how's it going? Great that I could contribute, and very good to know that you learned more about pandas. Cheers!
– lmonferrari