Comparing column values in dataframe pandas rows

Asked

Viewed 4,919 times

1

Good afternoon. I am comparing two given frames, data frame 1 has an sql query and data frame 2 a mysql query I want to compare the two. When the DF 1 record is not in DF 2 I want to save the result in another empty data frame, then load the data of that empty data frame in my table in mysql, so I can automate a process.

DF1 = Function for connection to sql server database and dataframe 1

DF2 = Function for connection to Mysql database and dataframe 2

import pymysql.cursors
import pyodbc
import pandas as pd
from sqlalchemy import create_engine


# Função para trazer os dados do Sql e jogar em um dataframe
def get_vendas_sqlserver():
    #Conexão com SQL Server
    connection = pyodbc.connect("DSN=SQLServer")  #autocommit=True
    try:
        # Conteúdo do data frame 1
        df = pd.read_sql_query("SELECT * FROM dw.dbo.vW_Vendateste123",connection,index_col=None,coerce_float=True, parse_dates= 'DataBaseContrato') 
        return df      
    finally:
        connection.close()


# Função para trazer os dados do mysql e jogar em um dataframe
def get_vendas_mysql():
    #Conexão Mysql
    cnxmysql = pymysql.connect(host='teste',
                                 user='teste',
                                 password='teste',
                                 db='dw')
    try:
        # Conteúdo do data frame 2
        df = pd.read_sql_query("SELECT * FROM ft_venda_copy", cnxmysql, index_col=None, coerce_float=True,
                               parse_dates='DataBaseContrato')
        return df
    finally:
        cnxmysql.close()

def compara_valores():
    cnx = create_engine('mysql+pymysql://teste:teste@teste/dw')
    df1 = get_vendas_sqlserver()
    df2 = get_vendas_mysql()
    #criando um dataframe vazio para guardar o resultado
    df_result = pd.DataFrame()
    print('df1: ',df1,' ', 'df2: ',df2, sep='\n')



    # o metodo iterrows retorna um tuple com o indice 
    for index, row in df1.iterrows():
    # verifica se este registro existe no DF do MySQL
        if row["IdContrato"] in df2["IdContrato"]:
            print("Contrato {0} encontrada no Mysql".format(row["Idcontrato"]))       
        else:
            print("Contrato {0} nao encontrada no Mysql".format(row["IdContrato"]))

        df_result.to_sql(con=cnx, name='ft_venda_copy', if_exists='append', index=False)

When I check the Result data frame it returns me empty

# adiciona o registro em um novo DF que vai ser usado para gravar no banco
df_result.append(row)
print(df_result)


df1:
      IdUnidade  IdContrato  IdProspect TipoContrato  ... 
0           276      9607.0         NaN          PCV  ...                 
1           328      8391.0         NaN          PCV  ...            
2           362     10233.0         NaN          PCV  ...            
3           309     10548.0         NaN          PCV  ...            
4           237      8849.0         NaN          PCV  ...            
5           308      8116.0         NaN          PCV  ...            0 

[6355 rows x 34 columns]


df2:
      IdUnidade  IdContrato  IdProspect TipoContrato  ... 
0           276      9607.0         NaN          PCV  ...                 
1           328      8391.0         NaN          PCV  ...            
2           362     10233.0         NaN          PCV  ...            
3           309     10548.0         NaN          PCV  ...            
4           237      8849.0         NaN          PCV  ...            
5           308      8116.0         NaN          PCV  ...            0 

[6355 rows x 34 columns]
  • You edited the question, added the command I suggested in the comments of my answer, but note that I commented "Run and play the result in your question." , where is the result?

1 answer

1


The ideal would be that you show the content of the dataframes, it became difficult to understand the context, so I will try to give an example based on what I understood of your question, I will create 1 dataframe df1 and then copy it for a second df2 and then change the data of the first column of the second row of the df1, i.e., the first dataframe df1 will contain a line (the second) that will not be contained in the second df2, then 'browse' on the lines of df1 checking that the value of the first column of each row is contained in the same column of df2, if not, save that(s) line(s) in a list and finally the cover for a dataframe

After 'Teracentos' comments and edits on the question, I edited the example to adapt to the context.

import pandas as pd

a1 = [['276', '9607.0','NaN','PCV'], 
      ['328', '8391.0','NaN','PCV'], 
      ['362', '10233.0','NaN','PCV']]

df1 = pd.DataFrame(a1, columns=['IdUnidade','IdContrato','IdProspect','TipoContrato'])
df2 = df1.copy()

# Incluindo uma nova linha em df1
df1.loc[len(df1)] = ['999', '9999.9','999','PCV']

# Aturalizando df2 em função de df1...
old_df2 = df2.copy()
for i, row in df1.iterrows():
    if row['IdContrato'] not in list(df2['IdContrato']):
        df2.loc[len(df2)] = list(row)

# Apresentando os resultados
print('','df1:', df1,'','df2 original:',old_df2,'','Novo df2:',df2,sep='\n')

Exit:

df1:
  IdUnidade IdContrato IdProspect TipoContrato
0       276     9607.0        NaN          PCV
1       328     8391.0        NaN          PCV
2       362    10233.0        NaN          PCV
3       999     9999.9        999          PCV

df2 original:
  IdUnidade IdContrato IdProspect TipoContrato
0       276     9607.0        NaN          PCV
1       328     8391.0        NaN          PCV
2       362    10233.0        NaN          PCV

Novo df2:
  IdUnidade IdContrato IdProspect TipoContrato
0       276     9607.0        NaN          PCV
1       328     8391.0        NaN          PCV
2       362    10233.0        NaN          PCV
3       999     9999.9        999          PCV

See working on repl it..

  • Sidon added the content of the data frame in case it would be two table one sql and another my sql, would apply in the same way as your example ??

  • If my example is according to your need, it will apply. Where have you added the 2 Dfs content? Besides the lack of this, it seems that your question this badly formatted or missing more things, the last block of code that you present, should not be inside the for index, row... of the previous code block?

  • Sidon, I reworked the question and added the missing parts to the code, and thank you for your help

  • 1

    Hasn’t introduced the dfs and the formwork still seems wrong. to present the dfs, do it like this: In def compara_valores(), create a line in that vacant space, in the middle of the code block, with the following command: print('df1:',df1,' ', 'df2:',df2, sep='\n'). Execute and play the result in your question.

  • I edited the question with what you asked

  • 1

    No, you did not do what I suggested, see in the comments of your question.

  • I put the summary result of the data frames note that the two have the same data, the intention is to load the data in df 2 when you have new data in df 1

  • 1

    Okay, I’m editing my answer to fit your context, after which I send you my account number so your boss can make the deposit, okay? :-)

  • Thank you, you helped me understand

Show 4 more comments

Browser other questions tagged

You are not signed in. Login or sign up in order to post.