Comparing column values in dataframe pandas rows

Question

Comparing column values in dataframe pandas rows

Asked 6 years, 2 months ago

Viewed 4,919 times

1

Good afternoon. I am comparing two given frames, data frame 1 has an sql query and data frame 2 a mysql query I want to compare the two. When the DF 1 record is not in DF 2 I want to save the result in another empty data frame, then load the data of that empty data frame in my table in mysql, so I can automate a process.

DF1 = Function for connection to sql server database and dataframe 1

DF2 = Function for connection to Mysql database and dataframe 2

import pymysql.cursors
import pyodbc
import pandas as pd
from sqlalchemy import create_engine


# Função para trazer os dados do Sql e jogar em um dataframe
def get_vendas_sqlserver():
    #Conexão com SQL Server
    connection = pyodbc.connect("DSN=SQLServer")  #autocommit=True
    try:
        # Conteúdo do data frame 1
        df = pd.read_sql_query("SELECT * FROM dw.dbo.vW_Vendateste123",connection,index_col=None,coerce_float=True, parse_dates= 'DataBaseContrato') 
        return df      
    finally:
        connection.close()


# Função para trazer os dados do mysql e jogar em um dataframe
def get_vendas_mysql():
    #Conexão Mysql
    cnxmysql = pymysql.connect(host='teste',
                                 user='teste',
                                 password='teste',
                                 db='dw')
    try:
        # Conteúdo do data frame 2
        df = pd.read_sql_query("SELECT * FROM ft_venda_copy", cnxmysql, index_col=None, coerce_float=True,
                               parse_dates='DataBaseContrato')
        return df
    finally:
        cnxmysql.close()

def compara_valores():
    cnx = create_engine('mysql+pymysql://teste:teste@teste/dw')
    df1 = get_vendas_sqlserver()
    df2 = get_vendas_mysql()
    #criando um dataframe vazio para guardar o resultado
    df_result = pd.DataFrame()
    print('df1: ',df1,' ', 'df2: ',df2, sep='\n')



    # o metodo iterrows retorna um tuple com o indice 
    for index, row in df1.iterrows():
    # verifica se este registro existe no DF do MySQL
        if row["IdContrato"] in df2["IdContrato"]:
            print("Contrato {0} encontrada no Mysql".format(row["Idcontrato"]))       
        else:
            print("Contrato {0} nao encontrada no Mysql".format(row["IdContrato"]))

        df_result.to_sql(con=cnx, name='ft_venda_copy', if_exists='append', index=False)

When I check the Result data frame it returns me empty

# adiciona o registro em um novo DF que vai ser usado para gravar no banco
df_result.append(row)
print(df_result)


df1:
      IdUnidade  IdContrato  IdProspect TipoContrato  ... 
0           276      9607.0         NaN          PCV  ...                 
1           328      8391.0         NaN          PCV  ...            
2           362     10233.0         NaN          PCV  ...            
3           309     10548.0         NaN          PCV  ...            
4           237      8849.0         NaN          PCV  ...            
5           308      8116.0         NaN          PCV  ...            0 

[6355 rows x 34 columns]


df2:
      IdUnidade  IdContrato  IdProspect TipoContrato  ... 
0           276      9607.0         NaN          PCV  ...                 
1           328      8391.0         NaN          PCV  ...            
2           362     10233.0         NaN          PCV  ...            
3           309     10548.0         NaN          PCV  ...            
4           237      8849.0         NaN          PCV  ...            
5           308      8116.0         NaN          PCV  ...            0 

[6355 rows x 34 columns]

You edited the question, added the command I suggested in the comments of my answer, but note that I commented "Run and play the result in your question." , where is the result?

– Sidon

2019/03/07 at 16:38

1 answer

Browser other questions tagged mysql sql python

You are not signed in. Login or sign up in order to post.

by Sidon • **6,563** points · Answer 1 · 2019-03-06T20:24:35+00:00

The ideal would be that you show the content of the dataframes, it became difficult to understand the context, so I will try to give an example based on what I understood of your question, I will create 1 dataframe df1 and then copy it for a second df2 and then change the data of the first column of the second row of the df1, i.e., the first dataframe df1 will contain a line (the second) that will not be contained in the second df2, then 'browse' on the lines of df1 checking that the value of the first column of each row is contained in the same column of df2, if not, save that(s) line(s) in a list and finally the cover for a dataframe

After 'Teracentos' comments and edits on the question, I edited the example to adapt to the context.

import pandas as pd

a1 = [['276', '9607.0','NaN','PCV'], 
      ['328', '8391.0','NaN','PCV'], 
      ['362', '10233.0','NaN','PCV']]

df1 = pd.DataFrame(a1, columns=['IdUnidade','IdContrato','IdProspect','TipoContrato'])
df2 = df1.copy()

# Incluindo uma nova linha em df1
df1.loc[len(df1)] = ['999', '9999.9','999','PCV']

# Aturalizando df2 em função de df1...
old_df2 = df2.copy()
for i, row in df1.iterrows():
    if row['IdContrato'] not in list(df2['IdContrato']):
        df2.loc[len(df2)] = list(row)

# Apresentando os resultados
print('','df1:', df1,'','df2 original:',old_df2,'','Novo df2:',df2,sep='\n')

Exit:

df1:
  IdUnidade IdContrato IdProspect TipoContrato
0       276     9607.0        NaN          PCV
1       328     8391.0        NaN          PCV
2       362    10233.0        NaN          PCV
3       999     9999.9        999          PCV

df2 original:
  IdUnidade IdContrato IdProspect TipoContrato
0       276     9607.0        NaN          PCV
1       328     8391.0        NaN          PCV
2       362    10233.0        NaN          PCV

Novo df2:
  IdUnidade IdContrato IdProspect TipoContrato
0       276     9607.0        NaN          PCV
1       328     8391.0        NaN          PCV
2       362    10233.0        NaN          PCV
3       999     9999.9        999          PCV

See working on repl it..