Check if records exist with Python

Viewed 2,084 times

-1

I’m moving from SQL to Python and I’m still a little confused. In the code below, I want to know whether the rows returned by the select in the result variable already exist in the select of the venda variable. If they exist, do nothing; if they do not exist, load the data. In SQL I would use IF EXISTS, but I don’t know what the equivalent is in Python. Thanks in advance.

For example, I am reading records from a SQL Server table and loading them into a MySQL table. If the records are already in the MySQL table, I want to print "no new records" on screen; if there are records that are not yet in the MySQL table, I want to load them.

Note: I put the data into a DataFrame and then load it into MySQL.

import pymysql.cursors
import pyodbc
import pandas as pd
from sqlalchemy import create_engine

connection = pyodbc.connect("DSN=SQLServer")  #autocommit=True

try:
    with connection.cursor() as cursor:
        result = "SELECT * FROM dw.dbo.vW_vendas"
        df = pd.read_sql_query("SELECT * FROM dw.dbo.vW_Vendas",connection,index_col=None,coerce_float=True, parse_dates= 'DataBaseContrato')
        cursor.execute(result)
        table = cursor.fetchall()
        print(table)             

finally:
    connection.close()

# MySQL connection
cnx = create_engine('mysql+pymysql://teste:teste@teste/dw')
cnxmysql = pymysql.connect(host='teste',
                             user='teste',
                             password='teste',
                             db='dw')
try:
    with cnxmysql.cursor() as cursor2:
        venda = "SELECT * FROM ft_venda_teste"
        cursor2.execute(venda)
        venda = cursor2.fetchall()
        print(venda)
finally:
    cnxmysql.close()

df.to_sql(con=cnx, name= 'ft_venda_teste',if_exists= 'replace', index= False)
print('dados Carregados')
  • I got a little confused: do you want to check whether the records already exist in the table at the time of the select?

  • That’s right. For example, I am reading records from a SQL Server table and loading them into a MySQL table. If the records are already in the MySQL table, I want to print "no new records" on screen; if there are records that are not yet in the MySQL table, I want to load them.

3 answers

3


I couldn’t quite understand the requirement, but it seems that you want to synchronize the sales from a DW in SQL Server with another database in MySQL.

I could not test the solution below because I do not have an environment set up with SQL Server and MySQL, but try to follow the logic:

import pymysql.cursors
import pyodbc
import pandas as pd
from sqlalchemy import create_engine


def get_vendas_sqlserver():
    connection = pyodbc.connect("DSN=SQLServer")  #autocommit=True

    try:
        # You do not need both a cursor and pandas' read_sql_query here;
        # the pandas method already returns the result of your select
        df = pd.read_sql_query("SELECT * FROM dw.dbo.vW_Vendas", connection, index_col=None, coerce_float=True, parse_dates=['DataBaseContrato'])

        return df
    finally:
        connection.close()

def get_vendas_mysql():
    # MySQL connection
    cnxmysql = pymysql.connect(host='teste',
                                 user='teste',
                                 password='teste',
                                 db='dw')
    try:
        # same as above
        df = pd.read_sql_query("SELECT * FROM ft_venda_teste", cnxmysql, index_col=None, coerce_float=True,
                               parse_dates=['DataBaseContrato'])

        return df
    finally:
        cnxmysql.close()

def merge_vendas():
    df1 = get_vendas_sqlserver()
    df2 = get_vendas_mysql()
    # IDs already stored in MySQL ("x in series" would test the index, not the values)
    existing_ids = set(df2["vendaid"])
    # index labels of the SQL Server rows that are missing from MySQL
    new_index = []

    # iterrows yields a tuple with the index (row label) and a pd.Series holding the record itself
    for index, row in df1.iterrows():
        # check whether this record already exists in the MySQL DataFrame
        if row["vendaid"] in existing_ids:
            print("Venda {0} found in MySQL".format(row["vendaid"]))
        else:
            print("Venda {0} not found in MySQL".format(row["vendaid"]))
            # remember the record so it can be written to the database below
            new_index.append(index)

    if new_index:
        write_results(df1.loc[new_index])
    else:
        print("No new records")

def write_results(df_result):
    cnx = create_engine('mysql+pymysql://teste:teste@teste/dw')

    df_result.to_sql(con=cnx, name='ft_venda_teste', if_exists='append', index=False)
    print('dados Carregados')

This may not be the most "elegant" way to achieve what you need, but I wanted to build on your own reasoning so that you could follow the logic.
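
For comparison, the same check can be written without an explicit loop. The following is only a sketch, assuming vendaid uniquely identifies a sale (as the code above also assumes); the two small DataFrames and the valor column are invented stand-ins for the real query results:

```python
import pandas as pd

# Hypothetical stand-ins for the two query results; "vendaid" is the key
# column assumed in the answer, "valor" is invented for illustration.
df_sqlserver = pd.DataFrame({"vendaid": [1, 2, 3, 4], "valor": [10.0, 20.0, 30.0, 40.0]})
df_mysql = pd.DataFrame({"vendaid": [1, 2, 3], "valor": [10.0, 20.0, 30.0]})

# Boolean mask: True where the SQL Server key already exists in MySQL.
already_loaded = df_sqlserver["vendaid"].isin(df_mysql["vendaid"])

# Keep only the rows that are missing from MySQL.
df_new = df_sqlserver[~already_loaded]

if df_new.empty:
    print("No new records")
else:
    print(df_new)
```

With real data, df_new would then be the DataFrame handed to to_sql with if_exists='append'.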

I hope this is a starting point!

  • With that, it returned an error: '>' not supported between instances of 'pyodbc.Cursor' and 'int'

  • Your code is a bit confusing. You could create three functions: getVendasSQL and getVendasMySQL (the names are only suggestions, since I do not know the business), each selecting its data, and a third function that compares the two results to build the diff. I also noticed that df.to_sql(con=cnx, name= 'ft_venda_teste',if_exists= 'replace', index= False) does not produce the result you expect: if_exists='replace' actually drops the table and recreates it from scratch. Maybe use append?

  • I’m just starting out with Python, so I’m still catching on.

  • 1

    I’ll rewrite my answer into something closer to what you need.

  • 1

    Thank you so much for the help, Iann.

  • Thank you, Iann, your reply helped a lot.
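
The difference between if_exists='replace' and if_exists='append' mentioned in the comments can be seen in a small experiment. This is a sketch only: an in-memory SQLite database stands in for the MySQL target, and the one-column tables are made up for illustration:

```python
import sqlite3
import pandas as pd

# In-memory SQLite stands in for the MySQL target (an assumption for this sketch).
con = sqlite3.connect(":memory:")

existing = pd.DataFrame({"vendaid": [1, 2]})
incoming = pd.DataFrame({"vendaid": [3]})

existing.to_sql("ft_venda_teste", con, index=False)

# 'append' adds the incoming rows to what is already stored.
incoming.to_sql("ft_venda_teste", con, if_exists="append", index=False)
after_append = pd.read_sql_query("SELECT vendaid FROM ft_venda_teste ORDER BY vendaid", con)

# 'replace' drops the table and recreates it from the incoming rows only.
incoming.to_sql("ft_venda_teste", con, if_exists="replace", index=False)
after_replace = pd.read_sql_query("SELECT vendaid FROM ft_venda_teste", con)

print(after_append["vendaid"].tolist())   # rows 1, 2 and 3
print(after_replace["vendaid"].tolist())  # only row 3
con.close()
```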


0

First, note that the rows returned by the query in result are actually held in the variable table, which is the one you pass to print() in the first try block.

Knowing this, you can create a set() from each list of results. This eliminates repeated elements in each list and allows operations between the collections (for example, building a new list from the difference):

x = set(table)
print(x)
>>> {resultado1, resultado2, resultado3, resultado4}

y = set(venda)
print(y)
>>> {resultado1, resultado2, resultado3, resultado5}

z = list(x - y)
print(z)
>>> [resultado4]

However, if you need to keep repeated values, you can use a for loop with an if not in test, which checks whether an element is not contained in a list:

z = list()
for i in table:
    # keep the rows of table that are missing from venda
    if i not in venda:
        z.append(i)

The results will then be in the list z.

Edit:

As mentioned by @Ytalo-Matos-bandeira-da-silva, to use lists with set(), it is necessary to transform them first into tuples:

x = set(tuple(table))
print(x)
>>> {resultado1, resultado2, resultado3, resultado4}

y = set(tuple(venda))
print(y)
>>> {resultado1, resultado2, resultado3, resultado5}

z = list(x - y)
print(z)
>>> [resultado4]
  • When I create the set, it returns an unhashable type error: 'pyodbc.Row'

  • This is because a set cannot hold mutable objects such as list() or dict(), but it can hold tuple(), so first convert your results into tuples. I will edit the answer to make this clearer.

  • Thanks for the help, I’ll try now

  • Even after converting the list, the same error occurred.

  • I know you have already solved your problem, but for the record and for future users, could you post the complete error? Or at least a bit more of it, so the cause can be identified?

  • Yes, even when I convert it to a tuple it keeps returning the unhashable type error: 'pyodbc.Row', and it does not let me create the collection.
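
The error reported in the comments is consistent with tuple(table) converting only the outer list, which leaves each row as an unhashable row object. A possible way around it, sketched under the assumption that the rows are plain sequences that tuple() can convert, is to convert every row individually. FakeRow below is a hypothetical stand-in for the driver's row type, not part of pyodbc:

```python
# Hypothetical stand-in for a driver row type that, like the pyodbc.Row in the
# error quoted above, cannot be hashed. A list subclass behaves the same way.
class FakeRow(list):
    pass


table = [FakeRow([1, "A"]), FakeRow([2, "B"]), FakeRow([3, "C"])]  # source rows
venda = [(1, "A"), (2, "B")]  # rows already in the target

# tuple(table) only converts the outer list; the rows inside stay unhashable.
try:
    set(tuple(table))
except TypeError as exc:
    print("still fails:", exc)

# Converting each row makes every element of the set hashable.
x = {tuple(row) for row in table}
y = {tuple(row) for row in venda}

z = sorted(x - y)
print(z)  # [(3, 'C')]
```

For the difference to be meaningful, both queries would need to return the same columns in the same order and with comparable types.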


-2

Complementing Iann’s answer:

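# pyodbc's cursor.execute() returns the cursor itself, not a row count,
# so test the fetched rows instead of comparing the return value with 0
cursor.execute(result)
table = cursor.fetchall()
if table:
    # I would load the result into the df here
    pass
else:
    # Handle the "empty" case here
    pass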
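
As for the original question, one rough analogue of SQL's IF EXISTS from Python is to ask the database for at most one matching row and test whether fetchone() returns None. The sketch below uses an in-memory SQLite database as a stand-in for the real target; venda_exists is a hypothetical helper, and the vendaid key is the assumption made earlier in this thread:

```python
import sqlite3

# In-memory SQLite stands in for the target database (an assumption);
# the table and the vendaid column follow the names used in this thread.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ft_venda_teste (vendaid INTEGER PRIMARY KEY)")
con.executemany("INSERT INTO ft_venda_teste (vendaid) VALUES (?)", [(1,), (2,)])


def venda_exists(connection, vendaid):
    """Return True if a row with this vendaid is already stored."""
    cur = connection.execute(
        "SELECT 1 FROM ft_venda_teste WHERE vendaid = ? LIMIT 1", (vendaid,)
    )
    # fetchone() returns None when the query matched nothing.
    return cur.fetchone() is not None


for vendaid in (2, 3):
    if venda_exists(con, vendaid):
        print(vendaid, "already exists, nothing to do")
    else:
        con.execute("INSERT INTO ft_venda_teste (vendaid) VALUES (?)", (vendaid,))
        print(vendaid, "inserted")

stored = [row[0] for row in con.execute("SELECT vendaid FROM ft_venda_teste ORDER BY vendaid")]
con.close()
```

SQLite and pyodbc use ? as the parameter placeholder, while pymysql uses %s, so the query text would need adjusting for the MySQL connection.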
  • I would like to understand SOpt: you try to complement an answer and you get a downvote. I think this is why the quality here is dropping so much and so many users are being scared away. Unfortunate.
