Very slow python code?? Access to Access database with pyodbc

Asked

Viewed 137 times

0

I need to get some data from a huge access database (two, actually, depending on the input I have). I have a function that calls this bank by Chunks (if I try a fetchall gives memory problem). This function is called a few times by my main function, until I can get all the data I want. I will post the 2 codes here... I need to be able to speed up the execution time...depending on the input that runs in about 8 minutes! is a lot.

def chama_chunk():
    global data
    i=0
    names = [x[0] for x in cursor.description]  #pega os nomes das colunas

    while True: 
        rows = cursor.fetchmany(200) #pega 200 linhas
        if len(rows) == 0: #parar se n tiver nd
            break 
        else: 
            if (i == 0): #se for a 1a linha, cria o dataframe com 200 linhas e colunas conforme em names
                data = pd.DataFrame(np.array(rows),columns=names) 


            else:
                data_append = [{names[i]:x[i] for i in range(len(names))} for x in rows] 
                data_append = pd.DataFrame ( data_append, columns = names) 
                data = pd.concat(objs = [data,data_append], axis = 0) 

            i = i+1 
        if i == 200:
            break

the main function is this (columns being deleted are not required)

def faztudo(lista,ponto1,ponto2):
    global ativo
    global reativo
    lista1=lista
    chama_chunk()
    del data['CampoB']
    del data['CampoC']
    del data['Dia']
    del data ['Unidade']
    del data['Ano']
    del data['Consumo']
    del data['Ponto_maxima']
    del data['Demanda_maxima']
    del data['Fator_carga']
    gc.enable()
    ativo=pd.DataFrame(columns=data.columns)
    reativo=pd.DataFrame(columns=data.columns)
    while (data.Mes == mes).any():
        if periodo == "DOM":
            #print("entrou dom")
            data.query('Mes==@mes',inplace=True)
            data.query('Codigo in @lista1',inplace=True)
            data.query('Dia_da_semana=="DM"',inplace=True)
            ativo=ativo.append(data.query('Grandeza==1'))
            reativo=reativo.append(data.query('Grandeza==2'))


        elif periodo == "SAB":
            #print("entrou sab")
            data.query('Mes==@mes',inplace=True)
            data.query('Codigo in @lista1',inplace=True)
            data.query('Dia_da_semana=="SB"',inplace=True)
            ativo=ativo.append(data.query('Grandeza==1'))
            reativo=reativo.append(data.query('Grandeza==2'))

        else:
            #print("entrou resto")
            data.query('Mes==@mes',inplace=True)
            data.query('Codigo in @lista1',inplace=True)
            data.query('Dia_da_semana==@diasuteis',inplace=True)
            ativo=ativo.append(data.query('Grandeza==1'))
            reativo=reativo.append(data.query('Grandeza==2'))


        chama_chunk()
        del data['CampoB']
        del data['CampoC']
        del data['Dia']
        del data ['Unidade']
        del data['Ano']
        del data['Consumo']
        del data['Ponto_maxima']
        del data['Demanda_maxima']
        del data['Fator_carga']

and my query is made by the start function:

def inicia (dbpath,mes):
    #dbpath="W:\\CuCa-2018-ZFA.mdb"
    driver='{Microsoft Access Driver (*.mdb)}'
    con = pyodbc.connect("DRIVER={}; DBQ={}".format(driver,dbpath))
    query="SELECT * from Curvas_de_carga"
    global cursor
    cursor = con.cursor()
    cursor.execute(query)

some other pieces of code were not cited, but I believe that the performance problem is practically in these 3 functions.

I’m a beginner in database access :/ And the option to exchange database does not exist here, because the database is not mine, I am accessing a file of another person.

2 answers

0

Test the following items: avoid inserting this step seems that you do nothing with these columns (does not compare or thing like)

del data['CampoB']
del data['CampoC']
del data['Dia']
del data ['Unidade']
del data['Ano']
del data['Consumo']
del data['Ponto_maxima']
del data['Demanda_maxima']
del data['Fator_carga']

See if deleting these columns from the query (before bringing them to the dataframe this can speed up your query because you are bringing less data and probably there you can increase the amount of records (pass 200 per loop)

0

It is very likely that the time it takes is of the querry

query="SELECT * from Curvas_de_carga"

Try to further restrict the search for querry using "Where", for example using a date to limit the search (between day x a y, or x to date)

But try doing it just to test:

query="SELECT * from Curvas_de_carga" LIMIT 100;

You’ll get "few" results but if it’s faster it’s because you’re doing a really big search.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.