0
I need to get some data from a huge access database (two, actually, depending on the input I have). I have a function that calls this bank by Chunks (if I try a fetchall gives memory problem). This function is called a few times by my main function, until I can get all the data I want. I will post the 2 codes here... I need to be able to speed up the execution time...depending on the input that runs in about 8 minutes! is a lot.
def chama_chunk():
global data
i=0
names = [x[0] for x in cursor.description] #pega os nomes das colunas
while True:
rows = cursor.fetchmany(200) #pega 200 linhas
if len(rows) == 0: #parar se n tiver nd
break
else:
if (i == 0): #se for a 1a linha, cria o dataframe com 200 linhas e colunas conforme em names
data = pd.DataFrame(np.array(rows),columns=names)
else:
data_append = [{names[i]:x[i] for i in range(len(names))} for x in rows]
data_append = pd.DataFrame ( data_append, columns = names)
data = pd.concat(objs = [data,data_append], axis = 0)
i = i+1
if i == 200:
break
the main function is this (columns being deleted are not required)
def faztudo(lista,ponto1,ponto2):
global ativo
global reativo
lista1=lista
chama_chunk()
del data['CampoB']
del data['CampoC']
del data['Dia']
del data ['Unidade']
del data['Ano']
del data['Consumo']
del data['Ponto_maxima']
del data['Demanda_maxima']
del data['Fator_carga']
gc.enable()
ativo=pd.DataFrame(columns=data.columns)
reativo=pd.DataFrame(columns=data.columns)
while (data.Mes == mes).any():
if periodo == "DOM":
#print("entrou dom")
data.query('Mes==@mes',inplace=True)
data.query('Codigo in @lista1',inplace=True)
data.query('Dia_da_semana=="DM"',inplace=True)
ativo=ativo.append(data.query('Grandeza==1'))
reativo=reativo.append(data.query('Grandeza==2'))
elif periodo == "SAB":
#print("entrou sab")
data.query('Mes==@mes',inplace=True)
data.query('Codigo in @lista1',inplace=True)
data.query('Dia_da_semana=="SB"',inplace=True)
ativo=ativo.append(data.query('Grandeza==1'))
reativo=reativo.append(data.query('Grandeza==2'))
else:
#print("entrou resto")
data.query('Mes==@mes',inplace=True)
data.query('Codigo in @lista1',inplace=True)
data.query('Dia_da_semana==@diasuteis',inplace=True)
ativo=ativo.append(data.query('Grandeza==1'))
reativo=reativo.append(data.query('Grandeza==2'))
chama_chunk()
del data['CampoB']
del data['CampoC']
del data['Dia']
del data ['Unidade']
del data['Ano']
del data['Consumo']
del data['Ponto_maxima']
del data['Demanda_maxima']
del data['Fator_carga']
and my query is made by the start function:
def inicia (dbpath,mes):
#dbpath="W:\\CuCa-2018-ZFA.mdb"
driver='{Microsoft Access Driver (*.mdb)}'
con = pyodbc.connect("DRIVER={}; DBQ={}".format(driver,dbpath))
query="SELECT * from Curvas_de_carga"
global cursor
cursor = con.cursor()
cursor.execute(query)
some other pieces of code were not cited, but I believe that the performance problem is practically in these 3 functions.
I’m a beginner in database access :/ And the option to exchange database does not exist here, because the database is not mine, I am accessing a file of another person.