I have a performance problem: my code is taking far too long to run.
I have the following DataFrame (this is just an example; the one I'm actually using is much bigger).
import pandas as pd

orders = {'Código': [600, 600, 601, 602],
          'Num. Pedido': [1000, 1000, 1002, 1003],
          'Data Pedido': ['10/01', '10/01', '08/09', '12/01'],
          'Sabor': ['Calabresa', 'Mussarela', 'Pepperoni', 'Portuguesa'],
          'Quantidade': [1, 1, 1, 1],
          'Metade': [1, 2, 1, 1],
          'Meia': ['Sim', 'Sim', 'Não', 'Não'],
          'Preço': [40.0, 32.0, 45.0, 35.0]}
df = pd.DataFrame(data=orders)
The two rows of order 1000 are practically identical; the only differences are the half ('Metade'), the flavor ('Sabor') and the price ('Preço'). What I want to do is put the flavor of one half next to the flavor of the other half, for example Calabresa+Mussarela, and divide each price by 2, because each row carries the price of a whole pizza. I did it as follows, but it has been running for more than 10 hours.
cache_metade = {}
linhas_final = []  # merged rows (DataFrame.append was removed in pandas 2.0)
lista = []
for index, row in df.iterrows():
    if index not in cache_metade:
        # Look for the other half: same order, date and code, different 'Metade'
        df_metade = df.loc[df['Num. Pedido'] == row['Num. Pedido']]
        df_metade = df_metade.loc[df_metade['Data Pedido'] == row['Data Pedido']]
        df_metade = df_metade.loc[df_metade['Código'] == row['Código']]
        df_metade = df_metade.loc[df_metade['Metade'] != row['Metade']]
        df_metade = df_metade[~df_metade.index.isin(cache_metade)]
        if len(df_metade.index) > 0:
            metade_index = df_metade.index[0]
            metade = df_metade.iloc[0].copy()
            if row['Sabor'] is not None and metade['Sabor'] is not None:
                # Join the two flavors and average the two whole-pizza prices
                metade['Sabor'] = metade['Sabor'] + "+" + row['Sabor']
                metade['Preço'] = (metade['Preço'] / 2) + (row['Preço'] / 2)
                lista.append(row['Código'])
                linhas_final.append(metade)
                cache_metade[index] = True
                cache_metade[metade_index] = True
df_final = pd.DataFrame(linhas_final)
# Replace the original half rows with the merged ones
drop = df[df['Código'].isin(lista)].index
df_pedidos = df.drop(drop)
df_pedidos = pd.concat([df_pedidos, df_final])
Is there another, more efficient way to do this? Keep in mind that the dataset I have is much larger than the example above.
Is it a real system or is it an exercise?
– Augusto Vasques
It’s a project I’m creating
– Igor Capão
If it is commercial, don't use a DataFrame in the service: a DataFrame is better suited to data analysis, and the way the records are fragmented is inefficient. Do you already know SQLite? Python has a module dedicated to that database. See here
– Augusto Vasques
And what is the "Metade" column?
– Miguel
Can you post a small example df of what the final result should look like?
– Miguel
@Miguel orders = {'Código': [600, 601, 602], 'Num. Pedido': [1000, 1002, 1003], 'Data Pedido': ['10/01', '08/09', '12/01'], 'Sabor': ['Calabresa+Mussarela', 'Pepperoni', 'Portuguesa'], 'Quantidade': [1, 1, 1], 'Metade': ['1+2', '1', '1'], 'Meia': ['Sim', 'Não', 'Não'], 'Preço': [36.0, 45.0, 35.0]}
– Igor Capão
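For reference, the result described in the question and the comments (flavors joined with "+", price averaged over the two halves) can probably be produced without a row-by-row loop. The following is only a sketch, not a tested solution for the real data. It assumes that each combination of 'Código', 'Num. Pedido' and 'Data Pedido' identifies exactly one pizza with one or two rows, and that 'Quantidade' and 'Meia' are the same on both rows; the names `keys` and `df_pedidos` are chosen for illustration.

```python
import pandas as pd

orders = {
    'Código': [600, 600, 601, 602],
    'Num. Pedido': [1000, 1000, 1002, 1003],
    'Data Pedido': ['10/01', '10/01', '08/09', '12/01'],
    'Sabor': ['Calabresa', 'Mussarela', 'Pepperoni', 'Portuguesa'],
    'Quantidade': [1, 1, 1, 1],
    'Metade': [1, 2, 1, 1],
    'Meia': ['Sim', 'Sim', 'Não', 'Não'],
    'Preço': [40.0, 32.0, 45.0, 35.0],
}
df = pd.DataFrame(orders)

# Assumption: these three columns together identify one pizza.
keys = ['Código', 'Num. Pedido', 'Data Pedido']

df_pedidos = (
    df.sort_values(keys + ['Metade'])           # half 1 before half 2
      .groupby(keys, as_index=False, sort=False)
      .agg(**{
          'Sabor': ('Sabor', '+'.join),         # e.g. 'Calabresa+Mussarela'
          'Quantidade': ('Quantidade', 'first'),
          'Metade': ('Metade', lambda s: '+'.join(s.astype(str))),
          'Meia': ('Meia', 'first'),
          # Mean of the whole-pizza prices: p1/2 + p2/2 for two halves,
          # and the unchanged price when there is a single row.
          'Preço': ('Preço', 'mean'),
      })
)

print(df_pedidos)
```

Because the grouping is done once over the whole frame instead of filtering the frame again for every row, this should scale much better than `iterrows`. If an order can contain more than one pizza under the same keys, the grouping columns would need an extra identifier, which the example data does not provide.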