0
Hello,
I am performing some data processing and I need my algorithm to perform the comparison between two lines and remove the worst one, according to some conditions.
Each time the algorithm performs the removal, it must recalculate the dataframe, as one of the conditions depends on the relation of the bottom line to the top.
My dataframe has the following format
df = n° de fios 1 diametro 1 n° de fios 2 diametro 2 seção total relação
n11 d11 n21 d21 st1 st2/st1
n12 d12 n22 d22 st2 st3/st2
This goes on for several lines.
What my algorithm does is:
- Reads the dataframe excel, which has the first 5 columns
- Calculate the last column
- Add another column with a rank based on the first column (coming from a Dict)
- Run the lines through a loop
for n in range(0, len(df)-1)
- If the ratio is less than 1.03, you should remove one of the two lines
- There is a set of conditions that must be met for the line
n+1
is removed, otherwise removes the linen
(the conditions include therank
). - When a line is removed, reset the
index
, removes the columnrank
, adds the dataframe à um excel and applies the loop in the new excel.
My code is this::
class Rank_and_drop(object):
def __init__(self): # define os dicts
#exemplo
self.dict_mc_112_2p_cu = {'0.71': 0, '0.75': 1, '0.67': 2, '0.8': 3, '0.85': 4, '0.63': 5, '0.45': 6, '0.475': 7, '0.5': 8, '0.53': 9, '0.56': 10, '0.6': 11}
def recalc_df(self, file, sheet):
# lê a tabela
df_final = pd.read_excel(file, sheet_name=sheet)
# converte os numeros para strings para poder usar o self.dict
df_final = df_final.astype(str)
df_final = df_final.drop(columns=['rsc'])
df_final = df_final.sort_values(by='sc')
df_final = df_final.reset_index(drop=True)
aux = df_final.loc[:, 'sc']
aux = aux.reset_index(drop=True)
aux2 = aux.drop(0)
aux2 = aux2.reset_index(drop=True)
aux2 = aux2.append(aux2.tail(1), ignore_index=True)
for x in range(len(aux)):
aux[x] = (round(float(aux2[x])/float(aux[x]), 3))
aux = aux.rename('rsc')
df_final['rsc'] = aux
df_final = df_final.astype(str)
df_final['rnk'] = df_final['d1'].map(dict_)
return df_final
def loop(self, file, sheet):
df = self.recalc_df(file, sheet)
for n in range(0,len(df)-1):
c_0 = float(df.at[n, 'rsc']) <= 1.03
c_1 = (df.at[n, 'n2'] == '0') and (df.at[n+1, 'n2'] == '0')
c_1_1 = df.at[n+1, 'rnk'] >= df.loc[n, 'rnk']
c_1_2 = float(df.at[n, 'n1']) < float(df.at[n+1, 'n1'])
c_2 = (df.at[n, 'n2'] == '0') and (df.at[n+1, 'n2'] != '0')
c_4 = (df.at[n, 'n2'] != '0') and (df.at[n+1, 'n2'] != '0')
c_4_1 = df.at[n+1, 'rnk'] == df.at[n, 'rnk']
c_4_1_1 = float(df.at[n+1, 'd2']) < float(df.at[n, 'd2'])
c_4_2 = df.at[n+1, 'rnk'] > df.at[n, 'rnk']
if c_0:
if (((c_1) and ((c_1_1) or (c_1_2))) or (c_2) or ((c_4) and ((((c_4_1) and (c_4_1_1)) or (c_4_2))))):
df = df.drop(n+1)
else:
df = df.drop(n)
df = df.reset_index(drop=True)
df = df.drop(columns=['rnk'])
df.to_excel('{0}.xlsx'.format(sheet), index=False, sheet_name=sheet)
self.loop('{0}.xlsx'.format(sheet), sheet)
else:
pass
n = n+1
df.to_excel('{0}.xlsx'.format(sheet), index=False, sheet_name=sheet)
return df
Now, the problem is that mine loop doesn’t work the way I expect...
It starts correctly, and to some extent even does what is expected, but from a certain point, the algorithm seems to give a bug, and removes several lines that it should not, and at the end returns a KeyError
.
Any idea what I can do to accomplish this task, more optimally and without these errors?
Edit:
I have several dataframes that I want to do this, of varying sizes, in the smallest of them (35 lines), the code correctly performs the function, however, returns the KeyError
. In another dataframe, 690 lines, the problem of removing multiple lines occurs.
Hi @Lucas, I suggest you "disassemble" a little this loop in a slightly smaller function, to make it easier to debug, usually when working with pandas I avoid doing loops, and when there is no way, I give preference to native iterators of the class iterrows or tuple
– LeandroHumb
I didn’t understand your idea, the loop is simple, but it has several conditions, I tried to use a while and solved my problem in parts... But the keyerror still occurs, I don’t know what else to do...
– Lucas
I think I understood your question correctly and published an answer. However, next time try to provide an example minimum, complete and verifiable, that facilitates the understanding of users who intend to help, or who have the same doubt as you :)
– danielbb