Logic (if,Elif,Else) with python error

Question

Logic (if,Elif,Else) with python error

Asked 4 years, 4 months ago

Viewed 72 times

0

Code:

def get_status(df3):
    if (str(df3['Forecast RELEX']).lower() == 'nan' and str(df3['Forecast']).lower() != 'nan'):
        return 'Subir'
    elif df3['Forecast'] > df3['Forecast RELEX']:
        return 'Subir'
    elif df3['Forecast'] == df3['Forecast RELEX']:
        return 'Não Subir'
    else:
        return 'Não Subir'
# criar coluna nova para "status"
df3['STATUS'] = df3.apply(lambda row: get_status(row), axis = 1)
df3.head(200)

I have the code above, but is not respecting the rules of if, see the result:

   Product   Location   Start_Date  End_Date    Forecast  Comment   Forecast RELEX  STATUS
0   90646164    0001    01/04/2021  30/04/2021  15         OPECOM   13,07          Subir
1   90646164    0001    01/04/2021  30/04/2021  10         OPECOM   13,07          Não Subir
2   90646150    0001    01/04/2021  30/04/2021  11         OPECOM   27,24          Não Subir
3   90646143    0001    01/04/2021  30/04/2021  12         OPECOM   5,87           Não Subir

Line 4, index 3, asks for "Do Not Go Up", but the ['Forecast'] is bigger, should return "Go Up". Can anyone help me where I’m going wrong? Note that line 2 index 1, is correct, returning "Do Not Go Up".

The column Forecast Relex contains numbers with comma (13,07), to be valid your if would have to be point (13.07) instead of commas

– Zé Reis M. Olliver

2021/03/22 at 14:47
1

You’re probably comparing two strings, '12' and '5,87', then in this comparison the character '5' is larger than the '1', returning the second value as higher.

– Woss

2021/03/22 at 14:50
If I turn , to . wrong, with the following error: '>' not supported between instances of 'str' and 'float'

– Vivian

2021/03/22 at 14:55

2 answers

1

A brief explanation

A very common error is to use lambda calling a function and in this function reference the whole dataframe, when the ideal would only be to reference the line

Example:

For the call df["nova coluna"] = df.apply(lambda x: minha_func(x), axis=1)

The mistake would be

def minha_func(row):
   if df["coluna"] > df["outra coluna"]:
       return "Maior"
   else:
       return "Menor ou igual"

In a 1000 Rows dataframe, the script would run the lambda 1000 times calling the function. The function in turn would run on the entire dataframe (X1000). That is, the script would have an iteration of 1000 x 1000... Instead of just 1000.

The right thing should be

def minha_func(row):
   if row["coluna"] > row["outra coluna"]:
       return "Maior"
   else:
       return "Menor ou igual"

We now come to the solutions

Case 1, using apply

The dataframe is

>>> df
    Product  Location  Start_Date    End_Date  Forecast Comment Forecast RELEX
0  90646164         1  01/04/2021  30/04/2021        15  OPECOM          13,07
1  90646164         1  01/04/2021  30/04/2021        10  OPECOM          13,07
2  90646150         1  01/04/2021  30/04/2021        11  OPECOM          27,24
3  90646143         1  01/04/2021  30/04/2021        12  OPECOM           5,87

Step 1: Transform column Forecast RELEX float

# Virgula para ponto
df["Forecast RELEX"] = df["Forecast RELEX"].str.replace(",", ".")

# Coluna para float
df["Forecast RELEX"] = pd.to_numeric(df["Forecast RELEX"], errors='coerce')

Step 2 (all in the lambda)

df["Status"] = df.apply(lambda row: "Subir" if row["Forecast"] >= row["Forecast RELEX"] else "Nao Subir", axis=1)

Step 2 (using function)

def get_status(row):
    if row["Forecast"] >= row["Forecast RELEX"]:
        return "Subir"
    else:
        return "Nao Subir"

df["Status"] = df.apply(lambda row: get_status(row), axis=1)

The result will be:

    Product  Location  Start_Date    End_Date  Forecast Comment  Forecast RELEX     Status
0  90646164         1  01/04/2021  30/04/2021        15  OPECOM           13.07      Subir
1  90646164         1  01/04/2021  30/04/2021        10  OPECOM           13.07  Nao Subir
2  90646150         1  01/04/2021  30/04/2021        11  OPECOM           27.24  Nao Subir
3  90646143         1  01/04/2021  30/04/2021        12  OPECOM            5.87      Subir

Case 2

Step 1: Transform column Forecast RELEX float

Do the same as described in Caso 1 above

Step 2: Create a column Status with Nan

import numpy as np

df["Status"] = np.nan

Step 3: Dealing with Climbing

df.loc[df["Forecast"] >= df["Forecast RELEX"], ["Status"]] = "Subir"

Step 4: Deal with "not climbing"

Since Subir has already been inserted...

df["Status"].fillna("Nao subir", inplace=True)

Note Although the Caso 2 has more steps, it is much faster to run than the first. That is, in really large dataframes, opt for the second case.

The result is the same

    Product  Location  Start_Date    End_Date  Forecast Comment  Forecast RELEX     Status
0  90646164         1  01/04/2021  30/04/2021        15  OPECOM           13.07      Subir
1  90646164         1  01/04/2021  30/04/2021        10  OPECOM           13.07  Nao subir
2  90646150         1  01/04/2021  30/04/2021        11  OPECOM           27.24  Nao subir
3  90646143         1  01/04/2021  30/04/2021        12  OPECOM            5.87      Subir

Browser other questions tagged python

You are not signed in. Login or sign up in order to post.

by Zé Reis M. Olliver • **372** points · Answer 1 · 2021-03-22T14:57:49+00:00

Try as follows, replacing the comma by stitch and turning into float!

def get_status(df3):
    if (str(df3['Forecast RELEX']).lower() == 'nan' and str(df3['Forecast']).lower() != 'nan'):
        return 'Subir'
    elif float(df3['Forecast']) > float(df3['Forecast RELEX'].replace(',','.')):
        return 'Subir'
    elif float(df3['Forecast']) == float(df3['Forecast RELEX'].replace(',','.')):
        return 'Não Subir'
    else:
        return 'Não Subir'

# criar coluna nova para "status"
df3['STATUS'] = df3.apply(lambda row: get_status(row), axis = 1)
df3.head(200)