Pandas DataFrame: assigning values to a column by comparing another column

Viewed 6,875 times

1

I have the following Dataframe:

import pandas as pd
df = pd.DataFrame({'id_emp': [1,2,3,4,1], 
               'name_emp': ['x','y','z','w','x'], 
               'donnated_value':[1100,11000,500,300,1000],
               'refound_value':[22000,22000,50000,450,90]
            })
df['return_percentagem'] = 100 * df['refound_value']/df['donnated_value']
df['classification_roi'] = ''

I want to fill df['classification_roi'] based on the values of df['return_percentagem']. For example: where df['return_percentagem'] is 100 or more, df['classification_roi'] = 'Good Investment'; where it is from 50 up to (but not including) 100, df['classification_roi'] = 'Median Investment'; where it is below 50, df['classification_roi'] = 'Bad Investment'.

I'm trying the following, but every row ends up with the value 'Bad Investment', i.e., everything seems to be entering the first branch:

def comunidade():
    for i in df['return_percentagem'].values:
        if i < 50:
            df['classification_roi'] = 'Bad Investment'
        elif i >= 50 and i < 100:
            df['classification_roi'] = 'Median Investment'
        elif i >= 100:
            df['classification_roi'] = 'Good Investment'

comunidade()

I appreciate any help

2 answers

1

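You can solve this with NumPy's select function, passing a list of conditions, a list of results, and a default value. The conditions are checked in order and the first one that is true wins, so the second condition only applies to rows that are not already below 50:

import numpy as np

condicao = [df['return_percentagem'] < 50,
            df['return_percentagem'] < 100]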

resultados = ['Bad Investment', 'Median Investment']

df['classification_roi'] = np.select(condicao, resultados, 'Good Investment')

output:

    id_emp  name_emp    donnated_value  refound_value   return_percentagem  classification_roi
0   1       x           1100            22000           2000.0              Good Investment
1   2       y           11000           22000           200.0               Good Investment
2   3       z           500             50000           10000.0             Good Investment
3   4       w           300             450             150.0               Good Investment
4   1       x           1000            90              9.0                 Bad Investment
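None of the rows in the question's data falls between 50 and 100, so the 'Median Investment' label never appears in the output above. The following is a minimal sketch using made-up percentages (the `demo` frame and its values are assumptions, not data from the question) to show how the same np.select call behaves on every branch, including the boundaries:

```python
import numpy as np
import pandas as pd

# Hypothetical percentages chosen to hit every branch, including 50 and 100.
demo = pd.DataFrame({'return_percentagem': [9.0, 50.0, 99.9, 100.0, 2000.0]})

conditions = [demo['return_percentagem'] < 50,
              demo['return_percentagem'] < 100]
labels = ['Bad Investment', 'Median Investment']

# np.select keeps the first condition that is True for each row,
# so 9.0 is labelled 'Bad Investment' even though it is also < 100.
# Rows matching no condition receive the default.
demo['classification_roi'] = np.select(conditions, labels, default='Good Investment')
print(demo)
```

If the two conditions were listed in the opposite order, every value below 100 would be labelled 'Median Investment', so the order of the list matters.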

1


In pandas, writing df['coluna'] = 'valor' fills every row of 'coluna' with 'valor'. That is why df['classification_roi'] = '' gives every row of the classification_roi column the value ''.
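A small sketch of that behaviour, using a made-up three-row frame (the `demo` name and its values are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical data, not the frame from the question.
demo = pd.DataFrame({'return_percentagem': [2000.0, 75.0, 9.0]})

# Assigning a scalar to a column broadcasts it to every row.
demo['classification_roi'] = 'Bad Investment'
print(demo['classification_roi'].tolist())

# To change a single row, address it explicitly with .loc[row_label, column].
demo.loc[0, 'classification_roi'] = 'Good Investment'
print(demo['classification_roi'].tolist())
```

The first assignment touches all three rows; the second touches only the row labelled 0.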

In your code, inside the function comunidade(), you are doing df['classification_roi'] = string, which overwrites the whole column on every iteration. On the last iteration of for i in df['return_percentagem'].values:, i is 9.0, so the condition if i < 50: is true and df['classification_roi'] = 'Bad Investment' runs, setting every row of classification_roi to 'Bad Investment'. Note, though, that the column is changed several times during the loop, and only the last assignment survives. So your conclusion that everything is entering the first branch is wrong.

To change the value of each row of classification_roi correctly, one solution is to follow this answer:
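To see this, here is a sketch that reruns the question's loop on the same five percentages and records what the column holds after each iteration. The `history` list is a hypothetical helper added only for the trace:

```python
import pandas as pd

# The five return percentages from the question's data.
df = pd.DataFrame({'return_percentagem': [2000.0, 200.0, 10000.0, 150.0, 9.0]})
df['classification_roi'] = ''

history = []  # hypothetical helper: distinct labels in the column after each step
for i in df['return_percentagem'].values:
    if i < 50:
        df['classification_roi'] = 'Bad Investment'
    elif i < 100:
        df['classification_roi'] = 'Median Investment'
    else:
        df['classification_roi'] = 'Good Investment'
    # The whole column was just overwritten, so it holds a single label.
    history.append(list(df['classification_roi'].unique()))

for step, labels in enumerate(history):
    print(step, labels)
```

For the first four iterations the whole column is 'Good Investment'; the fifth value, 9.0, then replaces all of it with 'Bad Investment'.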

for index, row in df.iterrows():
    if row['return_percentagem'] < 50:
        df.loc[index,'classification_roi'] = 'Bad Investment'
    elif row['return_percentagem'] < 100:
        df.loc[index,'classification_roi'] = 'Median Investment'
    elif row['return_percentagem'] >= 100:
        df.loc[index,'classification_roi'] = 'Good Investment'

Or this answer:

def comunidade(num):
    if num < 50:        
        return 'Bad Investment'
    elif num < 100:
        return 'Median Investment'
    elif num >= 100:
        return 'Good Investment'

df['classification_roi'] = df['return_percentagem'].map(comunidade)
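A different technique from the two above, offered only as a possible alternative: pandas.cut can bin the column in one vectorized call. This is a sketch under the assumption that the thresholds are 50 and 100 with the lower edge of each interval included; the `demo` frame and its values are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical percentages chosen to hit every bin, including the edges.
demo = pd.DataFrame({'return_percentagem': [9.0, 50.0, 99.9, 100.0, 2000.0]})

# right=False makes each bin closed on the left: [-inf, 50), [50, 100), [100, inf).
demo['classification_roi'] = pd.cut(
    demo['return_percentagem'],
    bins=[-np.inf, 50, 100, np.inf],
    labels=['Bad Investment', 'Median Investment', 'Good Investment'],
    right=False,
).astype(str)  # pd.cut returns a categorical; convert to plain strings

print(demo)
```

Unlike the row-by-row versions, this avoids a Python-level loop, which may matter on large frames.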
  • Thank you, solved the problem.
