CHANGING VALUES IN A DATAFRAME

Good afternoon. I have a DataFrame with the following head(): [image]

Note that the bmi column contains NaN values; more precisely, 201 rows have them. I want to fill in these values based on the age column, so I created these rules:

median1 = base.loc[(base['age'] >= 18) & (base['age'] < 30)].mean().age # mean for ages 18 to 30

median2 = base.loc[(base['age'] >= 30) & (base['age'] <= 45)].mean().age # mean for ages 30 to 45

median3 = base.loc[(base['age'] >= 46) & (base['age'] <= 65)].mean().age # mean for ages 46 to 65

median4 = base.loc[(base['age'] >= 66) & (base['age'] <= 83)].mean().age # mean for ages 66 to 83

How do I change the bmi column following these conditions? I tried using loc, np.select and so on. Example of the code:

base.loc[(base['age'] >= 18) & (base['age'] < 30) & (base['bmi'] == np.nan), 'bmi'] = median1.age

The documentation says I can pass a condition and the new value, but with a more complex condition like this one it just doesn’t work.

  • Is this what you want? base.loc[(base['age'] >= 18) & (base['age'] < 30), "bmi"] = base.loc[(base['age'] >= 18) & (base['age'] < 30)].mean().age

  • That changes every row in the age range. I would like to change only the rows that have missing values.

1 answer

There’s a way to do what you want, but it’s not as direct as you’d like:

  1. Create a dedicated function that computes the bmi value according to your criteria.
  2. Compute the bmi values separately with that function, for all rows.
  3. Use the where() method to replace the NaN values with the calculated ones.

It is overkill, since you calculate the substitute value for every row instead of only the ones that need it, but it works.

# function that returns the mean bmi of the age band a given age falls into
# (it is called median, but what it computes is a mean)
def median(df, age):
    if 18 <= age < 30:
        return df.loc[(df.age >= 18) & (df.age < 30)].bmi.mean()
    elif 30 <= age < 45:
        return df.loc[(df.age >= 30) & (df.age < 45)].bmi.mean()
    elif 45 <= age < 65:
        return df.loc[(df.age >= 45) & (df.age < 65)].bmi.mean()
    elif age >= 65:
        return df.loc[df.age >= 65].bmi.mean()
    # ages below 18 match no band, so the value stays missing
    return float("nan")

# list with the replacement bmi value for every entry in the age column
new_bmi = [median(base, age) for age in base.age]

# bmi column with the new values substituted only where it was NaN
base.bmi = base.bmi.where(base.bmi.notna(), new_bmi)
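
If you would rather touch only the rows that need it, the following is a minimal sketch of an alternative, not a tested drop-in for your data. The attempt in the question selects nothing because NaN never compares equal to anything, itself included, so a mask built with == np.nan is always False; isna() is the check that finds the missing rows. The sample frame and the bands list below are hypothetical, made up for illustration, with the same cut-offs as the function above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for `base`
base = pd.DataFrame({
    "age": [20, 25, 28, 35, 40, 50, 60, 70, 80],
    "bmi": [22.0, np.nan, 26.0, 30.0, np.nan, 28.0, np.nan, 24.0, np.nan],
})

# NaN is not equal to itself, so this comparison matches no row at all
assert not (base["bmi"] == np.nan).any()

# Age bands as (inclusive lower bound, exclusive upper bound); assumed cut-offs
bands = [(18, 30), (30, 45), (45, 65), (65, np.inf)]

for low, high in bands:
    in_band = (base["age"] >= low) & (base["age"] < high)
    # mean() skips NaN by default, so this averages the known bmi values in the band
    band_mean = base.loc[in_band, "bmi"].mean()
    # Overwrite only the rows in this band whose bmi is missing
    base.loc[in_band & base["bmi"].isna(), "bmi"] = band_mean

print(base)
```

Each band's mean is computed once, and rows that already had a bmi are left untouched.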

Test it with your DataFrame and let us know whether it solves the problem. Cheers.

  • That’s exactly it, buddy. Thank you very much!! I didn’t know I could do it like this.
