Create a column filled with 0 or 1 based on other columns in Pandas

I have a data frame and need to fill three columns with 0 or 1 based on the values of 29 other columns. If any of the columns 0 to 28 contains a 1, the column Dor_leve gets 1, otherwise 0. Likewise, Dor_media gets 1 if any of the columns 0 to 28 contains a 2, and Dor_grave gets 1 if any of them contains a 3.

I have the columns Col_0 to Col_28 and I need to generate the columns Dor_leve, Dor_media and Dor_grave as shown below. The values in the columns Col_0 ... Col_28 are 0, 1, 2 or 3.

Example of what I need to do:

Col_0  Col_1  Col_2  ...  Col_28  Dor_leve  Dor_media  Dor_grave
0      1      1      ...  0       1         0          0
1      0      2      ...  0       1         1          0
1      0      3      ...  1       1         0          1
2      3      0      ...  0       0         1          1
0      0      0      ...  0       0         0          0
3      0      0      ...  1       1         0          1

The columns are different parts of the body, so a person can have 1, 2 or 3 in each column, or 0 in all of them. My attempt was:

import pandas as pd
df = pd.read_excel('arquivo.xlsx', sheet_name=0, header=0)

# Create the columns with 0 for everyone

df['Dor_leve']= 0
df['Dor_media']= 0
df['Dor_grave']= 0

# Update the values, with 1 for yes, if the person has the problem

for index, value in df.items():
    if(value==1):
        df.loc[index,'Dor_leve']=1
    elif(value==2):
        df.loc[index, 'Dor_media']=1
    elif(value==3):
        df.loc[index, 'Dor_grave']=1

Returned this error:

> ValueError: The truth value of a Series is ambiguous. Use a.empty,
> a.bool(), a.item(), a.any() or a.all().

I also tried:

if(df.items()==1):
    df['Dor_leve']=1

if(df.items()==2):
    df['Dor_media']=1

if(df.items()==3):
    df['Dor_grave']=1

That didn't work either. With if (df.items()==value), the columns stayed all 0.

  • In the last row of the table (3 0 0 1 1 0 0), shouldn't the column Dor_grave have the value 1?

  • Yes, that was a typo on my part. Thank you for pointing it out.

1 answer

First establish the columns you want to use to check the condition:

# defining the columns manually
cols = ['Col_0', 'Col_1', 'Col_2', ...]  # fill in the columns here

# or using a list comprehension, given the pattern in the column names
cols = [f'Col_{x}' for x in range(29)]

Then, for each row, "ask" whether the value 1, 2 or 3 is present in the columns of interest. The answer to this question (True/False) is converted to the integer 1 or 0. We can ask it with df.apply, applying a function to each row (with the argument axis=1):

df['Dor_leve'] = df[cols].apply(lambda row: int(1 in row.values), axis=1)
df['Dor_media'] = df[cols].apply(lambda row: int(2 in row.values), axis=1)
df['Dor_grave'] = df[cols].apply(lambda row: int(3 in row.values), axis=1)

  • I'd say you can skip the step where you declare and use cols and call apply directly on the data frame: df.apply(lambda row: int(1 in row.values), axis=1). And if you want to leave the Dor_... columns out of the check, you can use: df.apply(lambda row: int(1 in row.values[:-3]), axis=1)

  • I extended my comment with exactly that in mind:

  • Thank you so much for all your help. I can't use df.apply on the whole data frame, because it actually has other columns with values of 0 or 1 that are used for other classifications. Unless I slice the data frame first and then call df.apply on the slice. Thanks, everyone, for your attention and help.
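
The slicing idea from the last comment could look roughly like the sketch below. It assumes the body-part columns all follow the Col_<number> naming pattern from the question; the Fumante column and the sample values are hypothetical and only stand in for the unrelated 0/1 columns.

```python
import pandas as pd

# Hypothetical frame: 'Fumante' stands in for an unrelated 0/1 column
df = pd.DataFrame({
    'Fumante': [1, 0, 1],
    'Col_0': [0, 2, 0],
    'Col_1': [0, 3, 0],
})

# Keep only the columns named Col_<number>; other 0/1 columns are left out
pain = df.filter(regex=r'^Col_\d+$')

# Apply the row-wise check to the slice, one pain level at a time
for name, level in [('Dor_leve', 1), ('Dor_media', 2), ('Dor_grave', 3)]:
    df[name] = pain.apply(lambda row, level=level: int(level in row.values), axis=1)

print(df)
```

Here the 1 in Fumante does not set Dor_leve, because the check only sees the sliced columns.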
