How to create a condition for each dataframe subset in pandas?

I wrote the following code to flag, for a subset of a DataFrame, whether each cell value is higher than the average of that same subset. Here the subset is small countries, and I want to know whether the value of a variable (the column argument in the code below) for each small country is higher than the average of that variable across those same small countries. The initial DataFrame contains countries of all sizes, but when I run the program nothing happens.

def doule(df, column):
    values = df[column]
    mean = values.mean()
    def higher3(values):
        if values > mean:
            return 1
        if values < mean:
            return 0
        if pd.isna(values) == True:
            return None

    triple_bam[(triple_bam.Population_size == 'tiny')][column+'_binary_size'] = values.apply(higher3)

doule(triple_bam[(triple_bam.Population_size == 'tiny')],'hc')

This warning appears: c:\...\ SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead

1 answer

See the example below; I hope it helps you solve the problem.

>>> import pandas as pd

>>> df = pd.DataFrame({"A": [1,1,2,2], "B": [2,2,3,3]})

>>> df[(df.A == 2)]
   A  B
2  2  3
3  2  3

>>> df[(df.A == 2)]["B"] = 4
<stdin>:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

Same warning! This happens because df[(df.A == 2)] returns a copy of the selected rows, so the assignment goes to that temporary copy and the original DataFrame is left unchanged.
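To make that concrete, here is a small check using the same example DataFrame. The warning is silenced only to keep the demo output clean; that filter is an addition for illustration, not something to do in real code.

```python
import warnings

import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2, 2], "B": [2, 2, 3, 3]})

with warnings.catch_warnings():
    # Silence the chained-assignment warning for this demo only.
    warnings.simplefilter("ignore")
    # The boolean selection returns a copy, so this writes into a
    # temporary object that is discarded right away.
    df[df.A == 2]["B"] = 4

# The original DataFrame still holds the old values.
print(df["B"].tolist())  # [2, 2, 3, 3]
```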

The solution is to use the .loc indexer, as the warning message suggests. See below:

>>> df.loc[df.A == 2, "B"] = 4

>>> df
   A  B
0  1  2
1  1  2
2  2  4
3  2  4

In general: df.loc[CONDITION, COLUMN] = VALUE
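Applied to the case in the question, the same pattern might look like the sketch below. The sample data and the function name add_binary_flag are hypothetical, made up for illustration; only the column names Population_size and hc come from the question. Two assumptions: a value exactly equal to the mean is flagged 0 (the original function returned None for it), and missing values stay missing.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the asker's triple_bam DataFrame.
triple_bam = pd.DataFrame({
    "Population_size": ["tiny", "tiny", "tiny", "tiny", "large", "large"],
    "hc": [1.0, 3.0, 2.0, np.nan, 10.0, 20.0],
})


def add_binary_flag(df, column, size="tiny"):
    """Flag rows of one size group whose value exceeds the group mean."""
    mask = df["Population_size"] == size
    values = df.loc[mask, column]
    mean = values.mean()  # NaN values are skipped by default

    # 1.0 where value > mean, 0.0 otherwise (assumption: ties count as 0),
    # and NaN where the original value is missing.
    flag = (values > mean).astype(float).where(values.notna())

    # Write into the original DataFrame with a single .loc call;
    # rows outside the mask get NaN in the new column.
    df.loc[mask, column + "_binary_size"] = flag
    return df


triple_bam = add_binary_flag(triple_bam, "hc")
print(triple_bam)
```

Because the assignment uses a single .loc call on triple_bam itself, no SettingWithCopyWarning should be raised and the new column appears in the original DataFrame.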

I hope it helps.
