Trying to apply a lambda to the dataset?

I’m trying to apply this function to the dataset:

    number_outliers = (df2 < (Q1 - 1.5 * IQR)) | (df2 > (Q3 + 1.5 * IQR))

    df2.apply(((lambda x:df2[~((df2 < (Q1 - 1.5 * IQR)) | (df2 > (Q3 + 1.5 * IQR)))])),axis=1, broadcast=True, raw=True, reduce=True ,args=number_outliers )
    return df2

Here number_outliers is meant to flag the outliers, and the lambda function is meant to remove them. I don't know where I'm going wrong. Does anyone have a suggestion?

1 answer

What you want is a subset of the dataset, and for that you do not need to define a function. To count the outliers of a variable var, you can use:

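import numpy as np

# First and third quartiles of the column
# (NumPy 1.22+ renames the interpolation= keyword to method=)
Q1 = np.percentile(df2['var'], 25, interpolation='midpoint')
Q3 = np.percentile(df2['var'], 75, interpolation='midpoint')
IQR = Q3 - Q1
# Select the rows outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] and count them
number_outliers = df2[(df2['var'] < (Q1 - 1.5 * IQR)) | (df2['var'] > (Q3 + 1.5 * IQR))]['var'].size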

Example:

import numpy as np
import pandas as pd

df2 = pd.DataFrame({'var': [-400, 1, 2, 3, 400],
                    'B': [5, 6, 7, 8, 9],
                    'C': ['a', 'b', 'c', 'd', 'e']})

Q1 = np.percentile(df2['var'], 25, interpolation='midpoint')
Q3 = np.percentile(df2['var'], 75, interpolation='midpoint')
IQR = Q3-Q1
number_outliers = df2[ (df2['var'] < (Q1 - 1.5 * IQR)) | (df2['var'] > (Q3 + 1.5 * IQR))]['var'].size

print(number_outliers)  # prints 2 (the values -400 and 400)
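Since the question is really about removing the outliers rather than counting them, here is a minimal sketch of how the same boolean mask could be used for that, without apply or a lambda. It reuses the example frame above; the names is_outlier, df2_clean, outlier_cells and df2_clean_all are only illustrative, and I compute the quartiles with pandas' own quantile (with interpolation='midpoint') instead of np.percentile, which is my substitution and not part of the original command. The second half assumes you want to drop a row when any numeric column has an outlier in it, which may or may not match your case.

```python
import pandas as pd

df2 = pd.DataFrame({'var': [-400, 1, 2, 3, 400],
                    'B': [5, 6, 7, 8, 9],
                    'C': ['a', 'b', 'c', 'd', 'e']})

# --- Single column ---
# Quartiles via pandas; 'midpoint' mirrors the interpolation used above
Q1 = df2['var'].quantile(0.25, interpolation='midpoint')
Q3 = df2['var'].quantile(0.75, interpolation='midpoint')
IQR = Q3 - Q1

# Boolean Series: True where 'var' lies outside the 1.5*IQR fences
is_outlier = (df2['var'] < Q1 - 1.5 * IQR) | (df2['var'] > Q3 + 1.5 * IQR)

# Negating the mask keeps only the rows that are not outliers
df2_clean = df2[~is_outlier]

# --- All numeric columns at once (assumption: drop a row if ANY column flags it) ---
num = df2.select_dtypes(include='number')            # skips the text column 'C'
q1 = num.quantile(0.25, interpolation='midpoint')    # one quartile per column
q3 = num.quantile(0.75, interpolation='midpoint')
iqr = q3 - q1

# Comparing a DataFrame with a Series aligns on column names
outlier_cells = (num < q1 - 1.5 * iqr) | (num > q3 + 1.5 * iqr)
df2_clean_all = df2[~outlier_cells.any(axis=1)]

print(df2_clean)
print(df2_clean_all)
```

The comparison already works on the whole column (or the whole frame) at once, so nothing needs to be applied row by row; the original call also passes a DataFrame as args, which expects a tuple of extra positional arguments for the function.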
