Automated removal of outliers

I need to remove outliers from a database. Doing it "manually", I would use the following code:

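a <- X                                   # X: the numeric vector (column) to clean
Q1 <- quantile(a, 0.25, na.rm = TRUE)    # first quartile
Q3 <- quantile(a, 0.75, na.rm = TRUE)    # third quartile
IQR <- Q3 - Q1                           # interquartile range
lim_inf <- Q1 - 1.5 * IQR                # lower limit
lim_sup <- Q3 + 1.5 * IQR                # upper limit
out <- (a > lim_sup) | (a < lim_inf)     # TRUE where the value is an outlier
a[which(out)] <- NA                      # which() skips NA entries in `out`
X <- a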
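As a quick check of what these limits do, here is the same calculation on a small made-up vector (the data are hypothetical). With R's default `quantile()` (type 7), Q1 = 3.25 and Q3 = 7.75 for these values, so the IQR is 4.5 and the limits are -3.5 and 14.5; only the value 100 falls outside them.

```r
# Hypothetical toy vector with one obvious outlier
a <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)

Q1 <- quantile(a, 0.25, names = FALSE)   # 3.25 with the default type 7
Q3 <- quantile(a, 0.75, names = FALSE)   # 7.75
IQR <- Q3 - Q1                           # 4.5
lim_inf <- Q1 - 1.5 * IQR                # -3.5
lim_sup <- Q3 + 1.5 * IQR                # 14.5

out <- (a > lim_sup) | (a < lim_inf)     # only the last element is TRUE
a[which(out)] <- NA
print(a)
```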

This is the generic code I use, where X is the variable from which the outliers are removed.

The problem is that I have to change X by hand every time. Is there a faster way, such as a loop or a package, that removes the outliers from all the columns of a data.frame and replaces them with NA?
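One way to avoid editing X by hand is to wrap the code above in a function and apply it to every numeric column with `lapply()`. This is a minimal base-R sketch, not a package function: the name `remove_outliers`, the multiplier argument `k`, and the example data frame are all hypothetical. Outliers become NA, so the data frame keeps its shape, and non-numeric columns are left untouched.

```r
# Hypothetical helper: replace values outside the k * IQR limits with NA
remove_outliers <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  out <- x < q[1] - k * iqr | x > q[2] + k * iqr
  x[which(out)] <- NA    # which() ignores comparisons that are NA
  x
}

# Made-up data: two numeric columns and one character column
df <- data.frame(
  id = c("a", "b", "c", "d", "e", "f"),
  v1 = c(10, 11, 12, 13, 14, 200),
  v2 = c(-50, 1, 2, 3, 4, 5)
)

num_cols <- vapply(df, is.numeric, logical(1))   # only touch numeric columns
df[num_cols] <- lapply(df[num_cols], remove_outliers)
print(df)
```

Because the loop is driven by `vapply(df, is.numeric, ...)`, it picks up every numeric column automatically, however many there are.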

  • Thanks to user @Ruibarradas, I just found the source of the function I had posted as an answer. I deleted that answer: even with the source cited, I don't think it makes sense to duplicate an answer here just because the original is in another language. Follow the link for a very similar question on the English-language Stack Overflow. I will post another answer with a different approach I used.

1 answer

Thanks to user @Ruibarradas, who found the source of the function I used two years ago: the link in my comment leads to several ways of doing what you want (on the English-language SO). In the end, though, looking back at the script I wrote two years ago, I removed the outliers using only base R functions, like this:

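x_com <- rnorm(100)                                     # sample data ("com" = with outliers)
boxplot(x_com)

x_sem <- x_com[!x_com %in% boxplot.stats(x_com)$out]    # drop the points boxplot() flags ("sem" = without)
boxplot(x_sem)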
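The same `boxplot.stats()` idea can be applied to every column of a data.frame. Dropping values as in the vector example would leave columns of different lengths, so this sketch, assuming the goal from the question of replacing outliers with NA, sets them to NA instead. Note that `boxplot.stats()` builds its limits from Tukey's hinges (`fivenum()`) rather than from `quantile()`, so on some samples it may flag slightly different points than the code in the question. The data frame below is made up.

```r
# Made-up data: column `a` gets an extreme high value, `b` an extreme low one
set.seed(1)
df <- data.frame(a = c(rnorm(20), 15), b = c(rnorm(20), -15))

# For each numeric column, set the points that boxplot.stats() reports as
# outliers (beyond coef * hinge spread, coef = 1.5 by default) to NA
df[] <- lapply(df, function(col) {
  if (!is.numeric(col)) return(col)
  col[col %in% boxplot.stats(col)$out] <- NA
  col
})

colSums(is.na(df))   # number of values replaced in each column
```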

I hope this adds one more way of doing the same thing. If speed matters, benchmark this approach against the others in the English SO question and pick the one that fits best.