Outlier Detection with Python

Question

Asked 4 years, 6 months ago

Hello. I am learning Data Science and am just starting with Machine Learning. During my studies I realized that the data has to be organized and "in line": very extreme values can cause problems in the model.

So, I am working on a project to predict the next closing and opening values of the PETR4.SA stock (Petrobras), but I could not plot the graph for the outlier calculation. How do I do that?

And these are the libraries I’m using:

# Exploratory data analysis
import pandas as pd
import numpy as np

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# File upload/download (Google Colab)
from google.colab import files

1 answer

by jsbueno • **30,668** points · Answer 1 · 2020-12-29T20:01:37+00:00

Filtering strategies that suppress spikes and points off the curve are useful in many data domains (in health, for example, an out-of-range value could point to an exam measurement that was copied wrong or taken at a moment of stress). But since you are working with asset quotes, I believe it makes sense to use the values as they are: removing the "outliers" would end any usefulness of your model. If the closing value on 2/12 was 3% higher, it was 3% higher. It is not a wrong setting in a photo, electrical noise in an analog instrument, etc. You have to take this variation into account.

Moreover, precisely because of this characteristic of market data (the values are already recorded by digital systems), you may simply not have any point that would be an "outlier" in the data you are working with. It may even be that you have done everything right.

Since you haven't given us a way to help more concretely (neither the code you're using to plot the data, nor a way for the respondent to get the dataframe and create some sample plots), it's not possible to help you beyond this point.
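One way to check whether your series has any unusual points at all, without removing anything, is to flag daily returns that fall outside the usual 1.5 × IQR fences. The sketch below is only an illustration: the prices are synthetic stand-ins for the real PETR4.SA data, and the helper name `flag_iqr_outliers` and the `Close` column name are assumptions, not something from the question.

```python
import numpy as np
import pandas as pd

# Synthetic closing prices standing in for the real PETR4.SA data (assumption).
rng = np.random.default_rng(0)
dates = pd.bdate_range("2020-01-01", periods=250)
close = pd.Series(
    25 * np.exp(np.cumsum(rng.normal(0, 0.02, len(dates)))),
    index=dates,
    name="Close",
)


def flag_iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask marking values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Work on daily percentage changes rather than raw prices, since prices trend.
returns = close.pct_change().dropna()
mask = flag_iqr_outliers(returns)

print(f"{mask.sum()} of {len(returns)} daily returns fall outside the IQR fences")
print(returns[mask])
```

The mask only marks the days for inspection. Whether a flagged day is an error or a genuine market move is still a judgment call, and for quotes it is usually the latter.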

It’s easy to find articles on the subject, but I suppose you’ve been through some. This one seems fairly easy to follow and complete: https://towardsdatascience.com/ways-to-detect-and-remove-the-outliers-404d16608dba
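As a starting point for the plot itself, here is a minimal sketch using matplotlib only. It assumes a dataframe with a `Close` column indexed by date; the data is synthetic because the real dataframe was not posted, and the output file name is arbitrary.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data standing in for the real quotes (assumption).
rng = np.random.default_rng(1)
dates = pd.bdate_range("2020-01-01", periods=250)
df = pd.DataFrame(
    {"Close": 25 * np.exp(np.cumsum(rng.normal(0, 0.02, len(dates))))},
    index=dates,
)
df["Return"] = df["Close"].pct_change()

fig, (ax_price, ax_box) = plt.subplots(1, 2, figsize=(10, 4))

# Left: the price series as it is.
df["Close"].plot(ax=ax_price, title="Closing price")

# Right: box plot of daily returns; points beyond the whiskers are the
# candidates a box plot would label as outliers.
ax_box.boxplot(df["Return"].dropna().to_numpy())
ax_box.set_title("Daily returns")

fig.tight_layout()
fig.savefig("outlier_check.png")
```

In Colab you would replace the `savefig` call with `plt.show()` and load your own dataframe in place of the synthetic one.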