What is Overfitting and Underfitting in Machine Learning

What is Overfitting and Underfitting in Machine Learning? I am studying a little on the subject and I was curious where this applies.

2 answers

Overfitting and underfitting are concepts that describe model fit in machine learning. But what does that mean in practice?

Broadly speaking, there are three types of machine learning: reinforcement learning, supervised learning, and unsupervised learning. Supervised learning is based on a set of techniques for adjusting the parameters of a function so that the function satisfies conditions given by the values of the labels. Once these parameters are adjusted and we know which function we are trying to compute, we say we have a model.

To do this, machine learning algorithms perform model fitting, which happens while the model is being trained on the data, so that it later becomes possible to make predictions with the trained model. With this in mind, we can define overfitting and underfitting.

Understanding model fit is important for finding the root cause of unsatisfactory model accuracy, and this understanding guides corrective action. You can determine whether a predictive model is underfitting or overfitting the training data by comparing the prediction error on the training data with the error on the evaluation data.
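As an illustration, that comparison can be sketched as a simple heuristic. The function name and thresholds below are hypothetical, chosen only for this example; real projects tune them for the problem at hand:

```python
def diagnose_fit(train_error, eval_error, high_error=0.5, gap_ratio=1.5):
    """Rough heuristic: classify a model's fit from its errors.

    train_error / eval_error are prediction errors (e.g. MSE) measured
    on the training and evaluation sets; the thresholds are arbitrary.
    """
    if train_error > high_error:
        # High error even on the data the model was trained on.
        return "underfitting"
    if eval_error > gap_ratio * train_error:
        # Low training error but much higher evaluation error.
        return "overfitting"
    return "good fit"


print(diagnose_fit(0.70, 0.80))  # underfitting: training error is already high
print(diagnose_fit(0.05, 0.40))  # overfitting: large gap between the two errors
print(diagnose_fit(0.10, 0.12))  # good fit: both errors low and close together
```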

Overfitting occurs when the model adapts very well to the data it was trained on but does not generalize well to new data. In other words, the model has "memorized" the training data set without really learning what distinguishes those data, so it fails when faced with new examples.

Underfitting occurs when the model does not fit well even the data it was trained on.
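A minimal way to see both behaviors is to fit polynomials of different degrees to noisy data and compare the training error with the error on held-out points. This sketch assumes NumPy; the degrees, noise level, and split are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a sine wave, split into training and test points.
x = rng.uniform(0.0, 1.0, 40)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, x.size)
x_train, y_train = x[:30], y[:30]
x_test, y_test = x[30:], y[30:]

def mse(degree, x_eval, y_eval):
    """Fit a polynomial of the given degree on the training data
    and return its mean squared error on (x_eval, y_eval)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    return np.mean((np.polyval(coeffs, x_eval) - y_eval) ** 2)

# Degree 1 is too simple: large error even on the training data (underfitting).
# Degree 15 nearly memorizes the training data: tiny training error, but a
# larger error on the unseen test points (overfitting).
for degree in (1, 15):
    print(degree, mse(degree, x_train, y_train), mse(degree, x_test, y_test))
```

The interesting comparison is not the absolute numbers but the pattern: the underfitted model is bad everywhere, while the overfitted one looks excellent on its own training data and degrades on data it has not seen.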

The graph below shows how they are represented:

(figure: example curves illustrating underfitting, a good fit, and overfitting)

References

  1. Model fit: underfitting vs. overfitting
  2. Refinement of machine learning algorithms


Overfitting is a term used in statistics to describe a statistical model that fits the previously observed data set very well but proves ineffective at predicting new results.

It is common for a sample to contain deviations caused by measurement errors or random factors, and overfitting occurs when the model fits these deviations. An overfitted model shows high accuracy when tested on its own data set, but it is not a good representation of reality and should therefore be avoided. Such models typically have considerable variance and their curves show many small oscillations, whereas representative models are expected to be smoother.

One tool to mitigate the overfitting problem is regularization, which adds a penalty on the parameter values to the cost function. This penalty shrinks parameters of little importance toward zero, yielding a smoother model that is expected to be more representative of reality. Through cross-validation, in which we test the model against a reserved part of the data set that was not used to train it, we can get an idea of whether the model suffers from overfitting or not.
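For example, in linear regression, L2 regularization (ridge regression) adds the squared norm of the parameters to the cost, and the penalized solution has a closed form. The sketch below (assuming NumPy; the data and penalty strength are arbitrary) shows that turning the penalty on pulls the parameters toward zero:

```python
import numpy as np

rng = np.random.default_rng(1)

# Random design matrix and noisy targets; only two parameters truly matter.
X = rng.normal(size=(50, 10))
w_true = np.array([2.0, -1.0] + [0.0] * 8)
y = X @ w_true + rng.normal(0.0, 0.5, 50)

def ridge(X, y, lam):
    """Ridge solution: minimizes ||Xw - y||^2 + lam * ||w||^2."""
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

w_ols = ridge(X, y, 0.0)    # ordinary least squares (no penalty)
w_reg = ridge(X, y, 10.0)   # penalized: parameters are shrunk toward zero

print(np.linalg.norm(w_ols), np.linalg.norm(w_reg))
```

The regularized parameter vector always has a smaller norm than the unpenalized one, which is exactly the smoothing effect described above.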

Underfitting, the counterpart of overfitting, happens when a machine learning model is not complex enough to accurately capture the relationships between the features of a data set and a target variable. An underfitted model produces problematic or erroneous results on new data it was not trained on, and it often performs poorly even on the training data.

Sources:

  1. https://www.datarobot.com/wiki/underfitting/
  2. https://en.wikipedia.org/wiki/Overfitting
