Well, to understand this question, it helps to start with the conceptual difference between supervised and unsupervised machine learning models. Starting with the latter: unsupervised models are models that make estimates in a context where the response variable is not known. The classic case is principal component analysis (PCA). In these models, it is possible to create components based on the correlations among the variables in the dataset, although in most cases the practical representation or concrete meaning of those components is unknown.
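Just to make the idea concrete, here is a minimal sketch of that unsupervised case using scikit-learn's PCA; the random data is purely illustrative, since in the unsupervised setting there is no response variable at all:

```python
# Minimal unsupervised example: PCA on synthetic data (no response variable)
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))   # 100 observations, 5 variables, no labels

pca = PCA(n_components=2)       # extract 2 principal components
X_reduced = pca.fit_transform(X)

# Each component is a linear combination of the original variables;
# its concrete meaning usually has to be interpreted by the analyst.
print(pca.explained_variance_ratio_)
```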
In supervised models, on the other hand, the dependent variable (also called the output, explained variable, or response variable) is known. An example would be a model that tries to predict women's participation in the labor market based on variables such as age, education, and number of children, among others. The dependent variable in this case is a dummy that takes the value 1 if the woman is in the labor market and 0 if she is not. When you fit a model to make this kind of prediction, it is important to know its predictive ability beyond the data it was trained on. For this reason, it is common to split your dataset into a training set and a test set. The training set is used to fit the model, while the test set measures the model's performance out of sample (on new data).
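A hedged sketch of that split, again with scikit-learn. The feature matrix X (age, education, number of children, ...) and the 0/1 target y are hypothetical placeholders generated synthetically here:

```python
# Supervised example with a train/test split; X and y stand in for the
# hypothetical labor-market data described above
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=4, random_state=0)

# Hold out 25% of the observations to evaluate the model on new data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LogisticRegression().fit(X_train, y_train)
print("training accuracy:", model.score(X_train, y_train))
print("test accuracy:    ", model.score(X_test, y_test))
```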
It is important to note that there will always be a difference in the model's performance between the two sets (training and test), and this difference is always in favor of the training set (after all, the model already "knows" those data). From this difference we can draw another relevant distinction in machine learning: underfitting versus overfitting. As shown in the figure below, taken from Andreas Müller's book, we say a model is underfitting when its performance is poor on both the training set and the test set. As we increase the model's complexity, its performance improves on both sets. However, a very complex model becomes too "adjusted to the training set": it scores very well on the training data but has little power to generalize. That is what we call overfitting. Note that, from the data scientist's perspective, the challenge is to maximize accuracy without losing the ability to generalize.
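You can see this trade-off numerically with a quick sketch: using a decision tree as the example model, increasing the tree depth (complexity) keeps raising training accuracy, while test accuracy eventually stalls or drops. The data is synthetic, just to illustrate the pattern:

```python
# Underfitting vs. overfitting: sweep model complexity and compare
# accuracy on the training and test sets
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in [1, 2, 4, 8, None]:   # None lets the tree grow without limit
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    print(f"depth={depth}: train={tree.score(X_train, y_train):.3f}, "
          f"test={tree.score(X_test, y_test):.3f}")
```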
In short, splitting the data into a training set and a test set is key to knowing:
1) the accuracy of the model, and
2) how much we can improve it without losing generalization capacity.
That’s how I understand it; it would be nice to see other perspectives.