caret is short for Classification And REgression Training. By definition, it is a package that provides algorithms for classification and regression.
Classification is the name for methods that assign observations to predefined classes. This is called supervised learning. Several classification methods are available in caret, such as LDA, Random Forest, K-Nearest Neighbors and similar. The caret documentation has the complete list of these methods.
Clustering is the name for methods that separate observations into groups without using predefined classes. This is called unsupervised learning.
K-Means is a clustering method. Therefore, it is not available in caret, and it probably never will be.
So the answer to the question
Is there a function equivalent to kmeans so I can validate the model?
is no: there is no function equivalent to kmeans in caret. It is a package for classification (and regression), not clustering.
However, it is possible to use K-Means as a classifier. As far as I know, there is no ready-made option for this in R, but nothing stops you from programming your own. I wouldn't recommend it, because K-Means has serious problems, such as:
It does not work well on data with many dimensions.
It does not work well when the groups have very different sizes.
Because it uses Euclidean distance to assign observations to groups, it does not work well on data with strong skewness or many outliers.
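For reference, programming K-Means as a classifier yourself could look roughly like the sketch below. It is only one possible approach, not an established method: it assumes the built-in iris data as a stand-in for your data, a simple random 70/30 split, and a hypothetical helper predict_kmeans that I made up for illustration. Each cluster is named after the most frequent class among its training points, and new observations get the label of the nearest centroid.

```r
# Sketch only: K-Means used as a classifier with base R (no caret needed).
set.seed(42)

# Hold out 30% of iris (stand-in data) for validation.
idx      <- sample(nrow(iris), size = round(0.7 * nrow(iris)))
train_km <- iris[idx, ]
test_km  <- iris[-idx, ]

# Fit K-Means on the training features only; the classes are not used here.
km <- kmeans(train_km[, 1:4], centers = 3, nstart = 25)

# Name each cluster after the most frequent class among its training points.
cluster_label <- sapply(seq_len(nrow(km$centers)), function(k) {
  names(which.max(table(train_km$Species[km$cluster == k])))
})

# Hypothetical helper: assign each new row to the nearest centroid
# (squared Euclidean distance) and return that cluster's class label.
predict_kmeans <- function(newdata, centers, labels) {
  x <- as.matrix(newdata)
  nearest <- apply(x, 1, function(row) {
    which.min(colSums((t(centers) - row)^2))
  })
  labels[nearest]
}

pred_km <- factor(
  predict_kmeans(test_km[, 1:4], km$centers, cluster_label),
  levels = levels(test_km$Species)
)

# Validate against the held-out classes.
conf_mat_km <- table(predicted = pred_km, actual = test_km$Species)
accuracy_km <- mean(pred_km == test_km$Species)
print(conf_mat_km)
print(accuracy_km)
```

Note that nothing forces the clusters to line up with the classes: two clusters can end up with the same label, in which case one class is never predicted. That is one more reason to be careful with this approach.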
On the other hand, it seems to me that your problem is a classification problem, because you have access to the class of each observation. Therefore, any classification method in caret would serve to train and validate your model. If I understand correctly and your problem is classification rather than clustering, I suggest you give up K-Means and go for something more sophisticated.
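As a rough illustration of training and validating a classifier with caret, here is a minimal sketch. It assumes the built-in iris data in place of yours, a 70/30 stratified split, 5-fold cross-validation, and method = "knn" as an arbitrary choice; any other classification method from caret's list could be swapped in.

```r
library(caret)

set.seed(42)

# Stratified 70/30 split that keeps class proportions in both parts.
in_train <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
train_df <- iris[in_train, ]
test_df  <- iris[-in_train, ]

# 5-fold cross-validation on the training set to tune the model.
ctrl <- trainControl(method = "cv", number = 5)

# method = "knn" is only an example choice of classifier.
fit <- train(Species ~ ., data = train_df, method = "knn",
             trControl = ctrl, tuneLength = 5)

# Validate on the held-out observations.
pred <- predict(fit, newdata = test_df)
cm   <- confusionMatrix(pred, test_df$Species)

print(fit)  # cross-validated accuracy for each candidate k
print(cm)   # confusion matrix and accuracy on the test set
```

The cross-validation results in fit show how the model is expected to generalize, while the confusion matrix on test_df gives an independent check on data the model never saw.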