My first thought is that k-means is not a classification method but a clustering method. The difference is subtle, but rather important.
k-means is an unsupervised method: nothing in the algorithm forces the created groups to resemble the groups of plant species (in this example).
Because it is an unsupervised method, it is also difficult to say which is the best possible clustering... It becomes a somewhat subjective problem. What can be used is:
- the sum of within-group variances: if it is too large within each group, your clustering is not very good
- there is also the Rand index, which is implemented in the fpc package that Robert mentioned in the comments (both diagnostics are sketched below)
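For illustration, here is a minimal sketch of both diagnostics on the iris data, assuming the fpc package is installed (exact numbers vary with the random initialization):

km <- kmeans(iris[, 1:4], 3)
km$tot.withinss  # total within-cluster sum of squares: lower means tighter groups
# corrected (adjusted) Rand index between the clusters and the true species
fpc::cluster.stats(dist(iris[, 1:4]), km$cluster,
                   alt.clustering = as.integer(iris$Species))$corrected.rand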
Finally, answering your questions:
It is clustering in a very wrong way: notice that the setosa individuals are split between two clusters (1 and 2), and that cluster 3 contains individuals of both versicolor and virginica. That is, the clustering is not helping to separate the plant classes.
I don't know, but at first glance you could label each cluster with the class that appears most often in it...
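For example, a hypothetical sketch of that majority-vote labeling on the iris data (the variable names here are just illustrative):

km <- kmeans(iris[, 1:4], 3)
tab <- table(km$cluster, iris$Species)
# label each cluster with its most frequent species
rotulo <- colnames(tab)[apply(tab, 1, which.max)]
table(rotulo[km$cluster], iris$Species)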
I can’t answer.
For me, in your case it would make more sense to use a supervised learning algorithm such as random forest, logistic regression, k-NN, etc.
To illustrate the problem of using k-means, consider the following dataset:
library(tibble)  # provides data_frame()

dados <- data_frame(
  x = runif(10000), y = runif(10000),
  # "azul" = blue, "vermelho" = red
  grupo = ifelse(x > 0.25 & x < 0.75 & y > 0.25 & y < 0.75, "azul", "vermelho")
)
Note that grupo is created deterministically from x and y: given the points, there is no randomness in the labels.

Now let's run k-means on this dataset and see whether the clusters match the groups.
cluster <- kmeans(dados[, 1:2], 2)  # uses only x and y; grupo is never shown to the algorithm
table(cluster$cluster, dados$grupo)
    azul vermelho
  1 1263     3670
  2 1273     3794
They didn't, because at no point did I ask k-means to separate the two groups. It only grouped observations whose values of x and y were close to each other...
See in the graph below how the groups turned out:
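(The original image is not reproduced here; a sketch of how such a graph could be drawn, assuming ggplot2 is available:)

library(ggplot2)
# colour each point by the cluster k-means assigned to it
ggplot(dados, aes(x, y, colour = factor(cluster$cluster))) +
  geom_point(alpha = 0.3)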

Now let's fit a random forest on this data:
dados$grupo <- as.factor(dados$grupo)
rf <- randomForest::randomForest(grupo ~ x + y, data = dados)
table(predict(rf, dados), dados$grupo)
             azul vermelho
    azul     2536        0
    vermelho    0     7464
Now yes! We correctly recovered everything that was blue and everything that was red. This happens because we are supervising the random forest, that is, we are providing labels for the algorithm to learn from.
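A side note: predicting on the training data, as above, tends to be optimistic. The fitted object also stores an out-of-bag confusion matrix, which gives a more honest error estimate:

rf$confusion  # out-of-bag confusion matrix with per-class error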
Look at the help of the clusterboot function in the fpc package. – Robert
On question 3, some supervised classifier will work even better than k-means for this problem, as @Danielfalbel already explained in his answer. I'm not very knowledgeable about R, but if you can use Python, check out scikit-learn: http://scikit-learn.org/stable/auto_examples/svm/plot_iris.html – Luiz Vieira
For SVM functions you have the kernlab package. – Robert
Yes, I'm aware of that. And I know the caret package too. But I am not looking for a classifier of this kind; I am interested in k-means and its limitations. – Marcus Nunes