I am developing clustering code with k-Means and I have the following question: is it possible to set a limit on the number of points per cluster with k-Means or another algorithm?
To explain the case better: in the code below, I have two predetermined centroids and 12 points. After running k-Means, 8 points are assigned to centroid 0 and 4 points to centroid 1.
from sklearn.cluster import KMeans
import numpy as np
#Centroids:
refs = [[-22.87042313, -43.33995681], [-22.91265768, -43.23596109]]
kmeans_model = KMeans(n_clusters=len(refs), random_state=0).fit(refs)
ref_labels = kmeans_model.labels_
centroids = kmeans_model.cluster_centers_
#Points:
points = [[-22.8595871, -43.2385504], [-23.0144844, -43.4727984], [-22.8727929, -43.4082954],
[-22.9478637, -43.3652225], [-22.8213579, -43.1740529], [-22.9592171, -43.3508173],
[-22.8236928, -43.3203929], [-22.9027656, -43.3541462], [-22.8749724, -43.5034297],
[-22.8456399, -43.2840653], [-22.8893855, -43.2424886], [-22.8499984, -43.2564374]]
#Clustering:
kmeans_model.predict(points)
Output: array([1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1], dtype=int32)
Can I fix how many points go into each cluster and treat the remaining points as a kind of "spare"?
For example:
centroid 0 = 4 points
centroid 1 = 3 points
run the k-Means...
Output: [1,0,0,0,1,0,NA,NA,NA,NA,1,NA]
The NA values would be the "surplus": points that are not close enough to earn a "vacancy" in the cluster.
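To make the desired behaviour concrete, here is a minimal sketch of one possible rule, not a built-in scikit-learn feature. It assumes each point first goes to its nearest fixed centroid (plain Euclidean distance on the raw coordinates) and that each cluster then keeps only its closest points up to its limit. The function name `capped_assign`, the `capacities` argument, and the use of -1 in place of NA are all hypothetical choices made for illustration.

```python
import numpy as np

def capped_assign(points, centroids, capacities):
    """Assign each point to its nearest centroid, keeping at most
    capacities[k] points in cluster k; surplus points are labelled -1."""
    points = np.asarray(points, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # Distance from every point to every centroid, shape (n_points, n_centroids)
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)
    labels = np.full(len(points), -1, dtype=int)  # -1 stands in for "NA"
    for k, cap in enumerate(capacities):
        members = np.flatnonzero(nearest == k)
        # Sort this cluster's candidates from closest to farthest
        closest_first = members[np.argsort(dists[members, k])]
        labels[closest_first[:cap]] = k
    return labels

refs = [[-22.87042313, -43.33995681], [-22.91265768, -43.23596109]]
points = [[-22.8595871, -43.2385504], [-23.0144844, -43.4727984], [-22.8727929, -43.4082954],
          [-22.9478637, -43.3652225], [-22.8213579, -43.1740529], [-22.9592171, -43.3508173],
          [-22.8236928, -43.3203929], [-22.9027656, -43.3541462], [-22.8749724, -43.5034297],
          [-22.8456399, -43.2840653], [-22.8893855, -43.2424886], [-22.8499984, -43.2564374]]

# Limits from the example: centroid 0 = 4 points, centroid 1 = 3 points
labels = capped_assign(points, refs, capacities=[4, 3])
print(labels)
```

Under this rule a surplus point is never offered to another cluster that still has room. Whether that is acceptable, or whether spare points should fall back to their second-nearest centroid, is part of what I am asking.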