Is it possible to use K-Means (or another clustering method) with point limits?

I am writing clustering code with k-Means and have the following question: is it possible to set a limit on the number of points per cluster, with k-Means or another algorithm?

To explain the case better: in the code below, I have two predetermined centroids and 12 points. After running k-Means, 8 points are assigned to centroid 0 and 4 points to centroid 1.

from sklearn.cluster import KMeans
import numpy as np

#Centroids:
refs = [[-22.87042313, -43.33995681], [-22.91265768, -43.23596109]]
kmeans_model = KMeans(n_clusters=len(refs), random_state=0).fit(refs)
ref_labels = kmeans_model.labels_
centroids = kmeans_model.cluster_centers_

#Points:
points = [[-22.8595871, -43.2385504], [-23.0144844, -43.4727984], [-22.8727929, -43.4082954],
          [-22.9478637, -43.3652225], [-22.8213579, -43.1740529], [-22.9592171, -43.3508173],
          [-22.8236928, -43.3203929], [-22.9027656, -43.3541462], [-22.8749724, -43.5034297],
          [-22.8456399, -43.2840653], [-22.8893855, -43.2424886], [-22.8499984, -43.2564374]]

#Clustering:
kmeans_model.predict(points)
Output: array([1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1], dtype=int32)

Can I set how many points each cluster will hold and treat the remaining points as a kind of "surplus"?

For example:

centroid 0 = 4 points

centroid 1 = 3 points

run k-Means...

Output: [1,0,0,0,1,0,NA,NA,NA,NA,1,NA]

The NA values would be the "surplus": points that are not close enough to earn a "vacancy" in the cluster.
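To make the desired behaviour concrete, it could be sketched roughly as below. This is only an illustration and not a k-Means or sklearn feature: the function name `predict_with_limits` and the use of -1 in place of NA are assumptions made for the example. Each point is first matched to its nearest centroid (as `predict` does), and then only the closest points up to each centroid's limit are kept.

```python
import numpy as np

refs = [[-22.87042313, -43.33995681], [-22.91265768, -43.23596109]]
points = [[-22.8595871, -43.2385504], [-23.0144844, -43.4727984], [-22.8727929, -43.4082954],
          [-22.9478637, -43.3652225], [-22.8213579, -43.1740529], [-22.9592171, -43.3508173],
          [-22.8236928, -43.3203929], [-22.9027656, -43.3541462], [-22.8749724, -43.5034297],
          [-22.8456399, -43.2840653], [-22.8893855, -43.2424886], [-22.8499984, -43.2564374]]


def predict_with_limits(points, centroids, limits):
    """Hypothetical helper: nearest-centroid labels with a per-centroid cap.

    Returns one label per point; -1 marks surplus points (the "NA" values).
    """
    points = np.asarray(points, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # Euclidean distance from every point to every centroid,
    # shape (n_points, n_centroids)
    dist = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    nearest = dist.argmin(axis=1)
    labels = np.full(len(points), -1, dtype=int)
    for c, limit in enumerate(limits):
        members = np.flatnonzero(nearest == c)
        # Keep only the `limit` members closest to centroid c
        keep = members[np.argsort(dist[members, c])[:limit]]
        labels[keep] = c
    return labels


labels = predict_with_limits(points, refs, limits=[4, 3])
print(labels)
```

Note that this keeps the points nearest to each centroid, so which points end up as surplus depends on distance and not on their order in the list.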

1 answer

Maybe you should change your approach. If the goal is to have "values that are not close enough to earn a 'vacancy' in the cluster", a density-based clustering approach seems more appropriate.

I suggest trying DBSCAN (or even OPTICS). Both are implemented in sklearn, so you only need to import and run the algorithm:

from sklearn.cluster import DBSCAN
clustering = DBSCAN(eps=3, min_samples=2).fit(X)  # X: your array of points, shape (n_samples, n_features)

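Try to tune the epsilon parameter (eps), which is the maximum distance between two points for one to be considered in the neighborhood of the other. It is expressed in the same units as your data. Points that do not end up in any cluster are labeled -1 (noise) in clustering.labels_, which plays the role of the NA values in your example.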
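As a rough sketch of what tuning eps looks like on the 12 points from the question: the helper name `dbscan_labels` and the candidate eps values are assumptions chosen for illustration. Since all of those points lie within about 0.4 coordinate degrees of each other, eps=3 would place every point in one cluster, so much smaller values are tried here. Distances are plain Euclidean distances on the raw coordinates.

```python
import numpy as np
from sklearn.cluster import DBSCAN

points = np.array([
    [-22.8595871, -43.2385504], [-23.0144844, -43.4727984], [-22.8727929, -43.4082954],
    [-22.9478637, -43.3652225], [-22.8213579, -43.1740529], [-22.9592171, -43.3508173],
    [-22.8236928, -43.3203929], [-22.9027656, -43.3541462], [-22.8749724, -43.5034297],
    [-22.8456399, -43.2840653], [-22.8893855, -43.2424886], [-22.8499984, -43.2564374],
])


def dbscan_labels(X, eps, min_samples=2):
    """Hypothetical helper: run DBSCAN and return the labels (-1 = noise)."""
    return DBSCAN(eps=eps, min_samples=min_samples).fit(X).labels_


# Illustrative eps values only; pick a range that suits your own data.
for eps in (0.02, 0.05, 0.1):
    labels = dbscan_labels(points, eps)
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    n_noise = int(np.sum(labels == -1))
    print(f"eps={eps}: clusters={n_clusters}, noise={n_noise}, labels={labels.tolist()}")
```

Keep in mind that DBSCAN finds the clusters on its own: it does not use predetermined centroids and does not enforce a fixed number of points per cluster. It only separates dense groups from isolated points.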
