K-means clustering is a method for finding subgroups (clusters) and their centers in a set of unlabelled data.
Given unlabelled data and a chosen number of centers K, the K-means algorithm works as follows:
- Randomly select K data points as the initial centers.
- Assign each data point to its nearest center; the points assigned to the same center form a cluster.
- Recompute each center (centroid) as the average of all the points assigned to its cluster, and repeat the assignment and averaging steps until the clusters no longer change.
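The steps above can be sketched in plain NumPy. This is only an illustrative sketch, not the SciPy implementation: the function name `kmeans_sketch`, its parameters `max_iter` and `seed`, the use of Euclidean distance, and the toy data are all assumptions made for the example.

```python
import numpy as np

def kmeans_sketch(points, k, max_iter=100, seed=0):
    """Hypothetical helper illustrating the three steps of K-means."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 1: pick K distinct data points as the initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid (Euclidean distance).
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = distances.argmin(axis=1)
        # Stop once the assignments no longer change.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids, labels

# Toy data (assumed for illustration): two well-separated groups of three points.
points = np.array([[0.0, 0.0], [0.3, 0.1], [0.1, 0.4],
                   [5.0, 5.0], [5.4, 5.1], [5.2, 5.5]])
centroids, labels = kmeans_sketch(points, k=2)
print(centroids)
print(labels)
```

The stopping rule here is "assignments did not change between two passes", which is one common way to decide that the clusters are stable; a threshold on centroid movement would work as well.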
```python
from numpy import vstack, array
from numpy.random import rand
from scipy.cluster.vq import kmeans, vq, whiten

# data generation with three features
data = vstack((rand(100, 3) + array([.5, .5, .5]), rand(100, 3)))

# whitening of data
data1 = whiten(data)

# computing K-Means with K = 3 clusters
centroids, _ = kmeans(data1, 3)
print(centroids)
```
```
[[1.19175347 0.98152245 1.89185956]
 [2.70493124 2.64803832 2.62024853]
 [1.51076247 1.50606114 0.61546322]]
```
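The example imports `vq` but does not use it. As a hedged sketch of the likely next step, `vq` can assign each whitened observation to its nearest centroid. The variable names `labels` and `distances` and the seeded generator `rng` are assumptions for this illustration; the data is random, so the cluster sizes will differ from run to run of `kmeans`.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq, whiten

# Same kind of data as above: 200 observations with three features.
rng = np.random.default_rng(0)
data = np.vstack((rng.random((100, 3)) + np.array([0.5, 0.5, 0.5]),
                  rng.random((100, 3))))

# Whiten the data, then compute the centroids.
data1 = whiten(data)
centroids, _ = kmeans(data1, 3)

# Assign each observation to its closest centroid.
# vq returns the centroid index and the distance to that centroid.
labels, distances = vq(data1, centroids)

print(labels[:10])
# Number of observations assigned to each centroid.
print(np.bincount(labels, minlength=len(centroids)))
```

`len(centroids)` is used rather than a hard-coded 3 because `kmeans` can return fewer centroids than requested if one ends up with no observations.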