K-means clustering is a method for finding subgroups (clusters) and their centers in a set of unlabelled data.
The K-means algorithm works as follows:

Suppose we are given unlabelled data and the number of clusters K.

  1. Randomly select K data points as the initial cluster centers (centroids).
  2. Assign each data point to the cluster whose center is closest to it.
  3. Recompute each centroid as the average of all the points assigned to its cluster, then repeat steps 2 and 3 until the clusters are stable, i.e. the assignments no longer change.
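The three steps can be sketched directly in NumPy. This is a minimal illustration of plain K-means (Lloyd's algorithm), not SciPy's implementation; the function name `kmeans_lloyd` and its `max_iter` and `seed` parameters are hypothetical, and keeping the old center when a cluster becomes empty is an assumption made here for simplicity.

```python
import numpy as np

def kmeans_lloyd(points, k, max_iter=100, seed=0):
    """Hypothetical helper: plain K-means following the three steps above."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 1: pick K distinct data points at random as the initial centers
    centers = points[rng.choice(len(points), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest center (Euclidean distance)
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Stop once the assignments no longer change (the clusters are stable)
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 3: move each center to the mean of the points assigned to it
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # assumption: keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers, labels

# Two well-separated groups of three points each
pts = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
centers, labels = kmeans_lloyd(pts, 2)
print(labels)
print(centers)
```

Because the initial centers are chosen at random, a different `seed` can change which cluster gets index 0, although on well-separated data like this the final grouping is the same.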


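from numpy import vstack, array
from numpy.random import rand
from scipy.cluster.vq import kmeans, vq, whiten

# generate 200 observations with three features: 100 points shifted by 0.5 and 100 unshifted points
data = vstack((rand(100, 3) + array([.5, .5, .5]), rand(100, 3)))

# whiten the data: scale each feature to unit standard deviation
data1 = whiten(data)

# compute K-means with 3 clusters; kmeans returns (centroids, distortion)
centroids, _ = kmeans(data1, 3)
print(centroids)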


Output (the data are random, so the exact values differ from run to run):

[[1.19175347 0.98152245 1.89185956]
 [2.70493124 2.64803832 2.62024853]
 [1.51076247 1.50606114 0.61546322]]
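The example imports `vq` but never calls it. The sketch below shows one way to use it, along with a check of what `whiten` does. It assumes the same kind of generated data, but uses NumPy's `default_rng(0)` instead of `numpy.random.rand` only so the data are repeatable. `whiten` divides each column by its standard deviation, and `vq` returns, for every observation, the index of its nearest centroid and the distance to it. Note that `kmeans` can return fewer than K centroids if a cluster ends up with no members, so the code uses `len(centroids)` rather than assuming 3.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq, whiten

# Same kind of data as above: 100 points shifted by 0.5 plus 100 unshifted points.
# default_rng(0) is an assumption, used only to make the data repeatable.
rng = np.random.default_rng(0)
data = np.vstack((rng.random((100, 3)) + 0.5, rng.random((100, 3))))

# whiten() divides each feature (column) by its standard deviation
data1 = whiten(data)
assert np.allclose(data1, data / data.std(axis=0))

centroids, _ = kmeans(data1, 3)

# vq() labels every observation with the index of its nearest centroid
# and also returns the Euclidean distance to that centroid
labels, dists = vq(data1, centroids)
print("cluster sizes:", np.bincount(labels, minlength=len(centroids)))
print("first ten labels:", labels[:10])
```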
