constituents
- dataset \left\{x^{(1)}, \dots, x^{(n)}\right\}
- number of clusters k

requirements
Initialize cluster centroids \mu_{1}, \dots, \mu_{k} randomly, and repeat:
- assignment step: for each i \in [1, \dots, n], set c^{(i)} = \arg\min_{j} \norm{x^{(i)} - \mu_{j}}^{2}_{2}
- update step: for each j \in [1, \dots, k], set \mu_{j} = \frac{\sum_{i=1}^{n} \mathbbm{1}\left\{c^{(i)}=j\right\} x^{(i)}}{\sum_{i=1}^{n} \mathbbm{1}\left\{c^{(i)}=j\right\}}

additional information
Some ways to pick better initial centroids:
- sample data points at random as the initial centroids
- k-means++:
  - pick a data point uniformly at random to be the first centroid
  - pick each subsequent centroid from the data with probability proportional to its squared distance to the nearest centroid chosen so far

distortion function
Consider the following function:

\begin{equation} J\left(c, \mu\right) = \sum_{i=1}^{n} \norm{x^{(i)}- \mu_{c^{(i)}}}^{2} \end{equation}
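A minimal NumPy sketch of this distortion, assuming a small toy dataset of my own choosing (the function name `distortion` and the data are illustrative, not from the source):

```python
import numpy as np

def distortion(X, c, mu):
    # J(c, mu) = sum_i || x^(i) - mu_{c^(i)} ||^2
    return float(((X - mu[c]) ** 2).sum())

# hypothetical toy data: two tight, well-separated groups
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])

# k = 1: every point assigned to a single centroid at the global mean
J1 = distortion(X, np.zeros(4, dtype=int), X.mean(0, keepdims=True))

# k = 2: one centroid per group
c2 = np.array([0, 0, 1, 1])
mu2 = np.array([X[:2].mean(0), X[2:].mean(0)])
J2 = distortion(X, c2, mu2)

assert J2 < J1  # more clusters -> lower optimal distortion
```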

Adding more clusters can only reduce this function: with more centroids available, each point can be assigned to a centroid at least as close, and J reaches zero when k = n. Plotting J against k, there is typically an "elbow", a point beyond which the decrease is no longer as dramatic; this is a common heuristic for choosing k.
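The full procedure described above (k-means++ seeding followed by alternating assignment and update steps) can be sketched in NumPy as follows; the function names, the iteration cap, and the seeding details are my own choices, not from the source:

```python
import numpy as np

def kmeans_pp_init(X, k, rng):
    # k-means++: first centroid uniform over the data; each subsequent
    # centroid sampled with probability proportional to the squared
    # distance to the nearest centroid chosen so far
    centroids = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = ((X[:, None, :] - np.asarray(centroids)[None, :, :]) ** 2).sum(-1).min(1)
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centroids, dtype=float)

def kmeans(X, k, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    mu = kmeans_pp_init(X, k, rng)
    for _ in range(iters):
        # assignment step: c^(i) = argmin_j ||x^(i) - mu_j||^2
        c = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1).argmin(1)
        # update step: mu_j = mean of the points assigned to cluster j
        for j in range(k):
            if np.any(c == j):
                mu[j] = X[c == j].mean(0)
    return c, mu
```

A fixed number of iterations stands in for a proper convergence check (e.g. stopping when assignments no longer change), which is what one would use in practice.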
