Message Boards


A step-by-step walkthrough of the k-means algorithm

Posted 6 months ago
5 Replies
5 Total Likes

You have earned a Featured Contributor Badge. Your exceptional post has been selected for our editorial column Staff Picks, and your profile is now distinguished by a Featured Contributor Badge and displayed on the Featured Contributor Board. Thank you!

This k-means implementation -- like many k-nearest-neighbor codes -- suffers from a shortcoming that yields poor results when the measure represents distance. In particular, "nearest" does not account for multiplicity, i.e. the case where multiple neighbors are tied at the same nearest distance. The result is that such algorithms under-reach in their collection of neighbors, causing a domino effect downstream, including clusters that are distorted with respect to reality. I previously brought this to the group's attention here:
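The tie issue described above can be illustrated with a small sketch (my own illustration, not the original poster's code): a naive `argmin` silently keeps only the first of several equidistant centers, while a tie-aware version returns all of them.

```python
import math

def nearest_indices(point, centers, tol=1e-9):
    """Return the indices of ALL centers tied at the minimum distance,
    rather than just the first argmin found."""
    dists = [math.dist(point, c) for c in centers]
    d_min = min(dists)
    return [i for i, d in enumerate(dists) if d - d_min <= tol]

# A point exactly equidistant from two centers: a plain argmin would
# report only index 0; the tie-aware version reports both candidates.
centers = [(0.0, 0.0), (2.0, 0.0)]
point = (1.0, 0.0)
print(nearest_indices(point, centers))  # [0, 1]
```

How the downstream algorithm breaks such ties (split the point, assign it randomly, or weight it fractionally) is then an explicit design decision rather than an accident of iteration order.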

I have since implemented my own algorithms to work around the issue.

Interesting! Yes, there are definitely some major drawbacks to k-means, but it works fairly well given how simple it is.
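For reference, the simplicity mentioned above is easy to see in a minimal sketch of Lloyd's algorithm (my own stdlib-only sketch, not the walkthrough's code; note it uses the naive first-match assignment that the previous reply critiques):

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    """Minimal Lloyd's-algorithm k-means: assign each point to its
    (first) nearest center, then recompute centers as cluster means."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialize from the data points
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: ties go silently to the lowest index.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[j].append(p)
        # Update step: mean of each cluster (keep old center if empty).
        new_centers = [
            tuple(sum(xs) / len(cl) for xs in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers, clusters

pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
centers, clusters = kmeans(pts, 2)
print(sorted(centers))  # [(0.0, 0.5), (10.0, 10.5)]
```

The whole algorithm is two alternating steps, which is why it is so widely implemented despite its sensitivity to initialization and to tie-breaking.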

It is not that there are drawbacks to k-means itself, but rather that there are drawbacks to implementations that ignore multiplicity. Part of this is due to the influence of older k-neighbor algorithms, designed for compiler construction in the single-threaded, single-processor world of the time.

If clustering and associated data structures interest you, see the 1993 paper by Warren and Salmon, "A parallel hashed oct-tree N-body algorithm", and the 2014 update by Warren:

