A step-by-step walkthrough of the k-means algorithm

Posted 1 month ago | 1214 Views | 5 Replies | 5 Total Likes
Congratulations! You have earned a Featured Contributor Badge. Your exceptional post has been selected for our editorial column Staff Picks http://wolfr.am/StaffPicks, and your profile is now distinguished by a Featured Contributor Badge and displayed on the Featured Contributor Board. Thank you!

This k-means implementation -- and many k-nearest-neighbor codes like it -- suffers from a shortcoming that yields poor results when the measure represents distance. In particular, "nearest" does not account for multiplicity, i.e. the case where multiple neighbors lie at the same nearest distance. The result is that such algorithms under-reach in their collection of neighbors, causing a domino effect downstream, including clusters that are mutated with respect to reality. I previously brought this to the group's attention here:

https://community.wolfram.com/groups/-/m/t/2079392

I have since implemented my own algorithms to work around the issue.
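The poster's own workaround is not shown in the thread. As an illustration only, here is a minimal Python sketch of the multiplicity issue being described: a plain argmin silently keeps just one of several equidistant neighbors, while a tie-aware search (the `tol` tolerance parameter below is an assumption, added to absorb floating-point noise) returns all of them.

```python
def nearest_neighbors_with_ties(query, points, tol=1e-9):
    """Return ALL points at the minimum distance from `query`,
    not just the single one an argmin would pick."""
    dists = [sum((a - b) ** 2 for a, b in zip(query, p)) for p in points]
    dmin = min(dists)
    return [p for p, d in zip(points, dists) if d <= dmin + tol]

# Three points equidistant from the origin: argmin keeps only the first,
# a tie-aware search keeps all three.
pts = [(1, 0), (0, 1), (-1, 0), (5, 5)]
print(nearest_neighbors_with_ties((0, 0), pts))
```

An assignment step built on a function like this can then decide explicitly what to do with ties (e.g. split weight among clusters) instead of resolving them by accident of list order.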

Interesting! Yes, there are definitely some major drawbacks to k-means, but it works fairly well given how simple it is.
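For reference, the simplicity being praised here is Lloyd's algorithm: alternate an assignment step (each point joins its nearest centroid) with an update step (each centroid moves to the mean of its cluster). A minimal self-contained sketch, not the implementation discussed in the original post:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm on tuples of floats.

    Note: ties in the assignment step are broken arbitrarily by min(),
    which is exactly the multiplicity issue raised elsewhere in this thread.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda i: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[i])))
            clusters[j].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, c in enumerate(clusters):
            if c:
                centroids[i] = tuple(sum(xs) / len(c) for xs in zip(*c))
    return centroids, clusters

# Two well-separated groups are recovered after a few iterations.
cents, clusters = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)
```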

It is not that there are drawbacks to k-means itself, but rather that there are drawbacks to implementations that ignore multiplicity. Part of this is due to the influence of older k-neighbor algorithms, designed for compiler construction in the single-threaded, single-processor world of the time.

If clustering and associated data structures interest you, see this 1993 paper by Warren and Salmon, "A parallel hashed oct-tree N-body algorithm": https://dl.acm.org/doi/pdf/10.1145/169627.169640 and the 2014 update by Warren: https://content.iospress.com/articles/scientific-programming/spr385
