WOLFRAM COMMUNITY

GROUPS:

Staff Picks Data Science Mathematics Algebra Graphics and Visualization Wolfram Language Machine Learning Artificial Intelligence

A step-by-step walkthrough of the k-means algorithm

Laney Moy, University of Illinois at Urbana-Champaign

Posted 3 years ago

5 Replies

Richard Frost, Frost Concepts

Posted 3 years ago

Richard Frost, Frost Concepts

Posted 3 years ago

This k-means implementation, like many k-neighbor codes, suffers from a shortcoming that yields poor results when the measure represents distance. In particular, "nearest" does not account for multiplicity, i.e., cases where multiple neighbors share the same nearest distance. The result is that the algorithms under-reach in their collection of neighbors, causing a domino effect downstream, including clusters that are mutated with respect to reality. I previously brought this to the group's attention here: https://community.wolfram.com/groups/-/m/t/2079392 I have since implemented my own algorithms to work around the issue.
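To make the tie problem concrete, here is a minimal Python sketch (not the poster's own algorithm, and not Wolfram Language code). The helper name `nearest_with_ties`, the tolerance `tol`, and the sample points are all hypothetical. It contrasts a plain "take the first k" lookup with one that keeps every point tied at the k-th distance.

```python
import math

def nearest_with_ties(points, query, k=1, tol=1e-12):
    """Return the k nearest points to query, plus any points tied with the k-th one.

    Hypothetical helper for illustration; assumes k >= 1 and Euclidean distance.
    """
    # Rank all candidates by distance to the query point.
    ranked = sorted(points, key=lambda p: math.dist(p, query))
    if k >= len(ranked):
        return ranked
    # Distance of the k-th nearest point defines the cutoff.
    cutoff = math.dist(ranked[k - 1], query)
    # Keep every point at or inside the cutoff, so ties are not dropped.
    return [p for p in ranked if math.dist(p, query) <= cutoff + tol]

# Four points sit at exactly distance 1 from the origin; one is farther away.
grid = [(1, 0), (0, 1), (-1, 0), (0, -1), (2, 2)]

# A plain "first k" lookup silently keeps only one of the four tied points.
plain = sorted(grid, key=lambda p: math.dist(p, (0, 0)))[:1]

# The tie-aware lookup returns all four.
tied = nearest_with_ties(grid, (0, 0), k=1)
```

With this data, `plain` holds a single point while `tied` holds all four equidistant neighbors, which is the under-reach described above.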

Laney Moy, University of Illinois at Urbana-Champaign

Posted 3 years ago

Interesting! Yes, there are definitely some major drawbacks to k-means, but it works fairly well given how simple it is.

Richard Frost, Frost Concepts

Posted 3 years ago

It is not that there are drawbacks to k-means, but rather that there are drawbacks to implementations that ignore multiplicity. Part of this is due to the influence of older k-neighbor algorithms, designed for compiler construction in the single-threaded, single-processor world of the time. If clustering and associated data structures interest you, see this 1993 paper by Warren and Salmon, "A parallel hashed oct-tree N-body algorithm": https://dl.acm.org/doi/pdf/10.1145/169627.169640 and the 2014 update by Warren: https://content.iospress.com/articles/scientific-programming/spr385
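As one reading of what "ignoring multiplicity" could mean inside k-means itself, the Python sketch below records every centroid tied for the minimum distance in the assignment step instead of silently picking the first. The function name `assign_with_ties`, the tolerance, and the sample data are hypothetical. It does not reproduce the poster's workaround or the cited oct-tree method.

```python
import math

def assign_with_ties(points, centroids, tol=1e-12):
    """Map each point to the indices of ALL centroids at its minimum distance.

    Hypothetical helper; points must be hashable (tuples) and distinct,
    since they are used as dictionary keys.
    """
    assignment = {}
    for p in points:
        # Euclidean distance from this point to every centroid.
        dists = [math.dist(p, c) for c in centroids]
        best = min(dists)
        # Record every centroid within tol of the minimum, not just the first.
        assignment[p] = [i for i, d in enumerate(dists) if d <= best + tol]
    return assignment

centroids = [(0.0, 0.0), (2.0, 0.0)]
points = [(0.5, 0.0), (1.0, 0.0), (1.8, 0.0)]

result = assign_with_ties(points, centroids)

# Points equidistant from two or more centroids are flagged for explicit handling.
ambiguous = [p for p, idx in result.items() if len(idx) > 1]
```

Here the midpoint `(1.0, 0.0)` is equidistant from both centroids and ends up in `ambiguous`. A typical argmin-based assignment would place it in the first cluster without any indication that a tie occurred.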

Moderation Team, WOLFRAM

Posted 3 years ago

You have earned a *Featured Contributor Badge*. Your exceptional post has been selected for our editorial column *Staff Picks* http://wolfr.am/StaffPicks, and your profile is now distinguished by a *Featured Contributor Badge* and is displayed on the Featured Contributor Board. Thank you!
