
Performance measures for FindClusters?

I have tried different combinations of parameters (Method, CriterionFunction, and DistanceFunction) in FindClusters,
but I still don't know which combination is best.
Is there a performance measure for FindClusters, and how can I get it?

POSTED BY: Tsai Ming-Chou

At the moment the clustering metrics are all internal and are used to optimize hyper-parameters. We plan to expose them, and interest from users only makes that more likely.

For the time being, and keeping in mind that this code might change in the future, you can use the internal function directly:

data = RandomReal[1, {1000, 2}];
clusters = FindClusters[data];

ClusterValidation = MachineLearning`PackageScope`ClusterValidation;

Some criteria measure the "goodness" of a clustering, where higher is better. These are reversed so that, for every measure, lower is better:

Table[
 Last@ClusterValidation[type, "" -> {"", clusters}],
 {type,
  {"StandardDeviation", "RSquared", "Dunn", "CalinskiHarabasz", "Silhouette"}}
 ]

(* {0.337803, 373.229, -0.00908816, -684.364, -0.380691} *)
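Since the original question was how to choose between settings, the same call can be used to score several Method values on one data set and keep the one with the lowest score. This is only a sketch, assuming the internal ClusterValidation signature used above still holds in your version; the list of methods and the choice of "Silhouette" are illustrative, not recommendations, and `scores`/`best` are hypothetical names.

```mathematica
data = RandomReal[1, {1000, 2}];

(* internal, undocumented function: its name and signature may change *)
ClusterValidation = MachineLearning`PackageScope`ClusterValidation;

(* illustrative set of FindClusters methods to compare *)
methods = {"KMeans", "KMedoids", "Agglomerate"};

(* score each method's clustering with the (reversed) silhouette measure *)
scores = Association@Table[
    method -> Last@ClusterValidation["Silhouette",
       "" -> {"", FindClusters[data, Method -> method]}],
    {method, methods}];

(* every measure is oriented so that lower is better: keep the minimum *)
best = First@Keys@TakeSmallest[scores, 1]
```

The same loop can be extended over DistanceFunction or CriterionFunction values, but not every method accepts every distance function, so check the FindClusters documentation before widening the grid.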
