Distance function used in FindClusters[ ]?

Posted 3 years ago

I am trying to find out which distance function FindClusters uses with the GaussianMixture method. My data is a set of 2D points (originally from super-resolution microscopy).

data={{2408.7, 3004.58}, {2405.25, 2968.06}, {2335.47, 2936.84}, {2277.45, 
  3053.76}, {2379.39, 3056.22}, {2399.1, 2978.29}, {2433.3, 
  2937.93}, {2417.11, 2979.81}, {2392.54, 2972.33}, {2412.88, 
  2980.09}, {2406.89, 2980.1}, {2429.44, 3003.47}, {2437.38, 
  2996.5}, {2434.53, 3004.63}, {2409.17, 2981.06}, {2405.16, 
  2976.72}, {2408.82, 2975.55}, {2411.23, 2983.65}, {2476.62, 
  3020.51}, {2465.2, 3030.33}, {2452.87, 3028.5}, {2340.89, 
  2935.55}, {2340.62, 2934.}, {2323.16, 2940.07}, {2348.07, 
  2921.78}, {2339.58, 2929.19}, {2378.19, 2952.4}, {2470.79, 
  3030.07}, {2474.7, 3025.94}, {2475.97, 3032.73}, {2490.62, 
  3020.6}, {2472.78, 3024.96}, {2469.55, 3034.2}, {2343.58, 
  2925.55}, {2345.73, 2931.64}, {2340.19, 2920.53}, {2514.02, 
  3021.1}, {2481.69, 3037.17}, {2460.23, 3033.85}, {2351.52, 
  2942.98}, {2337.6, 2934.17}, {2344.68, 2928.11}, {2342.17, 
  2932.35}, {2342.68, 2933.69}, {2477.56, 3026.93}, {2491.97, 
  3026.5}, {2470.31, 3023.62}, {2461.22, 3028.97}, {2468.34, 
  3025.81}, {2472.37, 3022.72}, {2469.93, 3026.72}, {2469.99, 
  3021.24}, {2468.91, 3021.}, {2471.69, 3030.27}, {2463.6, 
  3019.94}, {2344.79, 2932.94}, {2346.51, 2930.13}, {2346.14, 
  2936.01}, {2467.94, 3049.63}, {2344.35, 2916.66}, {2411.71, 
  2985.91}, {2396.08, 2982.96}, {2428.52, 3026.03}, {2406.89, 
  2976.34}, {2397.75, 2985.19}, {2393.9, 2980.12}, {2340.44, 
  2937.99}, {2351.68, 2941.9}, {2327.49, 2934.74}, {2360.16, 
  2917.93}, {2344.89, 2930.75}, {2351.96, 2926.25}, {2347.01, 
  2930.31}, {2350.64, 2937.55}, {2341.22, 2942.17}, {2338.14, 
  2940.31}, {2336.11, 2936.02}, {2340.12, 2944.29}, {2334.35, 
  2929.25}, {2335.14, 2923.65}, {2337.96, 2935.88}, {2338.89, 
  2949.2}, {2337.3, 2939.05}, {2335.6, 2939.09}, {2337.25, 
  2938.13}, {2336.05, 2941.}, {2327.59, 2937.21}, {2343.22, 
  2942.88}, {2348.27, 2961.23}, {2345.35, 2937.15}, {2335.93, 
  2940.03}, {2340.71, 2937.18}, {2482.96, 3032.34}, {2478.36, 
  3031.56}, {2344.87, 2921.34}, {2341.24, 2924.27}, {2347.17, 
  2928.7}, {2367.25, 2904.04}, {2471.6, 3022.64}, {2477.78, 
  3023.67}, {2458.95, 3014.98}, {2465.44, 3023.26}, {2477.03, 
  3018.79}, {2467.7, 3019.15}, {2474.54, 3028.71}, {2470.01, 
  3028.37}, {2474.09, 3024.57}, {2457.4, 3032.7}, {2468.33, 
  3030.79}, {2468.15, 3027.11}, {2467.95, 3029.81}, {2476.69, 
  3022.55}, {2464.65, 3028.98}, {2466.26, 3034.22}, {2472.04, 
  3022.09}, {2461.38, 3019.56}, {2475.01, 3034.62}, {2466.42, 
  3033.9}, {2462.48, 3023.26}, {2339.52, 2923.98}, {2339.8, 
  2934.56}, {2355.37, 2923.11}, {2404.71, 2977.65}, {2419.35, 
  2971.96}, {2410.6, 2977.57}, {2404.25, 2978.9}, {2404.15, 
  2967.16}, {2412.84, 2985.38}, {2403.64, 2972.13}, {2410.35, 
  2977.03}, {2335.51, 2917.97}, {2347.46, 2925.05}, {2355.34, 
  2936.37}, {2348.38, 2933.24}, {2353.45, 2933.66}, {2353.56, 
  2940.09}, {2352.79, 2928.36}, {2356.69, 2931.11}, {2412.14, 
  2976.92}, {2409.66, 2978.98}, {2421.35, 2973.84}, {2420.15, 
  2977.75}, {2414.78, 2977.53}, {2407.68, 2973.4}, {2466.72, 
  3008.01}, {2389.29, 2986.55}, {2403.6, 2979.37}, {2400.74, 
  2978.25}, {2421.98, 2965.}, {2464.31, 3014.27}, {2467.22, 
  3011.93}, {2484.88, 3018.02}, {2357.77, 2911.47}, {2429.27, 
  2967.32}, {2415.24, 2980.61}, {2425.51, 2985.45}, {2414.86, 
  2986.86}, {2389.61, 2998.52}, {2338.79, 2925.81}, {2335.75, 
  2922.73}, {2346.84, 2918.28}, {2339.3, 2920.25}, {2477.21, 
  3011.47}, {2477.6, 3018.44}, {2475.41, 3019.79}, {2473.91, 
  3025.57}, {2473.42, 3025.7}, {2482.93, 3017.6}, {2332.75, 
  2941.51}, {2331.39, 2927.89}, {2313.53, 2931.29}, {2359.33, 
  2900.22}, {2472.95, 3039.}};

I plotted the results of FindClusters for every distance function I found in the help, but none of them matches the result of the Automatic setting, even though the help says the default is SquaredEuclideanDistance.

(ListPlot[
    FindClusters[data, 
     Method -> "GaussianMixture", 
     DistanceFunction -> #], PlotStyle -> PointSize[0.02], 
    PlotLabel -> #]) & /@ {Automatic, EuclideanDistance, 
  SquaredEuclideanDistance, NormalizedSquaredEuclideanDistance, 
  ManhattanDistance, ChessboardDistance, BrayCurtisDistance, 
  CanberraDistance, CosineDistance, CorrelationDistance}
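
As a complement to the visual comparison, the check can also be done programmatically. The following is only a sketch: it assumes `data` is the list defined above, and the helper names `candidates`, `cluster`, `auto` and `canonical` are made up for illustration. It returns the distance functions whose clustering is identical to the Automatic one. An empty list would agree with what the plots suggest.

```wolfram
(* Assumes `data` is the list of 2D points defined above. *)
candidates = {EuclideanDistance, SquaredEuclideanDistance,
   NormalizedSquaredEuclideanDistance, ManhattanDistance,
   ChessboardDistance, BrayCurtisDistance, CanberraDistance,
   CosineDistance, CorrelationDistance};

(* Hypothetical helper: cluster with a given DistanceFunction setting. *)
cluster[f_] := FindClusters[data, Method -> "GaussianMixture",
   DistanceFunction -> f];

auto = cluster[Automatic];

(* Sort points within clusters, then the clusters themselves, so that
   ordering differences do not count as a mismatch. *)
canonical[c_] := Sort[Sort /@ c];

(* Distance functions that reproduce the Automatic clustering exactly. *)
Select[candidates, canonical[cluster[#]] === canonical[auto] &]
```

An exact match is a strict criterion. Comparing only the number of clusters, `Length[cluster[#]] & /@ candidates`, gives a coarser picture.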

[Image: grid of ListPlot panels showing the FindClusters result for each DistanceFunction setting]

Any advice would greatly help. Thank you.

POSTED BY: Miroslav Hekrdla
3 Replies

Miroslav,

Some time ago I ran into the same effect. The problem seems to arise whenever the data has a large offset from the origin. If this offset is compensated, e.g.:

center = Mean[data];
data1 = # - center & /@ data;

and data1 is then used instead of data, the clustering seems to work.
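
A minimal sketch of how the effect of centering could be checked, assuming `data` is the list from the original post. The names `clustersRaw` and `clustersCentered` are chosen here for illustration only. The cluster counts are not stated because they depend on the Wolfram Language version used.

```wolfram
(* Assumes `data` is the list of 2D points from the original post. *)
center = Mean[data];
data1 = # - center & /@ data;

(* Cluster the raw and the centered data with the same method. *)
clustersRaw = FindClusters[data, Method -> "GaussianMixture"];
clustersCentered = FindClusters[data1, Method -> "GaussianMixture"];

(* Number of clusters found in each case. *)
Length /@ {clustersRaw, clustersCentered}

(* Shift the centered clusters back to the original coordinates for plotting;
   level 2 of the result holds the individual points. *)
ListPlot[Map[# + center &, clustersCentered, {2}],
 PlotStyle -> PointSize[0.02], PlotLabel -> "Centered data"]
```

`Standardize[data]` might be worth trying as an alternative, since it removes the mean and also rescales each coordinate to unit variance. Whether that helps or hurts here is untested.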

POSTED BY: Henrik Schachner

Thank you very much Henrik,

You are answering a question I was hesitant to ask before making a thorough effort of my own: why there are two clusters instead of the apparent three. And yes, your proposed data centering improves the results dramatically! On the other hand, I am still curious about the original question: how to find out which parameters have been selected automatically.

Miroslav

POSTED BY: Miroslav Hekrdla

On the other hand, I am still curious about the original question: how to find out which parameters have been selected automatically.

To be honest: I have no idea! - Anyone?

POSTED BY: Henrik Schachner