My list contains numbers between 0 and 40,000. The figure shows the distribution of the data:
I first tried FindClusters[list].
The output consists of two clusters, as seen here:
{{4169, 7114, 5025, 7316, 4977, 10411, 9352, 16438, 8719, 14330,
10277, 7144, 11950, 18572, 10471, 4915, 4958, 7556, 5145, 13862,
8466, 14138, 10861, 11815, 5638, 15242, 16666, 23564, 4256, 13014,
9865, 3729, 5980, 7740, 14290, 14067, 12038, 14125, 6436, 14240,
19054, 9622, 13876, 8362, 5983, 7163, 4908, 12856, 15923, 14368,
14467, 9393, 9555, 8537, 9149, 10272, 8228, 6525, 6596, 10401, 6244,
16576, 15262, 12593, 16128, 13189, 13508, 14206, 15115, 24985,
19442, 18195, 14522, 9103, 8781, 9394, 4716, 6760, 9281, 6958,
10581, 10862, 11518, 11508, 5691, 8567, 9797, 10897, 9535, 8723,
7645, 7035, 7186, 7392, 6913, 7549, 18990, 12778, 15982, 5145,
14650, 14468, 13480, 20918, 14713, 17319, 22983, 20166, 9464, 23675,
8466, 9598, 9698, 7082, 18233, 15193, 11804, 10285, 25290, 17428,
11320, 6441, 11868, 14666, 18505, 11778, 12131, 9275, 6347, 13024,
19351, 14984, 14150, 18093, 7455, 20572, 14041, 23137, 12763, 14986,
11280, 13584, 17583, 14394, 17540, 18123, 16960, 9344, 20265,
21251, 19206, 25316, 17411, 17123, 17137, 11778, 19055, 15926,
18753, 19731, 14524, 21106, 12309, 12357, 17689, 23076, 20067,
10224, 16353, 7571, 8493, 8927, 15024, 18869, 14585, 16099, 18462,
14361, 15621, 15584, 20522, 18542, 13220, 19124, 16885, 10800,
20395, 18752, 17369, 21940, 14893, 14939, 25153, 19275, 15273,
18337, 18835, 17250, 26872, 15279, 14366, 15319, 20846, 15711,
18547, 20289, 22089, 17250, 18777, 21723, 17813, 21230, 24460, 8375,
14843, 18409, 4854, 10552, 13598, 14440, 14707, 17834, 18916,
22908, 7045, 20264, 20317, 6742, 8589, 15747, 17136, 12764, 18185,
6882, 8867, 7009, 13119, 10461, 11362, 14844, 14337, 9780, 7170,
8486, 8538, 8758, 8383, 5024, 7285, 10365, 5239, 7644, 8675, 7909,
8781, 7353, 6439, 9123, 8136, 11655, 18012, 8834, 11400, 8248, 8207,
9232, 11126, 24912, 12578, 8352, 13299, 6344, 8347, 6876, 14591,
11316, 18416, 11233, 8438, 20095, 10800, 7596, 5791, 7083, 7931,
6021, 6088, 13472, 9212, 6992, 8428, 9336, 11558, 10948, 8795, 6353,
11253, 9172, 15023, 6512, 7775, 11892, 7908, 7545, 8135, 10378,
8896, 7302, 12794, 10991, 10490, 7240, 9780, 4285, 4694, 6847, 9383,
6969, 7879, 12737, 5840, 5550, 12252, 9034, 8661, 10347, 11444,
8241, 11445, 11539, 14462, 17701, 13711, 8229, 7458, 12440, 13455,
12092, 13517, 12047, 10099, 18228, 14068, 17192, 18021, 12252,
11070, 11711, 12952, 12144, 9109, 6563, 4531, 7438, 8839, 15560,
11478, 18469, 14584}, {35494, 32082, 27490, 29077, 31458, 31198}}
My second attempt was to specify the number of clusters using FindClusters[list, 4]. The output was:
{{4169, 7114, 5025, 7316, 4977, 10411, 9352, 16438, 8719, 14330,
10277, 7144, 11950, 18572, 10471, 4915, 4958, 7556, 5145, 13862,
8466, 14138, 10861, 11815, 5638, 15242, 16666, 23564, 4256, 13014,
9865, 3729, 5980, 7740, 14290, 14067, 12038, 14125, 6436, 14240,
19054, 9622, 13876, 8362, 5983, 7163, 4908, 12856, 15923, 14368,
14467, 9393, 9555, 8537, 9149, 10272, 8228, 6525, 6596, 10401, 6244,
16576, 15262, 12593, 16128, 13189, 13508, 14206, 15115, 19442,
18195, 14522, 9103, 8781, 9394, 4716, 6760, 9281, 6958, 10581,
10862, 11518, 11508, 5691, 8567, 9797, 10897, 9535, 8723, 7645,
7035, 7186, 7392, 6913, 7549, 18990, 12778, 15982, 5145, 14650,
14468, 13480, 20918, 14713, 17319, 22983, 20166, 9464, 23675, 8466,
9598, 9698, 7082, 18233, 15193, 11804, 10285, 17428, 11320, 6441,
11868, 14666, 18505, 11778, 12131, 9275, 6347, 13024, 19351, 14984,
14150, 18093, 7455, 20572, 14041, 23137, 12763, 14986, 11280, 13584,
17583, 14394, 17540, 18123, 16960, 9344, 20265, 21251, 19206,
17411, 17123, 17137, 11778, 19055, 15926, 18753, 19731, 14524,
21106, 12309, 12357, 17689, 23076, 20067, 10224, 16353, 7571, 8493,
8927, 15024, 18869, 14585, 16099, 18462, 14361, 15621, 15584, 20522,
18542, 13220, 19124, 16885, 10800, 20395, 18752, 17369, 21940,
14893, 14939, 19275, 15273, 18337, 18835, 17250, 15279, 14366,
15319, 20846, 15711, 18547, 20289, 22089, 17250, 18777, 21723,
17813, 21230, 24460, 8375, 14843, 18409, 4854, 10552, 13598, 14440,
14707, 17834, 18916, 22908, 7045, 20264, 20317, 6742, 8589, 15747,
17136, 12764, 18185, 6882, 8867, 7009, 13119, 10461, 11362, 14844,
14337, 9780, 7170, 8486, 8538, 8758, 8383, 5024, 7285, 10365, 5239,
7644, 8675, 7909, 8781, 7353, 6439, 9123, 8136, 11655, 18012, 8834,
11400, 8248, 8207, 9232, 11126, 12578, 8352, 13299, 6344, 8347,
6876, 14591, 11316, 18416, 11233, 8438, 20095, 10800, 7596, 5791,
7083, 7931, 6021, 6088, 13472, 9212, 6992, 8428, 9336, 11558, 10948,
8795, 6353, 11253, 9172, 15023, 6512, 7775, 11892, 7908, 7545,
8135, 10378, 8896, 7302, 12794, 10991, 10490, 7240, 9780, 4285,
4694, 6847, 9383, 6969, 7879, 12737, 5840, 5550, 12252, 9034, 8661,
10347, 11444, 8241, 11445, 11539, 14462, 17701, 13711, 8229, 7458,
12440, 13455, 12092, 13517, 12047, 10099, 18228, 14068, 17192,
18021, 12252, 11070, 11711, 12952, 12144, 9109, 6563, 4531, 7438,
8839, 15560, 11478, 18469, 14584}, {35494}, {24985, 25290, 25316,
27490, 25153, 29077, 26872, 24912}, {32082, 31458, 31198}}
Could you explain how this function works? I don't want one huge cluster containing most of the values. Instead, I expected the function to recognise separate clusters for the values near 10k, 15k, 20k and 30k. What distance function does FindClusters use?