# Make vector/multivariate clustering/machine learning of a dataset?

Posted 5 months ago
Hi,

Does anyone know how to perform vector-level clustering of a data set?

The problem to solve: I have a data set of 1450 samples. Each sample is a vector of 10 scalar values (numbers). The data is structured as a matrix, i.e. a list of lists of numbers:

    {{1,2,1,...},{3,1,7,...}...}

When I use the function FindClusters, it returns a classification of the scalars themselves, i.e. each number, but not of the vectors. I want to classify the vectors {1,2,1,...} as single objects, as opposed to each scalar component, which is what Mathematica does when I call FindClusters on the matrix itself.

Does anyone know how to do this? Thanks a lot for the answer.

Best, Emmanuel.
Posted 5 months ago
FindClusters handles vector input. With no concrete example posted, it is difficult to diagnose what may have gone wrong. Below is a simple (rigged) example that shows clustering of vectors. We use three sets, each created as a separate cluster.

    SeedRandom[134];
    n = 20; d = 4;
    data = Join[RandomReal[{-1, 1}, {n, d}], RandomReal[{-3, -1}, {n, d}], RandomReal[{1, 3}, {n, d}]];
    FindClusters[data -> Range[Length[data]]]
    (* Out[617]= {{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20}, {21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40}, {41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60}} *)
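Since the original input was not posted, one guess (only an assumption) is that the data was not a rectangular list of numeric vectors. A minimal sketch for checking the input before clustering follows; `sample` is a made-up toy matrix, not the poster's data.

```wolfram
(* Hypothetical toy data: four 3-component vectors *)
sample = {{1., 2., 1.}, {3., 1., 7.}, {1., 2., 2.}, {3., 2., 7.}};

(* True only for a rectangular list of lists whose entries are all numeric *)
MatrixQ[sample, NumericQ]

(* {number of vectors, components per vector} *)
Dimensions[sample]

(* Each returned cluster is a list of whole vectors *)
FindClusters[sample, 2]
```

If the first check gives False, ragged rows or non-numeric entries (strings, symbols) are worth looking for before calling FindClusters.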
Posted 5 months ago
Hi Daniel,

Thank you for the reply. It seemed to work. The clustering was applied to the vectors by using the FindClusters function. However, I have not managed to associate the cluster numbers with the vectors by using the ClusteringComponents function. What I am trying to do is attach the cluster number to each element of the list.

Here is a sample of the data set (called sublistDealers):

    {{58., 60., 58.}, {61., 65., 61.}, {55., 55., 61.}, {58., 54., 53.}, {63., 65., 67.}, {58., 58., 60.}, {58., 55., 57.}, {54., 64., 63.}, {43., 44., 43.}, {64., 65., 59.}, {51., 54., 48.}, {3., 3., 5.}, {62., 63., 61.}, {54., 52., 53.}, {56., 57., 59.}, {62., 60., 61.}, {46., 46., 47.}, {50., 54., 52.}, {52., 55., 54.}, {60., 57., 59.}, {55., 52., 55.}, {53., 54., 53.}, {51., 53., 56.}, {50., 48., 53.}, {54., 56., 57.}, {50., 52., 51.}, {57., 53., 56.}, {59., 56., 62.}, {45., 49., 47.}, {43., 49., 46.}, {51., 57., 56.}, {46., 44., 51.}, {53., 56., 51.}, {49., 52., 55.}, {46., 48., 51.}, {50., 48., 49.}, {51., 56., 54.}, {37., 45., 44.}, {49., 51., 48.}, {49., 45., 49.}, {42., 47., 42.}, {54., 52., 43.}, {49., 45., 48.}, {53., 52., 51.}, {44., 43., 41.}, {49., 46., 44.}, {47., 46., 50.}, {33., 38., 43.}, {47., 52., 50.}, {36., 31., 36.}, {30., 26., 36.}, {49., 49., 47.}, {44., 45., 46.}, {33., 42., 46.}, {33., 41., 44.}, {45., 47., 48.}, {36., 43., 45.}, {35., 38., 39.}, {50., 55., 48.}, {39., 48., 43.}, {54., 48., 49.}, {39., 38., 37.}, {50., 44., 47.}, {42., 38., 35.}, {41., 43., 50.}, {41., 44., 45.}, {34., 30., 34.}, {43., 47., 45.}, {53., 49., 49.}, {53., 58., 51.}, {8., 3., 3.}, {50., 49., 46.}, {53., 56., 47.}, {50., 47., 49.}, {23., 25., 45.}, {33., 39., 42.}, {43., 49., 45.}, {40., 42., 45.}, {45., 45., 43.}, {46., 41., 46.}, {51., 50., 47.}, {43., 41., 46.}, {42., 48., 40.}, {38., 38., 38.}, {28., 31., 29.}, {38., 42., 43.}, {51., 45., 46.},...}

The call

    clusteredDealers = FindClusters[sublistDealers, 2]

provides the following list (which is correct):

    {{{58., 60., 58.}, {61., 65., 61.}, {55., 55., 61.}, {58., 54., 53.}, {63., 65., 67.}, {58., 58., 60.}, {58., 55., 57.}, {54., 64., 63.}, {43., 44., 43.}, {64., 65., 59.}, {51., 54., 48.}, {62., 63., 61.}, {54., 52., 53.}, {56., 57., 59.}, {62., 60., 61.}, {46., 46., 47.}, {50., 54., 52.}, {52., 55., 54.}, {60., 57., 59.}, {55., 52., 55.}, {53., 54., 53.}, {51., 53., 56.}, {50., 48., 53.}, {54., 56., 57.}, {50., 52., 51.}, {57., 53., 56.}, {59., 56., 62.}, {45., 49., 47.}, {43., 49., 46.}, {51., 57., 56.}, {46., 44., 51.}, {53., 56., 51.}, {49., 52., 55.}, {46., 48., 51.}, {50., 48., 49.}, {51., 56., 54.}, {37., 45., 44.}, {49., 51., 48.}, {49., 45., 49.}, {42., 47., 42.}, {54., 52., 43.}, {49., 45., 48.}, {53., 52., 51.}, {44., 43., 41.}, {49., 46., 44.}, {47., 46., 50.}, {33., 38., 43.}, {47., 52., 50.}, {36., 31., 36.}, {30., 26., 36.}, {49., 49., 47.}, {44., 45., 46.}, {33., 42., 46.}, {33., 41., 44.}, {45., 47., 48.}, {36., 43., 45.}, {35., 38., 39.}, {50., 55., 48.}, {39., 48., 43.}, {54., 48., 49.}, {39., 38., 37.}, {50., 44., 47.}, {42., 38., 35.}, {41., 43., 50.}, {41., 44., 45.}, {34., 30., 34.}, {43., 47., 45.}, {53., 49., 49.}, {53., 58., 51.}, {50., 49., 46.}, {53., 56., 47.}, {50., 47., 49.}, {23., 25., 45.}, {33., 39., 42.}, {43., 49., 45.}, {40., 42., 45.}, {45., 45., 43.}, {46., 41., 46.}, {51., 50., 47.}, {43., 41., 46.}, {42., 48., 40.}, {38., 38., 38.}, {28., 31., 29.}, {38., 42., 43.}, {51., 45., 46.}, {37., 39., 41.}, {31., 40., 41.}, {51., 48., 44.}, {39., 42., 41.}, {41., 37., 42.}, {47., 45., 47.}, {46., 41., 40.}, {38., 44., 41.}, {26., 33., 37.}, {39., 48., 48.}, {47., 47., 47.}, {47., 48., 44.}, {43., 44., 40.}, {46., 48., 41.}, {43., 46., 47.}, {46., 57., 43.}, {37., 35., 41.}, {34., 37., 43.}, {45., 42., 42.}, {45., 46., 43.}, {36., 42., 36.},...}

However, the call ClusteringComponents[sublistDealers, 2] seems to deliver a clustering of the scalars themselves. Here are some sample elements from the list that I get:

    {{1, 1, 1}, {1, 1, 1}, {1, 1, 1}, {1, 1, 1}, {1, 1, 1}, {1, 1, 1}, {1, 1, 1}, {1, 1, 1}, {1, 1, 1}, {1, 1, 1}, {1, 1, 1}, {1, 1, 1}, {1, 1, 1}, {1, 1, 1}, {1, 1, 1}, {1, 1, 1}, {1, 1, 1}, {1, 1, 1}, {2, 2, 1}, {1, 1, 1}, {1, 1, 1}, {1, 1, 1}, {2, 1, 1}, {1, 1, 2}, {1, 1, 1}, {1, 1, 1}, {1, 1, 1}, {1, 1, 1}, {1, 1, 1}, {1, 1, 1}, {1, 1, 1}, {1, 1, 1}, {1, 1, 1}, {1, 1, 1}, {1, 1, 1}, {1, 1, 1}, {1, 2, 1}, {2, 2, 1},...}

Thank you for your answer.

Best, Emmanuel
Posted 5 months ago
For that, one uses the optional level argument to ClusteringComponents. I guess this could have been better documented; I had to go a ways down into the examples to find out that it does what is wanted here.

    sublistDealers = {{58., 60., 58.}, {61., 65., 61.}, {55., 55., 61.}, {58., 54., 53.}, {63., 65., 67.}, {58., 58., 60.}, {58., 55., 57.}, {54., 64., 63.}, {43., 44., 43.}, {64., 65., 59.}, {51., 54., 48.}, {3., 3., 5.}, {62., 63., 61.}, {54., 52., 53.}, {56., 57., 59.}, {62., 60., 61.}, {46., 46., 47.}, {50., 54., 52.}, {52., 55., 54.}, {60., 57., 59.}, {55., 52., 55.}, {53., 54., 53.}, {51., 53., 56.}, {50., 48., 53.}, {54., 56., 57.}, {50., 52., 51.}, {57., 53., 56.}, {59., 56., 62.}, {45., 49., 47.}, {43., 49., 46.}, {51., 57., 56.}, {46., 44., 51.}, {53., 56., 51.}, {49., 52., 55.}, {46., 48., 51.}, {50., 48., 49.}, {51., 56., 54.}, {37., 45., 44.}, {49., 51., 48.}, {49., 45., 49.}, {42., 47., 42.}, {54., 52., 43.}, {49., 45., 48.}, {53., 52., 51.}, {44., 43., 41.}, {49., 46., 44.}, {47., 46., 50.}, {33., 38., 43.}, {47., 52., 50.}, {36., 31., 36.}, {30., 26., 36.}, {49., 49., 47.}, {44., 45., 46.}, {33., 42., 46.}, {33., 41., 44.}, {45., 47., 48.}, {36., 43., 45.}, {35., 38., 39.}, {50., 55., 48.}, {39., 48., 43.}, {54., 48., 49.}, {39., 38., 37.}, {50., 44., 47.}, {42., 38., 35.}, {41., 43., 50.}, {41., 44., 45.}, {34., 30., 34.}, {43., 47., 45.}, {53., 49., 49.}, {53., 58., 51.}, {8., 3., 3.}, {50., 49., 46.}, {53., 56., 47.}, {50., 47., 49.}, {23., 25., 45.}, {33., 39., 42.}, {43., 49., 45.}, {40., 42., 45.}, {45., 45., 43.}, {46., 41., 46.}, {51., 50., 47.}, {43., 41., 46.}, {42., 48., 40.}, {38., 38., 38.}, {28., 31., 29.}, {38., 42., 43.}, {51., 45., 46.}};
    cc = ClusteringComponents[sublistDealers, 2, 1]
    (* Out[700]= {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1} *)

Quick check:

    Extract[sublistDealers, Position[cc, 2]]
    (* Out[703]= {{3., 3., 5.}, {8., 3., 3.}} *)

This is in fact the second cluster provided by FindClusters[sublistDealers, 2].
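The per-vector labels from the level-1 call can also be used to collect the vectors by cluster. The following is a minimal sketch, not part of the original answer; it assumes a version with GroupBy and PositionIndex (10.0 or later), and `pts` and `labels` are illustrative names for a five-vector subset of the posted data.

```wolfram
(* Small subset of the posted data, chosen so that two groups are plausible *)
pts = {{58., 60., 58.}, {61., 65., 61.}, {3., 3., 5.}, {55., 55., 61.}, {8., 3., 3.}};

(* Third argument 1: cluster the level-1 elements, i.e. whole vectors *)
labels = ClusteringComponents[pts, 2, 1];

(* Association from cluster label to the list of vectors carrying it *)
GroupBy[Transpose[{pts, labels}], Last -> First]

(* Association from cluster label to the positions of its members *)
PositionIndex[labels]
```

Which cluster receives label 1 and which receives label 2 is up to ClusteringComponents, so the sketch does not rely on a particular numbering.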
The first is quite simple, if I follow correctly what you want. Starting with the computation I already showed, the appending is done as below.

    Transpose[{sublistDealers, cc}]
    (* {{{58., 60., 58.}, 1}, {{61., 65., 61.}, 1}, {{55., 55., 61.}, 1}, {{58., 54., 53.}, 1}, {{63., 65., 67.}, 1}, {{58., 58., 60.}, 1}, {{58., 55., 57.}, 1}, {{54., 64., 63.}, 1}, {{43., 44., 43.}, 1}, {{64., 65., 59.}, 1}, {{51., 54., 48.}, 1}, {{3., 3., 5.}, 2}, {{62., 63., 61.}, 1}, {{54., 52., 53.}, 1}, {{56., 57., 59.}, 1}, {{62., 60., 61.}, 1}, {{46., 46., 47.}, 1}, {{50., 54., 52.}, 1}, {{52., 55., 54.}, 1}, {{60., 57., 59.}, 1}, {{55., 52., 55.}, 1}, {{53., 54., 53.}, 1}, {{51., 53., 56.}, 1}, {{50., 48., 53.}, 1}, {{54., 56., 57.}, 1}, {{50., 52., 51.}, 1}, {{57., 53., 56.}, 1}, {{59., 56., 62.}, 1}, {{45., 49., 47.}, 1}, {{43., 49., 46.}, 1}, {{51., 57., 56.}, 1}, {{46., 44., 51.}, 1}, {{53., 56., 51.}, 1}, {{49., 52., 55.}, 1}, {{46., 48., 51.}, 1}, {{50., 48., 49.}, 1}, {{51., 56., 54.}, 1}, {{37., 45., 44.}, 1}, {{49., 51., 48.}, 1}, {{49., 45., 49.}, 1}, {{42., 47., 42.}, 1}, {{54., 52., 43.}, 1}, {{49., 45., 48.}, 1}, {{53., 52., 51.}, 1}, {{44., 43., 41.}, 1}, {{49., 46., 44.}, 1}, {{47., 46., 50.}, 1}, {{33., 38., 43.}, 1}, {{47., 52., 50.}, 1}, {{36., 31., 36.}, 1}, {{30., 26., 36.}, 1}, {{49., 49., 47.}, 1}, {{44., 45., 46.}, 1}, {{33., 42., 46.}, 1}, {{33., 41., 44.}, 1}, {{45., 47., 48.}, 1}, {{36., 43., 45.}, 1}, {{35., 38., 39.}, 1}, {{50., 55., 48.}, 1}, {{39., 48., 43.}, 1}, {{54., 48., 49.}, 1}, {{39., 38., 37.}, 1}, {{50., 44., 47.}, 1}, {{42., 38., 35.}, 1}, {{41., 43., 50.}, 1}, {{41., 44., 45.}, 1}, {{34., 30., 34.}, 1}, {{43., 47., 45.}, 1}, {{53., 49., 49.}, 1}, {{53., 58., 51.}, 1}, {{8., 3., 3.}, 2}, {{50., 49., 46.}, 1}, {{53., 56., 47.}, 1}, {{50., 47., 49.}, 1}, {{23., 25., 45.}, 1}, {{33., 39., 42.}, 1}, {{43., 49., 45.}, 1}, {{40., 42., 45.}, 1}, {{45., 45., 43.}, 1}, {{46., 41., 46.}, 1}, {{51., 50., 47.}, 1}, {{43., 41., 46.}, 1}, {{42., 48., 40.}, 1}, {{38., 38., 38.}, 1}, {{28., 31., 29.}, 1}, {{38., 42., 43.}, 1}, {{51., 45., 46.}, 1}} *)

I do not know of a way to determine the method used, assuming one goes with the Automatic default. One could force a method using the Method option, though. The ClusteringComponents reference page gives a set of possibilities.
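Forcing a method might look like the sketch below. It is illustrative only: `pts` is a five-vector subset of the posted data, "KMeans" is one method name assumed to be available, and the set of accepted names depends on the Mathematica version, so the reference page for the installed version is the authority.

```wolfram
(* Small subset of the posted data *)
pts = {{58., 60., 58.}, {61., 65., 61.}, {3., 3., 5.}, {55., 55., 61.}, {8., 3., 3.}};

(* Same level-1 (per-vector) clustering, with the method set explicitly
   instead of left at Automatic; "KMeans" is an assumed choice *)
ClusteringComponents[pts, 2, 1, Method -> "KMeans"]

(* Lists the options ClusteringComponents accepts and their defaults *)
Options[ClusteringComponents]
```

Setting the method explicitly also sidesteps the question of which method Automatic picks, since the choice is then stated in the call itself.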