WOLFRAM COMMUNITY

GROUPS:

Data Science Mathematics Physics Wolfram Language Statistics and Probability Machine Learning

Find clusters in an energy spectrum?

Andrea G., research

Posted 7 years ago

Hi! I have recently been faced with a spectral-analysis problem that might be solved with data-clustering techniques. I have an energy spectrum with around 200 energy levels. Many of them are separated by less than, or about as much as, the error on the energy. I would therefore like to cluster them, so that levels very close in energy (with respect to their errors) are treated as a single level.

The data are organised in a table of four columns:
- the first is the energy;
- the second and third are the level intensity and its error (irrelevant for the clustering);
- the fourth is the error on the energy.

level52 = {{6430, 0.93, 0.0808, 12}, {6452, 0.56, 0.112, 13},
   {6485, 2.03, 0.0848, 15}, {6531, 0.78, 0.0579, 18},
   {6584, 0.56, 0.0488, 21}, {6659, 0.83, 0.0483, 25}}

and so on. I therefore tried:

FindClusters[Drop[level52, None, {2, 3, 4}] -> level52, 40]

but this does not cluster the levels properly. Does anyone know the appropriate distance function to use in this case? Is there a way to define a distance function that uses the error on the energy in the fourth column of the table? For example, a distance equal to the difference in energy between two levels divided by the sum of their errors.

Many thanks for the help.
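As a starting point, the proposed distance can be evaluated between neighbouring levels once the table is sorted by energy. The following is a minimal sketch that assumes the four-column layout described above. The names `sorted`, `energies`, `errors` and `gaps` are illustrative only.

```mathematica
level52 = {{6430, 0.93, 0.0808, 12}, {6452, 0.56, 0.112, 13},
   {6485, 2.03, 0.0848, 15}, {6531, 0.78, 0.0579, 18},
   {6584, 0.56, 0.0488, 21}, {6659, 0.83, 0.0483, 25}};

(* sort the rows by energy (column 1) so that neighbouring levels are adjacent *)
sorted = SortBy[level52, First];
energies = sorted[[All, 1]];
errors = sorted[[All, 4]];

(* proposed distance between consecutive levels:
   |E(i+1) - E(i)| / (dE(i) + dE(i+1)) *)
gaps = Abs[Differences[energies]]/(Most[errors] + Rest[errors])
```

The input is exact integers, so the result is exact: {22/25, 33/28, 46/33, 53/39, 75/46}. That is roughly 0.88, 1.18, 1.39, 1.36 and 1.63, the same values that appear for the neighbouring pairs in Out[83] further down. Only the first pair has a distance of at most 1.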

POSTED BY: Andrea G.

5 Replies

Hans Dolhaine, retired

Posted 7 years ago

I devised a new distance function df1. It is zero when the energies are equal, 1 when the uncertainty intervals overlap, and the absolute value of the energy difference otherwise.

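df1[x_, y_] := Module[{j1, j2, ds},
  (* uncertainty intervals E +/- dE, from columns 1 and 4 *)
  j1 = Interval[{x[[1]] - x[[4]], x[[1]] + x[[4]]}];
  j2 = Interval[{y[[1]] - y[[4]], y[[1]] + y[[4]]}];
  ds = IntervalIntersection[j1, j2];
  Which[
   x[[1]] == y[[1]], 0,                      (* equal energies *)
   ds === Interval[], Abs[x[[1]] - y[[1]]],  (* disjoint intervals: energy difference *)
   True, 1]                                  (* overlapping intervals *)
  ]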
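The overlap test in df1 can also be written without Interval objects. Two closed intervals E1 ± dE1 and E2 ± dE2 intersect exactly when |E1 - E2| <= dE1 + dE2. A minimal sketch follows, where `compatibleQ` is a hypothetical helper name, not a built-in.

```mathematica
level52 = {{6430, 0.93, 0.0808, 12}, {6452, 0.56, 0.112, 13},
   {6485, 2.03, 0.0848, 15}, {6531, 0.78, 0.0579, 18},
   {6584, 0.56, 0.0488, 21}, {6659, 0.83, 0.0483, 25}};

(* True when the uncertainty intervals of two levels overlap, i.e. when the
   energy difference is at most the sum of the energy errors (column 4) *)
compatibleQ[a_, b_] := Abs[a[[1]] - b[[1]]] <= a[[4]] + b[[4]]

(* all pairs of levels whose intervals overlap *)
Select[Subsets[level52, {2}], compatibleQ @@ # &]
```

For level52 this selects only the pair with energies 6430 and 6452 (22 <= 12 + 13), which is the same pair that df1 marks with distance 1.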

Applied to the 2-subsets of level52, this distance function shows that two energies are "similar", i.e. have distance 1, and the others are not:

In[17]:= df1 @@@ (Subsets[level52, {2}])

Out[17]= {1, 55, 101, 154, 229, 33, 79, 132, 207, 46, 99, 174, 53, 128, 75}

So level52 should be clustered, which, surprisingly, is not the case:

 In[15]:= FindClusters[level52, DistanceFunction -> (df1[#1, #2] &)]
% // Length

Out[15]= {{{6430, 0.93, 0.0808, 12}, {6452, 0.56, 0.112, 13}, {6485, 
   2.03, 0.0848, 15}, {6531, 0.78, 0.0579, 18}, {6584, 0.56, 0.0488, 
   21}, {6659, 0.83, 0.0483, 25}}}

Out[16]= 1

So I decided to do it on my own and, as expected, found two clusters in level52: one of the two "similar" energy levels, and one of all the others.

In[29]:= len = Length[level52];
pairs = Flatten[Table[{level52[[i]], level52[[j]]}, {i, len - 1}, {j, i + 1, len}],1];
Select[pairs, df1[#[[1]], #[[2]]] == 1 &]

Out[31]= {{{6430, 0.93, 0.0808, 12}, {6452, 0.56, 0.112, 13}}}
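A list of compatible pairs is not yet a clustering. If A overlaps B and B overlaps C, all three presumably belong together. One way to get that chaining is to reuse the Interval machinery from df1. Interval automatically merges overlapping pieces, so we can merge all uncertainty intervals and then assign each level to the merged piece that contains its energy. This is a sketch that assumes the same 1-sigma overlap criterion; `merged` and `clusters` are illustrative names.

```mathematica
level52 = {{6430, 0.93, 0.0808, 12}, {6452, 0.56, 0.112, 13},
   {6485, 2.03, 0.0848, 15}, {6531, 0.78, 0.0579, 18},
   {6584, 0.56, 0.0488, 21}, {6659, 0.83, 0.0483, 25}};

(* one uncertainty interval {E - dE, E + dE} per level; Interval merges
   overlapping pieces, so each remaining piece corresponds to one cluster *)
merged = Interval @@ ({#[[1]] - #[[4]], #[[1]] + #[[4]]} & /@ level52);

(* assign every level to the merged piece that contains its energy *)
clusters = Function[piece,
     Select[level52, IntervalMemberQ[Interval[piece], #[[1]]] &]] /@
   List @@ merged
```

For level52 this gives one cluster with the levels at 6430 and 6452 and four single-level clusters, consistent with the pair found above. Because the merging is transitive, a level that overlaps any member of a chain joins the whole chain.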

POSTED BY: Hans Dolhaine

Hans Dolhaine, retired

Posted 7 years ago

I defined a distance function according to your proposal (although I don't see why you divide by the sum of the errors: large errors make large differences in energy small).

"For example, a distance which is the difference in energy between the levels divided by the sum of their errors":

df[x_, y_] := Abs[(x[[1]] - y[[1]])/(x[[4]] + y[[4]])]
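Written out with σ for the energy error in column 4, the distance df and its link to df1 look as follows. The second line is added for comparison only. For two independent measurements, the combined uncertainty of a difference is usually taken in quadrature, which gives a normalized distance at least as large as df. Which criterion suits this spectrum is an assumption left open here.

```latex
d_{ij} = \frac{|E_i - E_j|}{\sigma_i + \sigma_j},
\qquad
[E_i - \sigma_i,\, E_i + \sigma_i] \cap [E_j - \sigma_j,\, E_j + \sigma_j] \neq \emptyset
\iff |E_i - E_j| \le \sigma_i + \sigma_j
\iff d_{ij} \le 1,

z_{ij} = \frac{|E_i - E_j|}{\sqrt{\sigma_i^2 + \sigma_j^2}} \;\ge\; d_{ij}
\quad\text{since}\quad \sqrt{\sigma_i^2 + \sigma_j^2} \le \sigma_i + \sigma_j .
```

For the first two levels, for example, d = 22/25 = 0.88, while z = 22/\sqrt{313} ≈ 1.24.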

Unfortunately, this does not give a clustering of level52:

In[68]:= FindClusters[level52, DistanceFunction -> (df[#1, #2] &)]
% // Length

Out[68]= {{{6430, 0.93, 0.0808, 12}, {6452, 0.56, 0.112, 13}, {6485, 
   2.03, 0.0848, 15}, {6531, 0.78, 0.0579, 18}, {6584, 0.56, 0.0488, 
   21}, {6659, 0.83, 0.0483, 25}}}

Out[69]= 1

Interestingly enough, applying this distance function to each pair of elements of level52 (produced by Subsets) gives a set of numbers (distances) which FindClusters splits into two groups:

In[83]:= Apply[df, Subsets[level52, {2}], {1}] // N
FindClusters[%]

Out[83]= {0.88, 2.03704, 3.36667, 4.66667, 6.18919, 1.17857, 2.54839, \
3.88235, 5.44737, 1.39394, 2.75, 4.35, 1.35897, 2.97674, 1.63043}

Out[84]= {{0.88, 2.03704, 1.17857, 2.54839, 1.39394, 2.75, 1.35897, 
  2.97674, 1.63043}, {3.36667, 4.66667, 6.18919, 3.88235, 5.44737, 
  4.35}}

But does df work at all inside FindClusters?

Suppose I produce a new data set named level52a by appending five levels whose energies are raised by 450 units. Then the distance function works: level52a is clustered into two sets of 6 (the original data) and 5 (the new data) elements.

In[114]:= 
level52a = Join[level52, # + {450, 0, 0, 0} & /@ Take[level52, 5]];
FindClusters[level52a, DistanceFunction -> (df[#1, #2] &)]
Length /@ %

Out[115]= {{{6430, 0.93, 0.0808, 12}, {6452, 0.56, 0.112, 13}, {6485, 
   2.03, 0.0848, 15}, {6531, 0.78, 0.0579, 18}, {6584, 0.56, 0.0488, 
   21}, {6659, 0.83, 0.0483, 25}}, {{6880, 0.93, 0.0808, 12}, {6902, 
   0.56, 0.112, 13}, {6935, 2.03, 0.0848, 15}, {6981, 0.78, 0.0579, 
   18}, {7034, 0.56, 0.0488, 21}}}

Out[116]= {6, 5}

Obviously df is not appropriate to cluster level52a.

POSTED BY: Hans Dolhaine

Hans Dolhaine, retired

Posted 7 years ago

Sorry, but it is by no means clear what you actually want to achieve. What should your

Drop[level52, None, {2, 3, 4}] -> level52

do? In my system this just makes a Rule. Look at

FullForm[Drop[level52, None, {2, 3, 4}] -> level52]

And then? What should be done with this rule? Or do you want to eliminate some entries of level52? If so, which ones? Is level52 one of your 200 or so energy levels, or do you want to cluster within level52? It might help if you gave more details, or a written example (in a notebook) of what your intentions are. For example, give ten or 15 energy levels in a list and the result you want to have.
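For reference, FindClusters has a documented form FindClusters[{e1, e2, ...} -> {v1, v2, ...}] that clusters on the ei but returns the corresponding vi. So the Rule in the question is meant as input to FindClusters, not as a result on its own. A minimal sketch follows. `energies` is an illustrative name, and the second argument 40 from the question (a requested number of clusters) is left out.

```mathematica
level52 = {{6430, 0.93, 0.0808, 12}, {6452, 0.56, 0.112, 13},
   {6485, 2.03, 0.0848, 15}, {6531, 0.78, 0.0579, 18},
   {6584, 0.56, 0.0488, 21}, {6659, 0.83, 0.0483, 25}};

(* Drop[list, None, {2, 3, 4}] keeps every row but removes columns 2-4,
   leaving one-element rows {E}; level52[[All, {1}]] gives the same *)
energies = Drop[level52, None, {2, 3, 4}];

(* cluster on the energy column only, but return the full rows *)
FindClusters[energies -> level52]
```

This clusters on the bare energies with the default distance, so the errors in column 4 play no role. That is why a custom DistanceFunction comes up in this thread.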

POSTED BY: Hans Dolhaine

Hans Dolhaine, retired

Posted 7 years ago

What do you expect? Perhaps this?

FindClusters[#[[1]] & /@ level52]

POSTED BY: Hans Dolhaine

Andrea G., research

Posted 7 years ago

Well, I would expect FindClusters to return something like this:

{{{6430, 0.93, 0.0808, 12}, {6452, 0.56, 0.1120, 13}, {6485, 2.03, 0.0848, 15}}, (* first cluster: energies compatible within errors *)
 {{6531, 0.78, 0.0579, 18}, {6584, 0.56, 0.0488, 21}}, (* second cluster: energies compatible within errors *)
 {{6659, 0.83, 0.0483, 25}}} (* third cluster: energy not compatible within errors with the other energies *)
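One way to obtain a grouping like this does not use FindClusters at all. Sort the levels by energy, walk along the spectrum, and start a new cluster whenever the normalized gap df between neighbouring levels exceeds a chosen threshold t. This is a simple threshold-chaining sketch under that assumption. `clusterLevels` is a hypothetical helper name, and the threshold values are illustrative.

```mathematica
level52 = {{6430, 0.93, 0.0808, 12}, {6452, 0.56, 0.112, 13},
   {6485, 2.03, 0.0848, 15}, {6531, 0.78, 0.0579, 18},
   {6584, 0.56, 0.0488, 21}, {6659, 0.83, 0.0483, 25}};

(* hypothetical helper: sort by energy, then split wherever the normalized
   gap |E1 - E2|/(dE1 + dE2) between neighbouring levels exceeds t *)
clusterLevels[levels_, t_] := Split[SortBy[levels, First],
   Abs[#1[[1]] - #2[[1]]]/(#1[[4]] + #2[[4]]) <= t &]

(* t = 1: neighbours are merged only when their 1-sigma intervals overlap *)
Map[First, clusterLevels[level52, 1], {2}]

(* a looser threshold, between 53/39 (about 1.359) and 46/33 (about 1.394) *)
Map[First, clusterLevels[level52, 1.37], {2}]
```

With t = 1 only 6430 and 6452 are grouped. With t = 1.37 the energies fall into the three groups listed above: {6430, 6452, 6485}, {6531, 6584} and {6659}. For these data, however, the admissible range of t is very narrow. Chaining also puts 6430 and 6485 in one cluster even though their own normalized distance is 55/27 ≈ 2.04.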

POSTED BY: Andrea G.
