WOLFRAM COMMUNITY

Relation between Principal Components

David Thissen, Freiburg

Posted 11 years ago

Hi all,

I'm relatively new to Mathematica and working with PCA (Principal Component Analysis). From biological experiments on the foraging of nematodes, I've collected approximately 200 data points of 100 dimensions each. The data is stored in a matrix with 100 rows and approximately 200 columns.

Now, to find the major aspects of the foraging movement, I've done a PCA on the data:

`pca = PrincipalComponents[data]`

Great! I've got the principal components, of which the first four are the most relevant, explaining the lion's share of the total variance. From theoretical considerations, I assume that PC1 and PC2 amplitudes are related as follows (idealized, of course):

Now, how can I plot something like that? First, I would need a list of amplitudes a1, a2, a3, a4 for the first four principal components and for all 200 data points. Then I somehow need to plot the relation

`Plot[a2[a1], {a1, -.2, .2}]`

Any ideas on how to get the amplitudes and how to plot them?

Thanks, any input would help a lot!
David.

POSTED BY: David Thissen
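A minimal sketch of getting per-point amplitudes (principal component scores) for data laid out as described above, assuming each column is one data point. `PrincipalComponents` treats each row as one observation, so the matrix is transposed first. The random matrix is only a stand-in for the real measurements, and the names `scores`, `a1`...`a4` are illustrative.

```mathematica
(* Illustration only: a random stand-in for the 100 x 200 matrix,
   one data point per column *)
SeedRandom[1];
data = RandomReal[{-1, 1}, {100, 200}];

(* PrincipalComponents expects one observation per row, so transpose first *)
scores = PrincipalComponents[Transpose[data]];

(* Column k of scores is the amplitude of PC k for every data point *)
{a1, a2, a3, a4} = Transpose[scores[[All, 1 ;; 4]]];

(* PC2 amplitude against PC1 amplitude, one point per data point *)
ListPlot[Transpose[{a1, a2}], AxesLabel -> {"a1", "a2"}]
```

A theoretical curve for a2 as a function of a1 could then be overlaid with `Show[ListPlot[...], Plot[...]]`.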

10 Replies

Daniel Sumner Magruder

Posted 9 years ago

Can someone please explain this to me? Running PCA in SPSS spits out the component scores for each variable. How can I get something that looks similar?

POSTED BY: Daniel Sumner Magruder

Matthias Odisio, Thermo Fisher Scientific

Posted 11 years ago

David, if by amplitudes you were looking for the vector `v` above, then you are looking for the second element returned by `KarhunenLoeveDecomposition`. Using the variable names from Jim's example, `v` and `Last@KarhunenLoeveDecomposition[Transpose@z]` are identical up to sign.

POSTED BY: Matthias Odisio

David Thissen, Freiburg

Posted 11 years ago

Thanks, Jim! Exactly what I was looking for! What I meant by "amplitudes" is basically contained in vector v... Thank you all and best wishes from the wintery snowy south of Germany, David.

POSTED BY: David Thissen

Jim Baldwin, Retired

Posted 11 years ago

And to go from a principal component score back to the world of the original data, one could use `pcx.Transpose[v]*dataSD + dataMean`, where `pcx` is a vector of principal component scores.

POSTED BY: Jim Baldwin
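The back-transform can be checked with a round trip. A minimal sketch, reusing the documentation example data and the variable names from Jim's code (`N@` is added only to keep everything in machine precision). It works because `v` from `SingularValueDecomposition` is orthogonal, so `Transpose[v]` undoes the rotation, and the standardization is then undone with `dataSD` and `dataMean`.

```mathematica
data = N@{{13.2, 200, 58, 21.2}, {10, 263, 48, 44.5}, {8.1, 294, 80, 31},
    {8.8, 190, 50, 19.5}, {9, 276, 91, 40.6}, {7.9, 204, 78, 38.7},
    {3.3, 110, 77, 11.1}, {5.9, 238, 72, 15.8}, {15.4, 335, 80, 31.9},
    {17.4, 211, 60, 25.8}};
dataMean = Mean[data];
dataSD = StandardDeviation[data];
z = (# - dataMean)/dataSD & /@ data;
{u, s, v} = SingularValueDecomposition[z];

(* Scores of one data vector (here the third row), as in Jim's code *)
x = data[[3]];
pcx = ((x - dataMean)/dataSD).v;

(* Undo the rotation (v is orthogonal), then undo the standardization *)
xBack = pcx.Transpose[v]*dataSD + dataMean;

(* Differences are zero up to round-off *)
Chop[xBack - x]
```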

Jim Baldwin, Retired

Posted 11 years ago

Here is how you can obtain the associated principal components score for a new vector of data:

```mathematica
(* Data from Mathematica documentation on PrincipalComponents *)
data = {{13.2, 200, 58, 21.2}, {10, 263, 48, 44.5}, {8.1, 294, 80, 
    31}, {8.8, 190, 50, 19.5}, {9, 276, 91, 40.6}, {7.9, 204, 78, 
    38.7}, {3.3, 110, 77, 11.1}, {5.9, 238, 72, 15.8}, {15.4, 335, 80,
     31.9}, {17.4, 211, 60, 25.8}};

(* Get principal component scores from the high-level function *)
pc = PrincipalComponents[data, Method -> "Correlation"];

(* Get the information needed to transform additional vectors in the same way *)
dataMean = Mean[data];
dataSD = StandardDeviation[data];

(* Standardize the original data *)
z = (# - dataMean)/dataSD & /@ data;

(* Get the principal components and other necessary info the hard way *)
{u, s, v} = SingularValueDecomposition[z];
(* Note that pc and z.v are "essentially the same", except that the signs
   of some columns might be reversed, i.e., principal components are not
   unique with respect to sign *)
MatrixForm[pc]
MatrixForm[z.v]
(* So from here on we use z.v rather than pc, because we need the other
   pieces generated by SingularValueDecomposition *)

(* Say we have a new vector of values that we want to transform to its
   corresponding principal component score *)
(* I've used just the first row of the data to show that one gets the
   correct results - one never has enough QA *)
x = {13.2, 200, 58, 21.2};

(* Standardize x with the data mean and standard deviation *)
zx = (x - dataMean)/dataSD;

(* Obtain the principal component score for x *)
pcx = zx.v
```

So that will obtain the principal component scores for any new data. Like Matthias, I have not seen the term "amplitude" associated with principal component theory. Is that jargon from a particular field?

POSTED BY: Jim Baldwin
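The same transform applies to several new vectors at once by standardizing every row with the mean and standard deviation of the original data and then multiplying by `v`. A minimal sketch; `newData` is a hypothetical name, and here it is simply the first two rows of the example data so the result can be checked against `z.v`.

```mathematica
data = N@{{13.2, 200, 58, 21.2}, {10, 263, 48, 44.5}, {8.1, 294, 80, 31},
    {8.8, 190, 50, 19.5}, {9, 276, 91, 40.6}, {7.9, 204, 78, 38.7},
    {3.3, 110, 77, 11.1}, {5.9, 238, 72, 15.8}, {15.4, 335, 80, 31.9},
    {17.4, 211, 60, 25.8}};
dataMean = Mean[data];
dataSD = StandardDeviation[data];
z = (# - dataMean)/dataSD & /@ data;
{u, s, v} = SingularValueDecomposition[z];

(* Hypothetical "new" observations: the first two rows of data *)
newData = data[[1 ;; 2]];

(* Standardize each row with the ORIGINAL mean and SD, then rotate *)
newScores = ((# - dataMean)/dataSD & /@ newData).v;

(* Sanity check: should match the first two rows of z.v *)
Max[Abs[newScores - (z.v)[[1 ;; 2]]]]
```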

Matthias Odisio, Thermo Fisher Scientific

Posted 11 years ago

I don't know the terminology you are using. What do you mean by amplitude? Can you point to a formula?

POSTED BY: Matthias Odisio

David Thissen, Freiburg

Posted 11 years ago

Thanks, Matthias, for your reply! Yes, I played around a little with `KarhunenLoeveDecomposition` as well. But basically, I don't see any difference between `KarhunenLoeveDecomposition` and `PrincipalComponents`. The two functions seem to be exactly equivalent, except that the data has to be transposed for KarhunenLoeve... `KarhunenLoeveDecomposition` also gives a matrix of "eigenvalues", but unfortunately they are not the amplitudes I'm looking for. They are the eigenvalues of the covariance matrix for the data...

POSTED BY: David Thissen
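The eigenvalues of the covariance matrix and the amplitudes are related, though: each eigenvalue is the variance, across all data points, of one column of amplitudes (scores). A minimal sketch with the documentation example data and standardized variables, as in Jim's code, so the relevant covariance matrix is `Correlation[data]`. The singular values give the same numbers as `s_i^2/(n - 1)`.

```mathematica
data = N@{{13.2, 200, 58, 21.2}, {10, 263, 48, 44.5}, {8.1, 294, 80, 31},
    {8.8, 190, 50, 19.5}, {9, 276, 91, 40.6}, {7.9, 204, 78, 38.7},
    {3.3, 110, 77, 11.1}, {5.9, 238, 72, 15.8}, {15.4, 335, 80, 31.9},
    {17.4, 211, 60, 25.8}};
dataMean = Mean[data];
dataSD = StandardDeviation[data];
z = (# - dataMean)/dataSD & /@ data;
{u, s, v} = SingularValueDecomposition[z];

(* Amplitudes: one row per data point, one column per component *)
scores = z.v;

(* Eigenvalues of the covariance matrix of the standardized data *)
eig = Eigenvalues[Correlation[data]];

(* Variance of each column of amplitudes *)
scoreVar = Variance /@ Transpose[scores];

(* Same numbers from the singular values: s_i^2/(n - 1) *)
svdVar = Diagonal[s]^2/(Length[data] - 1);

{Sort[eig], Sort[scoreVar], Sort[svdVar]}
```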

Matthias Odisio, Thermo Fisher Scientific

Posted 11 years ago

David, did you have a look at `KarhunenLoeveDecomposition`? It also provides the transformation matrix between the data and the principal components.

POSTED BY: Matthias Odisio

David Thissen, Freiburg

Posted 11 years ago

Thanks, Jim! This will plot the first two principal component vectors, PC1 and PC2. What I need are the amplitudes. In other words: what do I need to multiply the principal component vectors by to get my original data points?

`data[[All, 1]] == (a1)_1*PC1 + (a2)_1*PC2 + (a3)_1*PC3 + ...`

I need that for `data[[All, 1]]`, `data[[All, 2]]`, `data[[All, 3]]`, ... In the end, I want something like `Plot[a2[a1], {a1, -.2, .2}]`. Unfortunately, when I enter `LinearSolve[pc, data]`, it says "Linear equation encountered that has no solution."

POSTED BY: David Thissen
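The coefficients described above are the scores themselves, provided the data is first centred (and, in Jim's example, scaled). Because `v` is orthogonal, `z == (z.v).Transpose[v]`. Row j of `z` is therefore the sum over i of `scores[[j, i]]` times component vector `v[[All, i]]`. One likely reason the `LinearSolve` call fails is that every column of `pc` has mean zero. Any `pc.x` then also has zero column means, while the raw data columns generally do not. A minimal sketch with the documentation example data; `j` and `k` are illustrative choices.

```mathematica
data = N@{{13.2, 200, 58, 21.2}, {10, 263, 48, 44.5}, {8.1, 294, 80, 31},
    {8.8, 190, 50, 19.5}, {9, 276, 91, 40.6}, {7.9, 204, 78, 38.7},
    {3.3, 110, 77, 11.1}, {5.9, 238, 72, 15.8}, {15.4, 335, 80, 31.9},
    {17.4, 211, 60, 25.8}};
dataMean = Mean[data];
dataSD = StandardDeviation[data];
z = (# - dataMean)/dataSD & /@ data;
{u, s, v} = SingularValueDecomposition[z];
scores = z.v;

(* Data point j as a weighted sum of the component vectors (columns of v),
   with weights ("amplitudes") taken from row j of scores *)
j = 1;
zj = Sum[scores[[j, i]] v[[All, i]], {i, Length[v]}];
Max[Abs[zj - z[[j]]]]  (* zero up to round-off *)

(* Keeping only the first k components gives an approximation of z *)
k = 2;
zApprox = scores[[All, 1 ;; k]].Transpose[v[[All, 1 ;; k]]];
```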

Jim Baldwin, Retired

Posted 11 years ago

Something like this?

```mathematica
data = {{13.2, 200, 58, 21.2}, {10, 263, 48, 44.5}, {8.1, 294, 80, 
    31}, {8.8, 190, 50, 19.5}, {9, 276, 91, 40.6}, {7.9, 204, 78, 
    38.7}, {3.3, 110, 77, 11.1}, {5.9, 238, 72, 15.8}, {15.4, 335, 
    80, 31.9}, {17.4, 211, 60, 25.8}};
pc = PrincipalComponents[data, Method -> "Correlation"]
ListPlot[pc[[All, {1, 2}]]]
pcMin = Min[pc];
pcMax = Max[pc];
ListPointPlot3D[pc[[All, {1, 2, 3}]], BoxRatios -> {1, 1, 1}, 
 PlotRange -> {{pcMin, pcMax}, {pcMin, pcMax}, {pcMin, pcMax}}]
```

POSTED BY: Jim Baldwin
