
Relation between Principal Components

Posted 10 years ago

Hi all,

I'm relatively new to Mathematica and working with PCA (Principal Component Analysis).

From biological experiments on the foraging of nematodes I've collected approx. 200 data points of 100 dimensions each. The data is stored in a matrix with 100 rows and approx. 200 columns.

Now, to find the major aspects of the foraging movement, I've done a PCA on the data:

pca=PrincipalComponents[data]

Great! I've got the principal components, of which the first four are the most relevant, explaining the lion's share of the total variance.

From theoretical considerations, I assume that PC1 and PC2 amplitudes are related as follows (idealized, of course):

[Figure: Relation of first and second Principal Components]

Now, how can I plot something like that? First, I would need a list of amplitudes a1, a2, a3, a4 for the first four Principal Components and for all 200 data points. Then I somehow need to plot the relation

Plot[a2[a1], {a1,-.2,.2}]

Any ideas on how to get the amplitudes and how to plot them?


Thanks, any input would help a lot!

David.

POSTED BY: David Thissen
10 Replies
Posted 10 years ago

Something like this?

data = {{13.2, 200, 58, 21.2}, {10, 263, 48, 44.5}, {8.1, 294, 80, 
    31}, {8.8, 190, 50, 19.5}, {9, 276, 91, 40.6}, {7.9, 204, 78, 
    38.7}, {3.3, 110, 77, 11.1}, {5.9, 238, 72, 15.8}, {15.4, 335, 
    80, 31.9}, {17.4, 211, 60, 25.8}};
pc = PrincipalComponents[data, Method -> Correlation]
ListPlot[pc[[All, {1, 2}]]]
pcMin = Min[pc];
pcMax = Max[pc];
ListPointPlot3D[pc[[All, {1, 2, 3}]], BoxRatios -> {1, 1, 1}, 
 PlotRange -> {{pcMin, pcMax}, {pcMin, pcMax}, {pcMin, pcMax}}]
POSTED BY: Jim Baldwin

Thanks, Jim!

This will plot the first two principal component vectors, PC1 and PC2. What I need is the amplitudes.

In other words: what do I need to multiply the principal component vectors by to get my original data points?

data[[All,1]] = (a1)_1*PC1 + (a2)_1*PC2 + (a3)_1*PC3 + ...

I need that for

data[[All,1]], data[[All,2]], data[[All,3]], ...

In the end, I want something like

Plot[a2[a1], {a1,-.2,.2}]


Unfortunately, when I evaluate

LinearSolve[pc, data]

it says

Linear equation encountered that has no solution.
POSTED BY: David Thissen

David,

Did you have a look at KarhunenLoeveDecomposition? It also provides the transformation matrix between the data and the principal components.

POSTED BY: Matthias Odisio

Thanks, Matthias, for your reply!

Yes, I played around with KarhunenLoeveDecomposition a little as well, but I don't see any real difference between it and PrincipalComponents.

The two functions seem to be exactly equivalent, except that the data has to be transposed for KarhunenLoeve...

KarhunenLoeveDecomposition also gives a matrix of "eigenvalues", but they are unfortunately not the amplitudes I'm looking for. They are the eigenvalues of the covariance matrix for the data...

POSTED BY: David Thissen
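One way to compare the two functions directly is sketched below, using the documentation sample data from earlier in the thread. It relies on the identity m.{a1, a2, ...} == {b1, b2, ...} stated in the KarhunenLoeveDecomposition documentation, so the second element of the result is treated as a transformation matrix, not a list of eigenvalues. The name centered is introduced here for illustration, and the data is centered by hand so that both functions work on the same numbers. The final comparison is an expectation under these assumptions, not a verified result.

```mathematica
(* Sample data from the PrincipalComponents documentation, as used above *)
data = {{13.2, 200, 58, 21.2}, {10, 263, 48, 44.5}, {8.1, 294, 80, 31},
   {8.8, 190, 50, 19.5}, {9, 276, 91, 40.6}, {7.9, 204, 78, 38.7},
   {3.3, 110, 77, 11.1}, {5.9, 238, 72, 15.8}, {15.4, 335, 80, 31.9},
   {17.4, 211, 60, 25.8}};

(* Subtract the column means by hand (illustrative name: centered) *)
centered = # - Mean[data] & /@ data;

(* PrincipalComponents: one observation per row *)
pc = PrincipalComponents[centered];

(* KarhunenLoeveDecomposition: one variable per row, hence the Transpose *)
{b, m} = KarhunenLoeveDecomposition[Transpose[centered]];

(* Documented identity m.{a1, a2, ...} == {b1, b2, ...}; should be close to 0 *)
Max[Abs[m . Transpose[centered] - b]]

(* Comparison ignoring signs; expected to be small if the two agree *)
Max[Abs[Abs[Transpose[b]] - Abs[pc]]]
```

If the last value is small, Transpose[b] and pc hold the same component scores up to sign, and m plays the role of the matrix that maps data to scores.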

I don't know the terminology you are using. What do you mean by amplitude? Can you point to a formula?

POSTED BY: Matthias Odisio
Posted 10 years ago

Here is how you can obtain the associated principal components score for a new vector of data:

(* Data from Mathematica documentation on PrincipalComponents *)
data = {{13.2, 200, 58, 21.2}, {10, 263, 48, 44.5}, {8.1, 294, 80, 
    31}, {8.8, 190, 50, 19.5}, {9, 276, 91, 40.6}, {7.9, 204, 78, 
    38.7}, {3.3, 110, 77, 11.1}, {5.9, 238, 72, 15.8}, {15.4, 335, 80,
     31.9}, {17.4, 211, 60, 25.8}};

(* Get principal component scores from high level function *)
pc = PrincipalComponents[data, Method -> Correlation];

(* Get information needed to transform additional vectors in the same \
way *)
dataMean = Mean[data];
dataSD = StandardDeviation[data];

(* Standardize original data *)
z = (# - dataMean)/dataSD & /@ data;

(* Get principal components and other necessary info the hard way *)
{u, s, v} = SingularValueDecomposition[z];
(* But note that pc and z.v are "essentially the same" except that \
the signs of some columns might be reversed, i.e., principal \
components are not unique with respect to sign *)
MatrixForm[pc]
MatrixForm[z.v]
(* So from here on we use z.v rather than pc because we need the \
other pieces generated by SingularValueDecomposition *)

(* Say we have a new vector of values that we want to transform to \
its corresponding principal component score *)
(* I've used just the first row of the data to show that one gets the \
correct results - one never has enough QA *)
x = {13.2, 200, 58, 21.2};

(* Standardize x with the data mean and standard deviation *)
zx = (x - dataMean)/dataSD;

(* Obtain principal components score for x *)
pcx = zx.v

So that will give the principal component scores for any new data. Like Matthias, I have not seen the term "amplitude" associated with principal component theory. Is that jargon from a particular subject area?

POSTED BY: Jim Baldwin
Posted 10 years ago

And to go from a principal component score back to the world of the original data one could use

pcx.Transpose[v]*dataSD + dataMean

where pcx is a vector of principal component scores.

POSTED BY: Jim Baldwin
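The round trip between data and scores can be checked on the sample data. This is a minimal sketch that reuses the variable names from the posts above; scores, back, k and approx are names introduced here for illustration. The second half assumes one wants an approximation from only the first k components, which is not something the thread asks for explicitly.

```mathematica
(* Sample data from the PrincipalComponents documentation, as used above *)
data = {{13.2, 200, 58, 21.2}, {10, 263, 48, 44.5}, {8.1, 294, 80, 31},
   {8.8, 190, 50, 19.5}, {9, 276, 91, 40.6}, {7.9, 204, 78, 38.7},
   {3.3, 110, 77, 11.1}, {5.9, 238, 72, 15.8}, {15.4, 335, 80, 31.9},
   {17.4, 211, 60, 25.8}};

(* Standardize, then take the right singular vectors as in the post above *)
dataMean = Mean[data];
dataSD = StandardDeviation[data];
z = (# - dataMean)/dataSD & /@ data;
{u, s, v} = SingularValueDecomposition[z];

(* Scores for every row at once: row j holds the scores of data point j *)
scores = z . v;

(* Full round trip: undo the rotation, then undo the standardization *)
back = (# * dataSD + dataMean) & /@ (scores . Transpose[v]);
Max[Abs[back - data]]   (* should be at the level of rounding error *)

(* Approximation that keeps only the first k components *)
k = 2;
approx = (# * dataSD + dataMean) & /@
   (scores[[All, ;; k]] . Transpose[v[[All, ;; k]]]);
Max[Abs[approx - data]] (* error left by dropping the remaining components *)
```

Because v is orthogonal, using all columns recovers the data exactly up to rounding, while truncating to k columns gives the best rank-k approximation of the standardized data.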

Thanks, Jim!

Exactly what I was looking for! What I meant by "amplitudes" is basically contained in vector v...

Thank you all and best wishes from the wintery snowy south of Germany,

David.

POSTED BY: David Thissen
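For the layout described in the original question (dimensions in rows, data points in columns), the amplitudes and the a2-versus-a1 plot can be sketched as below. This is a minimal sketch, not the actual analysis. The random matrix is a hypothetical stand-in for the real measurements, and the names obs, centered and amps are made up for illustration. It also assumes plain covariance-based PCA, without the standardization used in the posts above. The important point is that PrincipalComponents treats each row as one observation, so a 100 x 200 matrix has to be transposed first.

```mathematica
(* Hypothetical stand-in for the real data: 100 dimensions x 200 data points *)
SeedRandom[1];
data = RandomReal[{-1, 1}, {100, 200}];

(* One data point per row, then subtract the mean data point *)
obs = Transpose[data];                  (* 200 x 100 *)
mean = Mean[obs];
centered = # - mean & /@ obs;

(* Columns of v are the principal directions PC1, PC2, ... *)
{u, s, v} = SingularValueDecomposition[centered];

(* Row j of amps holds the amplitudes (a1)_j, (a2)_j, ... of data point j *)
amps = centered . v;

(* Check: data point j equals mean + Sum over k of (a_k)_j * PC_k *)
Max[Abs[amps . Transpose[v] + ConstantArray[mean, Length[obs]] - obs]]

(* The amplitudes are discrete values, so use ListPlot rather than Plot *)
ListPlot[amps[[All, {1, 2}]], AxesLabel -> {"a1", "a2"}]
```

With real data, amps should agree with PrincipalComponents[obs] up to the sign of each column, so the scatter plot may appear mirrored along one or both axes.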

David, if by amplitudes you were looking for the vector v above, then you are looking for the second element returned by KarhunenLoeveDecomposition. Using the variable names from Jim's example, v and Last@KarhunenLoeveDecomposition[Transpose@z] are identical up to the sign.

POSTED BY: Matthias Odisio

Can someone please explain this to me?

Running PCA in SPSS spits out the component scores for each variable. How can I get something that looks similar?
