
# Relation between Principal Components

Posted 9 years ago
 Hi all,

 I'm relatively new to Mathematica and am working with PCA (principal component analysis). From biological experiments on the foraging of nematodes I've collected about 200 data points of 100 dimensions each. The data is stored in a matrix with 100 rows and approx. 200 columns. To find the major aspects of the foraging movement, I've run a PCA on the data:

    pca = PrincipalComponents[data]

 Great! I've got the principal components, of which the first four are the most relevant, explaining the lion's share of the total variance. From theoretical considerations, I assume that the PC1 and PC2 amplitudes are related as follows (idealized, of course):

 Now, how can I plot something like that? First, I would need a list of amplitudes a1, a2, a3, a4 for the first four principal components and for all 200 data points. Then I somehow need to plot the relation

    Plot[a2[a1], {a1, -.2, .2}]

 Any ideas on how to get the amplitudes and how to plot them?

 Thanks, any input would help a lot!

 David.
10 Replies
Posted 9 years ago
 Something like this?

    data = {{13.2, 200, 58, 21.2}, {10, 263, 48, 44.5}, {8.1, 294, 80, 31},
       {8.8, 190, 50, 19.5}, {9, 276, 91, 40.6}, {7.9, 204, 78, 38.7},
       {3.3, 110, 77, 11.1}, {5.9, 238, 72, 15.8}, {15.4, 335, 80, 31.9},
       {17.4, 211, 60, 25.8}};
    pc = PrincipalComponents[data, Method -> "Correlation"]
    ListPlot[pc[[All, {1, 2}]]]
    pcMin = Min[pc];
    pcMax = Max[pc];
    ListPointPlot3D[pc[[All, {1, 2, 3}]], BoxRatios -> {1, 1, 1},
     PlotRange -> {{pcMin, pcMax}, {pcMin, pcMax}, {pcMin, pcMax}}]
Posted 9 years ago
 Thanks, Jim! This will plot the first two principal component vectors, PC1 and PC2. What I need is the amplitudes. In other words: what do I need to multiply the principal component vectors by to get my original data points?

    data[[All,1]] = (a1)_1*PC1 + (a2)_1*PC2 + (a3)_1*PC3 + ...

 I need that for data[[All,1]], data[[All,2]], data[[All,3]], ... In the end, I want something like

    Plot[a2[a1], {a1, -.2, .2}]

 Unfortunately, when I enter

    LinearSolve[pc, data]

 it says: Linear equation encountered that has no solution.
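 One way to read the decomposition asked for above, as a minimal sketch rather than a definitive answer: if each observation is a row, the centered data can be projected onto the principal directions, and the projections play the role of the amplitudes a1, a2, ... The array `obs` below is random stand-in data (an assumption, not the nematode measurements), and the names `centered` and `amps` are hypothetical. With data stored as 100 rows by 200 columns, `Transpose[data]` would presumably take the place of `obs`.

```mathematica
(* Hypothetical stand-in data: 200 observations (rows) of 100 values each *)
SeedRandom[1];
obs = RandomReal[{-1, 1}, {200, 100}];

(* Center each column; the columns of v are then the principal directions *)
centered = # - Mean[obs] & /@ obs;
{u, s, v} = SingularValueDecomposition[centered];

(* Row j holds the amplitudes a1, a2, ... of observation j *)
amps = centered . v;

(* Each observation is its mean plus the amplitude-weighted directions;
   this difference should be zero up to rounding error *)
Max[Abs[(Mean[obs] + # & /@ (amps . Transpose[v])) - obs]]

(* a2 against a1 is a scatter of points, so ListPlot rather than Plot *)
ListPlot[amps[[All, {1, 2}]], AxesLabel -> {"a1", "a2"}]
```

 Because the amplitudes come as one pair (a1, a2) per observation rather than as a function a2[a1], a scatter plot is the natural way to look for the expected relation.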
Posted 9 years ago
 David,

 Did you have a look at KarhunenLoeveDecomposition? It also provides the transformation matrix between the data and the principal components.
Posted 9 years ago
 Thanks, Matthias, for your reply!

 Yes, I played around a little with KarhunenLoeveDecomposition as well. But basically, I don't see any difference between KarhunenLoeveDecomposition and PrincipalComponents. The two functions seem to be exactly equivalent, except that the data has to be transposed for KarhunenLoeveDecomposition.

 KarhunenLoeveDecomposition also gives a matrix of "eigenvalues", but unfortunately they are not the amplitudes I'm looking for. They are the eigenvalues of the covariance matrix of the data.
Posted 9 years ago
 I don't know the terminology you are using. What do you mean by amplitude? Can you point to a formula?
Posted 9 years ago
 Here is how you can obtain the associated principal component scores for a new vector of data:

    (* Data from Mathematica documentation on PrincipalComponents *)
    data = {{13.2, 200, 58, 21.2}, {10, 263, 48, 44.5}, {8.1, 294, 80, 31},
       {8.8, 190, 50, 19.5}, {9, 276, 91, 40.6}, {7.9, 204, 78, 38.7},
       {3.3, 110, 77, 11.1}, {5.9, 238, 72, 15.8}, {15.4, 335, 80, 31.9},
       {17.4, 211, 60, 25.8}};

    (* Get principal component scores from the high-level function *)
    pc = PrincipalComponents[data, Method -> "Correlation"];

    (* Get the information needed to transform additional vectors in the same way *)
    dataMean = Mean[data];
    dataSD = StandardDeviation[data];

    (* Standardize the original data *)
    z = (# - dataMean)/dataSD & /@ data;

    (* Get principal components and other necessary info the hard way *)
    {u, s, v} = SingularValueDecomposition[z];

    (* Note that pc and z.v are "essentially the same" except that the signs
       of some columns might be reversed, i.e., principal components are not
       unique with respect to sign *)
    MatrixForm[pc]
    MatrixForm[z.v]

    (* From here on we use z.v rather than pc because we need the other
       pieces generated by SingularValueDecomposition *)

    (* Say we have a new vector of values that we want to transform to its
       corresponding principal component scores. I've used just the first row
       of the data to show that one gets the correct results - one never has
       enough QA *)
    x = {13.2, 200, 58, 21.2};

    (* Standardize x with the data mean and standard deviation *)
    zx = (x - dataMean)/dataSD;

    (* Obtain the principal component scores for x *)
    pcx = zx.v

 So that will obtain the principal component scores for any new data. Like Matthias, I have not seen the term "amplitude" associated with principal component theory. Is that jargon from a particular subject area?
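 As a rough follow-up sketch, the same pieces also show how much of the total variance each component carries, which is one way to check a claim such as "the first four components explain the lion's share". It reuses the documentation data and the standardization shown above; the names `scores`, `vars`, and `fractions` are assumptions introduced here for illustration.

```mathematica
data = {{13.2, 200, 58, 21.2}, {10, 263, 48, 44.5}, {8.1, 294, 80, 31},
   {8.8, 190, 50, 19.5}, {9, 276, 91, 40.6}, {7.9, 204, 78, 38.7},
   {3.3, 110, 77, 11.1}, {5.9, 238, 72, 15.8}, {15.4, 335, 80, 31.9},
   {17.4, 211, 60, 25.8}};

(* Standardize each column, then decompose *)
z = (# - Mean[data])/StandardDeviation[data] & /@ data;
{u, s, v} = SingularValueDecomposition[z];

(* Scores for every row at once: one row of scores per observation *)
scores = z . v;

(* Variance of each score column, and its share of the total *)
vars = Variance[scores];
fractions = vars/Total[vars]

(* Cumulative share explained by the first k components *)
Accumulate[fractions]
```

 Since the columns of `z` are standardized, the variances in `vars` should add up to the number of variables, so the fractions are simply those variances divided by that count.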
Posted 9 years ago
 And to go from a principal component score back to the world of the original data one could use

    pcx.Transpose[v]*dataSD + dataMean

 where pcx is a vector of principal component scores.
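 The round trip can be checked for all rows at once. The following is a sketch under the same setup as the earlier code in this thread; `scores`, `reconstructed`, `k`, and `approx` are hypothetical names, and the choice `k = 2` is arbitrary.

```mathematica
data = {{13.2, 200, 58, 21.2}, {10, 263, 48, 44.5}, {8.1, 294, 80, 31},
   {8.8, 190, 50, 19.5}, {9, 276, 91, 40.6}, {7.9, 204, 78, 38.7},
   {3.3, 110, 77, 11.1}, {5.9, 238, 72, 15.8}, {15.4, 335, 80, 31.9},
   {17.4, 211, 60, 25.8}};
dataMean = Mean[data];
dataSD = StandardDeviation[data];
z = (# - dataMean)/dataSD & /@ data;
{u, s, v} = SingularValueDecomposition[z];
scores = z . v;

(* Undo the rotation, then undo the standardization row by row *)
reconstructed = # dataSD + dataMean & /@ (scores . Transpose[v]);

(* Largest deviation from the original data; should be zero up to rounding *)
Max[Abs[reconstructed - data]]

(* Keeping only the first k components gives an approximation instead *)
k = 2;
approx = # dataSD + dataMean & /@ (scores[[All, ;; k]] . Transpose[v[[All, ;; k]]]);
```

 The full reconstruction is exact because `v` is an orthogonal matrix, so multiplying by `Transpose[v]` undoes the multiplication by `v`.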
Posted 9 years ago
 Thanks, Jim! Exactly what I was looking for! What I meant by "amplitudes" is basically contained in the vector v.

 Thank you all and best wishes from the wintry, snowy south of Germany,

 David.
Posted 9 years ago
 David,

 If by amplitudes you were looking for the vector v above, then you are looking for the second element returned by KarhunenLoeveDecomposition. Using the variable names from Jim's example, v and Last@KarhunenLoeveDecomposition[Transpose@z] are identical up to the sign.
Posted 8 years ago
 Can someone please explain this to me? Running PCA in SPSS spits out the component scores for each variable. How can I get something that looks similar?