
How can I get transform coefficients for Principal Component Analysis?

Posted 2 years ago

Mathematica provides the function PrincipalComponentAnalysis, which transforms data so that it is expressed in terms of its principal components. How do I get the coefficients that map the values of the original variables to their values expressed via the principal components?
I have seen this question asked elsewhere without receiving any very satisfactory answer. I feel there ought to be a simple command or function that does this, whereas the proposed solutions involve cumbersome manual manipulation.
Is it really the case that Mathematica cannot do this directly, and, if so, what is the simplest workaround?
To clarify what I am looking for, suppose my original data is expressed in terms of two variables x and y, and I have two data points, (x1, y1) and (x2, y2). PrincipalComponentAnalysis re-expresses the data in terms of variables u and v, which are linear combinations of x and y, thus returning points (u1, v1) and (u2, v2) such that

u1 = a x1 + b y1
v1 = c x1 + d y1
u2 = a x2 + b y2
v2 = c x2 + d y2

How do I find the a, b, c and d?
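In the SVD-based answers below, these coefficients turn out to be the entries of V' (the transposed loadings matrix), applied to mean-centered data. As a minimal illustration, here is a NumPy sketch with a hypothetical two-point data set; the variable names are mine, not from any Wolfram function:

```python
import numpy as np

# Hypothetical two-point, two-variable data set: rows are (x, y) points
X = np.array([[1.0, 2.0],
              [3.0, 5.0]])

# PCA via SVD of the mean-centered data
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Rows of Vt hold the sought coefficients: u = a*x + b*y uses Vt[0],
# v = c*x + d*y uses Vt[1] (applied to the centered coordinates)
(a, b), (c, d) = Vt
scores = Xc @ Vt.T          # the (u, v) values for each point
assert np.allclose(scores[:, 0], a * Xc[:, 0] + b * Xc[:, 1])
assert np.allclose(scores[:, 1], c * Xc[:, 0] + d * Xc[:, 1])
```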

POSTED BY: Marc Widdowson
21 Replies
Posted 2 years ago

Hi Daniel,

I believe you and I are discussing different aspects. Your comment pertains to the PC scores (T = US) (e.g., uu.ww in your comments), while mine is about the PC loadings (V). The initial question was about the PrincipalComponentAnalysis function, which returns only T and does not provide V. The V shows how the original variables are transformed (each PC is a linear combination of the original variables) and may be interpreted afterward. The T values are the outcomes of the transformation (the reduced vectors): T = XV, since X = TV' and V'V = I give XV = TV'V = T.

X = USV' = left singular vectors * singular values * right singular vectors' = TV' = PC scores * loadings'.

To clarify the terminology, SVD is related to EVD (eigenvalue decomposition) as follows:

A = X'X = (USV')'(USV') = VS^2V' = eigenvectors * eigenvalues * eigenvectors', because U'U = I.
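This SVD-EVD relation is easy to verify numerically. A NumPy sketch (random data of my choosing, just for illustration) checking that V diagonalizes A = X'X with eigenvalues s^2:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 3))
U, s, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T

# A = X'X has eigenvectors V and eigenvalues s^2, i.e. A V = V S^2
A = X.T @ X
assert np.allclose(A @ V, V * s**2)

# ... and an explicit eigenvalue decomposition agrees with s^2
eigvals = np.linalg.eigvalsh(A)              # returned in ascending order
assert np.allclose(np.sort(eigvals)[::-1], s**2)
```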

By the way, many dimension-reduction methods apply SVD after deriving various "similarity" matrices. In PCA, X'X is a covariance or correlation matrix, depending on whether the column variables are normalized or standardized. In practice, applications of PCA, MDS, and CA have been reported to produce similar results.

  • PCA: X --> covariance or correlation matrix (X'X) --> apply SVD to the covariance matrix.
  • Multidimensional Scaling (MDS): X --> distance matrix (various distance matrices) --> apply SVD to the distance matrix.
  • Correspondence Analysis (CA): X --> profile matrix --> apply SVD to the profile matrix.
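For the PCA bullet, the two routes (SVD of the centered data vs. decomposition of the covariance matrix) give the same axes; a NumPy sketch with random illustrative data:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((8, 3))
Xc = X - X.mean(axis=0)

# Route 1: SVD of the centered data gives the principal axes V directly
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
V = Vt.T

# Route 2: decompose the covariance matrix C = X'X/(n-1) instead
C = Xc.T @ Xc / (len(X) - 1)
evals = np.linalg.svd(C, compute_uv=False)   # = eigenvalues, since C is PSD

# Covariance eigenvalues are s^2/(n-1), and V diagonalizes C
assert np.allclose(evals, s**2 / (len(X) - 1))
assert np.allclose(C @ V, V * (s**2 / (len(X) - 1)))
```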

By the way, thanks for the detailed information about the DimensionReduction function. I am not very familiar with it, and your comments are useful to me.

POSTED BY: Sangdon Lee
Posted 2 years ago

PCA is essentially an SVD (Singular Value Decomposition) of the data matrix. Therefore,

X = U*S*V' = T*V' = PC Scores * PC Loadings'  (' denotes Transpose).

The PrincipalComponentAnalysis function displays only the PC scores (T = U.S) and does not provide the "V". Thus, use the SingularValueDecomposition function; the "V" is the loadings you want to find.

For example,

  X = N[{{1, 2, 3}, {4, 5, 6}, {7, 8, 9}, {10, 11, 12}}]

  {U, S, V} = SingularValueDecomposition[X]

  U.S.Transpose[V] reproduces the original matrix (X).  
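The same reconstruction can be cross-checked outside Mathematica; here is a NumPy sketch of the identical computation on the same matrix:

```python
import numpy as np

# Same matrix as the Mathematica example above
X = np.array([[1., 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
U, s, Vt = np.linalg.svd(X, full_matrices=False)

T = U * s                # the PC scores, T = U.S
V = Vt.T                 # the loadings ("V")
assert np.allclose(T @ Vt, X)   # X = T V' reproduces the original matrix
```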

By the way, I wish that Wolfram would develop functions for PARAFAC.

I found Mathematica functions for Factor Analysis with Varimax rotation and for PLS (partial least squares); I don't remember the location — search the Wolfram web pages. Anton Antonov developed an ICA (independent component analysis) function: https://resources.wolframcloud.com/FunctionRepository/resources/IndependentComponentAnalysis/

POSTED BY: Sangdon Lee

I assume the original question involved

DimensionReduction[..., Method -> "PrincipalComponentsAnalysis"]

and not the function PrincipalComponents. If so, this isn't correct, because, for whatever reason, that dimension-reduction method does not correspond to the usual definition of PCA. The SVD usage, as shown, is however appropriate to the setting Method -> "LatentSemanticAnalysis" of DimensionReduction and the like. Here is a quick example to illustrate.

mat = {{1., 2, 5.}, {3., -1., 2.}, {5, -1, 2}};
dimredPCA = DimensionReduction[mat, 2, Method -> "PrincipalComponentsAnalysis"];
dimredPCA[mat, "ReducedVectors"]

(* Out[107]= {{-2.34313, 0.0987804}, {0.83005, -0.55769}, {1.51308,  0.458909}} *)

We will not recover these using PCA.

{uu, ww, vv} = SingularValueDecomposition[mat, 2];
uu . ww

(* Out[110]= {{-4.16306, -3.55907}, {-3.55612, 1.06304}, {-5.00475,  2.20517}} *)

We do recover them with LSA though.

dimredLSA = DimensionReduction[mat, 2, Method -> "LatentSemanticAnalysis"];
dimredLSA[mat, "ReducedVectors"]

(* Out[112]= {{-4.16306, 3.55907}, {-3.55612, -1.06304}, {-5.00475, -2.20517}} *)

Let's get back to the other possibility, the function PrincipalComponents.

PrincipalComponents[mat]

(* Out[481]= {{3.45902, 0.187592, 0.}, {-1.10879, -0.877829, 0.}, {-2.35023, 0.690237, 0.}} *)

It turns out this is essentially the same as what comes from the resource function MultidimensionalScaling, bearing in mind that resulting columns are only unique up to sign.

ResourceFunction["MultidimensionalScaling"][mat, 3]

(* Out[482]= {{-3.45902, 0.187592, 0.}, {1.10879, -0.877829, 0.}, {2.35023, 0.690237, 0.}} *)

Who knew? Certainly not the author of RF[MDS]. As a further note, this is essentially different from using Method -> "MultidimensionalScaling" in DimensionReduction. Note also that both PrincipalComponents and ResourceFunction["MultidimensionalScaling"] use what is usually termed "Principal Coordinate Analysis" (read the second word carefully). This is as described in the Wikipedia article on MDS at

https://en.wikipedia.org/wiki/Multidimensional_scaling

but I will remark that I've seen the same distinction in other places as well.
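The equivalence behind this observation (for Euclidean distances, Principal Coordinate Analysis reproduces the PCA scores of the centered data, up to column signs) can be sketched in NumPy; the data here is random and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((10, 4))

# PCA scores of the centered data
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * s

# Classical MDS / Principal Coordinate Analysis on squared Euclidean distances
D2 = np.square(Xc[:, None, :] - Xc[None, :, :]).sum(-1)
n = len(X)
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ D2 @ J                       # double-centered Gram matrix
eigvals, eigvecs = np.linalg.eigh(B)
order = np.argsort(eigvals)[::-1][:4]       # keep the top 4 coordinates
coords = eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0))

# The two agree up to column signs
assert np.allclose(np.abs(coords), np.abs(scores), atol=1e-6)
```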

POSTED BY: Daniel Lichtblau

As best I can tell, PrincipalComponents is doing the same computation as the resource function "MultidimensionalScaling", up to column signs, provided the latter is instructed to return vectors of the same dimension as the input. Here is an example.

mat = RandomReal[{-10, 10}, {10, 4}];
pcoords = PrincipalComponents[mat];
Dimensions[pcoords]

(* Out[21]= {10, 4} *)

mds = ResourceFunction["MultidimensionalScaling"][mat, 4];
pcoords/mds

(* Out[25]= {{-1., -1., -1., 1.}, {-1., -1., -1., 1.}, {-1., -1., -1., 
  1.}, {-1., -1., -1., 1.}, {-1., -1., -1., 1.}, {-1., -1., -1., 
  1.}, {-1., -1., -1., 1.}, {-1., -1., -1., 1.}, {-1., -1., -1., 
  1.}, {-1., -1., -1., 1.}} *)

I am not familiar with the details of the PrincipalComponents implementation. But the code for "MultidimensionalScaling" can be accessed via ResourceObject["MultidimensionalScaling"]["DefinitionNotebook"]. Depending on what exactly you require, you might be able to alter it to return more information about the transformation applied to the input vectors.

POSTED BY: Daniel Lichtblau

All this looks about right, except there is no WL PrincipalComponentAnalysis (PCA) function per se. And PrincipalComponents appears to implement what's called Principal Coordinate Analysis (which is not quite the same thing), which is at the heart of the vanilla form of multidimensional scaling (MDS). What you indicate as PCA agrees with the Wikipedia definition of Principal Components Analysis. In WL it is given by the "LatentSemanticAnalysis" (LSA) method of DimensionReduction. It is unclear what is being done by the "PrincipalComponentAnalysis" method; about the only clue from the documentation is that it is equivalent to LSA after standardizing the input.

I've obtained good results both with MDS and LSA/PCA. Agreed, they do have similarities in terms of quality.

POSTED BY: Daniel Lichtblau

Hi Marc, some time ago, I wrote a package for PCA "à la française". Perhaps it would be useful to you?
