How can I get transform coefficients for Principal Component Analysis?

Posted 2 years ago

Mathematica provides the function PrincipalComponents, which transforms data to be expressed in terms of its principal components. How do I get the coefficients mapping the values of the original variables to their values expressed via the principal components?
I have seen this question asked elsewhere and not received any very satisfactory answer. I feel there ought to be a simple command or function that will do it, whereas the proposed solutions involve cumbersome manual manipulations.
Is it really the case that Mathematica cannot do this directly and, if so, what is the simplest workaround?
To clarify what I am looking for, suppose my original data is expressed in terms of two variables x and y, and I have two data points, (x1, y1) and (x2, y2). PrincipalComponents re-expresses the data in terms of variables u and v, which are linear combinations of x and y, thus returning points (u1, v1) and (u2, v2) such that

u1 = a x1 + b y1

v1 = c x1 + d y1

u2 = a x2 + b y2

v2 = c x2 + d y2

How do I find the a, b, c and d?
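A minimal sketch of one way to obtain those coefficients (a toy illustration of my own, not part of the question): for mean-centered data, the rows of Eigenvectors[Covariance[data]] are exactly the coefficient pairs (a, b) and (c, d).

```
data = {{2., 1.}, {3., 4.}, {5., 0.}, {7., 6.}};   (* toy (x, y) points *)
centered = # - Mean[data] & /@ data;               (* subtract column means *)
coeffs = Eigenvectors[Covariance[data]];           (* rows are {a, b} and {c, d} *)
centered . Transpose[coeffs]                       (* matches PrincipalComponents[data] up to column signs *)
```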

POSTED BY: Marc Widdowson
21 Replies

Good to hear! It inspires me to "finish" the documentation of the LSA monad...

POSTED BY: Anton Antonov

Hi Daniel!

There are many variants of MDS, and the one implemented here is Principal Coordinates Analysis (PCoA). It is exactly equivalent to PCA (up to the sign of the eigenvectors), but since you do not have access to the variable loadings, the results are less interpretable than those from PCA. So PCA is preferable, except when one uses a special proximity measure (the Rao distance between distributions, the ManhattanDistance between vectors, etc.). In such cases the original data table is no longer considered, and there is no guarantee that a Euclidean structure fits the data. For instance, the Torgerson matrix may have large negative eigenvalues, which shows that the data have no Euclidean image; the Rao distance, for example, is Riemannian.
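The point about the Torgerson matrix can be illustrated with a short sketch (a toy example of my own, not from the post): double-center the squared distance matrix and inspect its eigenvalues; negative ones signal that the distances have no exact Euclidean image.

```
pts = RandomReal[{-1, 1}, {10, 3}];
dist = Outer[ManhattanDistance, pts, pts, 1];        (* a non-Euclidean proximity measure *)
n = Length[dist];
j = IdentityMatrix[n] - ConstantArray[1./n, {n, n}]; (* centering matrix *)
torgerson = -j . (dist^2) . j/2;                     (* Torgerson (Gower) matrix *)
Min[Eigenvalues[torgerson]]                          (* typically negative for Manhattan distances *)
```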

POSTED BY: Claude Mante

Thanks a lot for sharing! I frequently use your nice MonadicQuantileRegression workflow.

POSTED BY: Claude Mante

Thanks for referring to my implementation of Independent Component Analysis (ICA)!

I recently "pacletized" the ICA and Non-Negative Matrix Factorization (NNMF) into the paclet "DimensionReducers".

(So, I can have a more modular implementation of my LSA monad.)

POSTED BY: Anton Antonov

As best I can tell, PrincipalComponents is doing the same computation as the resource function "MultidimensionalScaling" up to column signs, provided the latter is instructed to return a result with vectors of the same dimension as the input. Here is an example.

mat = RandomReal[{-10, 10}, {10, 4}];
pcoords = PrincipalComponents[mat];
Dimensions[pcoords]

(* Out[21]= {10, 4} *)

mds = ResourceFunction["MultidimensionalScaling"][mat, 4];
pcoords/mds

(* Out[25]= {{-1., -1., -1., 1.}, {-1., -1., -1., 1.}, {-1., -1., -1., 
  1.}, {-1., -1., -1., 1.}, {-1., -1., -1., 1.}, {-1., -1., -1., 
  1.}, {-1., -1., -1., 1.}, {-1., -1., -1., 1.}, {-1., -1., -1., 
  1.}, {-1., -1., -1., 1.}} *)

I am not familiar with the details of the PrincipalComponents implementation. But the code for "MultidimensionalScaling" can be accessed via ResourceObject["MultidimensionalScaling"]["DefinitionNotebook"]. Depending on what exactly you require, you might be able to alter it to return more information about the transformation of the input vectors.

POSTED BY: Daniel Lichtblau
Posted 2 years ago

Yes, that sounds like it would be useful to me. Thank you.

Having now had the chance to try the suggestion of @Sangdon Lee, I have found that SingularValueDecomposition works well subject to the following caveats:

  1. To get the best result from SVD, it is necessary first to subtract the mean of each variable from its respective values, i.e. to translate the values so that they have zero mean. Otherwise, SVD does not return the same result as PCA and is not as successful in concentrating the variance into a few components.

  2. With SVD, the signs of the transformed variables are in general different from those returned by PCA though the absolute values are the same. The sign is essentially arbitrary and this does not affect any conclusions from the analysis.

It seems that PrincipalComponents is a convenience function that automatically shifts the variables to zero mean before applying SVD. I now suspect that the reason PrincipalComponents offers no option to return the loadings is that the transformed variables are not simply a linear combination of the original variables but are related to them by a shift followed by a linear combination. It would nevertheless be good if the PrincipalComponents documentation discussed this and explained how to obtain the relationship between the original variables and the transformed variables when that is what one is interested in.
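The two caveats above can be checked directly; a sketch (the variable names are mine):

```
mat = RandomReal[{-10, 10}, {10, 4}];
centered = # - Mean[mat] & /@ mat;                 (* caveat 1: zero-mean each column first *)
{u, s, v} = SingularValueDecomposition[centered];
scores = centered . v;                             (* columns of v hold the loadings *)
Max[Abs[Abs[scores] - Abs[PrincipalComponents[mat]]]] (* caveat 2: agreement up to sign, so ~ 0 *)
```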

POSTED BY: Marc Widdowson

Hi Marc, some time ago, I wrote a package for PCA "à la française". Perhaps it would be useful to you?
