How can I get transform coefficients for Principal Component Analysis?

Posted 2 years ago

Mathematica provides the function PrincipalComponentsAnalysis, which transforms data to be expressed in terms of its principal components. How do I get the coefficients that map the values of the original variables to their values expressed via the principal components?

I have seen this question asked elsewhere without receiving any very satisfactory answer. I feel there ought to be a simple command or function that does this, whereas the proposed solutions involve cumbersome manual manipulations.

Is it really the case that Mathematica cannot do this directly and, if so, what is the simplest workaround?
To clarify what I am looking for, suppose my original data is expressed in terms of two variables x and y, and I have two data points, (x1, y1) and (x2, y2). PrincipalComponentsAnalysis re-expresses the data in terms of variables u and v, which are linear combinations of x and y, thus returning points (u1, v1) and (u2, v2) such that

u1 = a x1 + b y1
v1 = c x1 + d y1
u2 = a x2 + b y2
v2 = c x2 + d y2

How do I find the a, b, c and d?
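For concreteness, a minimal version of the setup (the data values are arbitrary):

data = {{1., 2.}, {3., 5.}};   (* two points (x1, y1) and (x2, y2) *)
PrincipalComponents[data]      (* returns {{u1, v1}, {u2, v2}}, but not a, b, c, d *)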

POSTED BY: Marc Widdowson
21 Replies

Hi Marc, some time ago I wrote a package for PCA "à la française". Perhaps it would be useful to you.

POSTED BY: Claude Mante
Posted 2 years ago

Yes, that sounds like it would be useful to me. Thank you.

Having now had the chance to try the suggestion of @Sangdon Lee, I have found that SingularValueDecomposition works well subject to the following caveats:

  1. To get the best result from SVD, it is necessary first to subtract the mean of each variable from its respective values, i.e. to translate the values so that they have zero mean. Otherwise, SVD does not return the same result as PCA and is not as successful in concentrating the variance into a few components.

  2. With SVD, the signs of the transformed variables are in general different from those returned by PCA though the absolute values are the same. The sign is essentially arbitrary and this does not affect any conclusions from the analysis.

It seems that PrincipalComponents is a convenience function that automatically shifts the variables to zero mean before applying SVD. I now imagine that the reason there is no option to get a vector with the loadings from PrincipalComponents is that the transformed variables are not simply a linear combination of the original variables but are related to them by a shift followed by a linear combination. It would nevertheless be good if the PrincipalComponents documentation discussed this and explained how to obtain the relationship between the original variables and the transformed ones, for those who are interested in it.
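A quick check of both caveats on random data (a minimal sketch; variable names are my own):

data = RandomReal[{-5, 5}, {8, 3}];
centered = # - Mean[data] & /@ data;   (* caveat 1: shift each variable to zero mean *)
{u, s, v} = SingularValueDecomposition[centered];
Max@Abs[Abs[u . s] - Abs[PrincipalComponents[data]]]   (* ~ 0: scores agree up to column signs (caveat 2) *)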

POSTED BY: Marc Widdowson

As best I can tell, PrincipalComponents is doing the same computation as the resource function "MultidimensionalScaling", up to column signs, provided the latter is instructed to return a result with vectors of the same dimension as the input. Here is an example.

mat = RandomReal[{-10, 10}, {10, 4}];
pcoords = PrincipalComponents[mat];
Dimensions[pcoords]

(* Out[21]= {10, 4} *)

mds = ResourceFunction["MultidimensionalScaling"][mat, 4];
pcoords/mds

(* Out[25]= {{-1., -1., -1., 1.}, {-1., -1., -1., 1.}, {-1., -1., -1., 
  1.}, {-1., -1., -1., 1.}, {-1., -1., -1., 1.}, {-1., -1., -1., 
  1.}, {-1., -1., -1., 1.}, {-1., -1., -1., 1.}, {-1., -1., -1., 
  1.}, {-1., -1., -1., 1.}} *)

I am not familiar with the details of the PrincipalComponents implementation. But the code for "MultidimensionalScaling" can be accessed via ResourceObject["MultidimensionalScaling"]["DefinitionNotebook"]. Depending on what exactly you require, you might be able to alter it to return more information about the transformation of the input vectors.

POSTED BY: Daniel Lichtblau

Hi Daniel!

There are many variants of MDS, and the one implemented here is Principal Coordinates Analysis. It is exactly identical to PCA (up to the signs of the eigenvectors), but since you cannot access the loadings of the variables, the results are less interpretable than those from PCA. So PCA is best, except when one uses a special proximity measure (the Rao distance between distributions, ManhattanDistance between vectors, etc.). In such cases the original data table is no longer considered, and there is NO guarantee that a Euclidean structure can be fitted to the data. For instance, the Torgerson matrix may have large negative eigenvalues, which shows that the data have no Euclidean image; the Rao distance, for example, is Riemannian.
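As a rough sketch of that check, one can build the Torgerson matrix from, say, Manhattan distances and inspect its spectrum (j below is just the centering matrix):

pts = RandomReal[{-1, 1}, {6, 3}];
d = Normal@DistanceMatrix[pts, DistanceFunction -> ManhattanDistance];
n = Length[d];
j = IdentityMatrix[n] - ConstantArray[1./n, {n, n}];   (* centering matrix *)
Eigenvalues[-j . (d^2) . j/2]   (* any negative eigenvalues => no exact Euclidean image *)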

POSTED BY: Claude Mante
Posted 2 years ago

@Sangdon Lee

Thank you very much.

POSTED BY: Marc Widdowson
Posted 2 years ago

PCA is identical to SVD (Singular Value Decomposition). Therefore,

X = U*S*V' = T*V' = PC Scores * PC Loadings'   (' denotes Transpose).

The PrincipalComponents function returns the PC scores (T = U.S) only and does not provide the V. Thus, use the SingularValueDecomposition function: the V is the loadings you want to find.

For example,

  X = N[{{1, 2, 3}, {4, 5, 6}, {7, 8, 9}, {10, 11, 12}}];
  {U, S, V} = SingularValueDecomposition[X];
  U . S . Transpose[V]   (* reproduces the original matrix X *)
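Continuing the example, V ties back to the coefficients asked for in the original question:

  T = U . S;   (* the PC scores *)
  Chop[X . V - T]   (* zero matrix: each score column is a linear combination of the original columns, with coefficients in V *)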

By the way, I wish that Wolfram would develop functions for PARAFAC.

I found Mathematica functions for Factor Analysis with Varimax rotation and for PLS (partial least squares); I don't remember the location, but searching the Wolfram webpages should turn them up. Anton Antonov developed ICA (independent component analysis): https://resources.wolframcloud.com/FunctionRepository/resources/IndependentComponentAnalysis/

POSTED BY: Sangdon Lee
Posted 2 years ago

Hi Daniel,

I posted another question on PCA after reading your comments carefully, because I consider PCA one of the most important methods.

https://community.wolfram.com/groups/-/m/t/2949003?p_p_auth=6pSCpXdg

POSTED BY: Sangdon Lee

Sangdon,

I upvoted your new post and also upvoted this one (which I should have done days ago). I agree that the naming conventions, the lack of full documentation, and the lack of means to work with DimensionReduction results are all weaknesses.

POSTED BY: Daniel Lichtblau
Posted 2 years ago

Yes, mea culpa. In the original post, I referred to a PrincipalComponentsAnalysis function when I should have said the PrincipalComponents function. Sorry for the confusion. It does seem that this is something of an omission in WL. When you want to do a PCA with Mathematica, you naturally reach for the PrincipalComponents function, yet it won't give you the loadings, which are very often what is of most interest, i.e. for identifying which of your original variables are responsible for most of the variation in the data. While you can, it seems, extract that information with other methods, as described by @Sangdon Lee, you don't really want to have to read through MSE or Wolfram Community threads to understand exactly how PCA works just so you can apply other, more general-purpose functions to your problem.

POSTED BY: Marc Widdowson

Marc, do you have a simple example of input and desired output? It's still not clear to me what you want, and whether it pertains to the usual definition of Principal Components Analysis or to the actual implementation of PrincipalComponents.

POSTED BY: Daniel Lichtblau

Marc,

I think Sangdon Lee may have provided a bit more than what was originally requested. Per my prior notes, it turns out that PrincipalComponents is really implementing something else under the hood. So his recipe not only gives the other values of interest but also shows how to do the PCA you actually want.

From your description I think you want the singular values and perhaps also the conversion matrix one would apply to new data to attain the same projection.
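For instance, under the centered-SVD reading discussed above, one might keep the training mean and the matrix v and apply them to new rows (a sketch; project is an ad hoc helper name):

train = RandomReal[{-10, 10}, {10, 4}];
mu = Mean[train];
{u, s, v} = SingularValueDecomposition[# - mu & /@ train];
project[rows_] := (# - mu & /@ rows) . v;   (* applies the same projection to new data *)
project[RandomReal[{-10, 10}, {2, 4}]]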

POSTED BY: Daniel Lichtblau
Posted 2 years ago

You're right. Thank you very much.

POSTED BY: Marc Widdowson

Thanks for referring to my implementation of Independent Component Analysis (ICA)!

I recently "pacletized" the ICA and Non-Negative Matrix Factorization (NNMF) into the paclet "DimensionReducers".

(So, I can have a more modular implementation of my LSA monad.)

POSTED BY: Anton Antonov

Thanks a lot for sharing! I frequently use your nice MonadicQuantileRegression workflow.

POSTED BY: Claude Mante

Good to hear! It inspires me to "finish" the documentation of the LSA monad...

POSTED BY: Anton Antonov

There is this MSE post that might be of use.

POSTED BY: Daniel Lichtblau