How can I get transform coefficients for Principal Component Analysis?

Posted 10 months ago

Mathematica provides the function PrincipalComponentAnalysis, which transforms data to be expressed in terms of its principal components. How do I get the coefficients mapping the values of the original variables to their values expressed via the principal components?
I have seen this question asked elsewhere without any very satisfactory answer. I feel there ought to be a simple command or function that will do it, whereas the proposed solutions involve cumbersome manual manipulations.
Is it really the case that Mathematica cannot do this directly and, if so, what is the simplest workaround?
To clarify what I am looking for, suppose my original data is expressed in terms of two variables x and y, and I have two data points, (x1, y1) and (x2, y2). PrincipalComponentsAnalysis re-expresses the data in terms of variables u and v, which are linear combinations of x and y, thus returning points (u1, v1) and (u2, v2) such that

u1 = a x1 + b y1

v1 = c x1 + d y1

u2 = a x2 + b y2

v2 = c x2 + d y2

How do I find the a, b, c and d?

POSTED BY: Marc Widdowson
Posted 10 months ago

Hi Marc, some time ago, I wrote a package for PCA "à la française". Perhaps it would be useful to you?

Attachments:
POSTED BY: Claude Mante
Posted 10 months ago

Yes, that sounds like it would be useful to me. Thank you.

Having now had the chance to try the suggestion of @Sangdon Lee, I have found that SingularValueDecomposition works well, subject to the following caveats:

  1. To get the correct result from SVD, it is necessary first to subtract the mean of each variable from its respective values, i.e. to translate the values so that each variable has zero mean. Otherwise, SVD does not return the same result as PCA and is not as successful in concentrating the variance into a few components.

  2. With SVD, the signs of the transformed variables generally differ from those returned by PCA, though the absolute values are the same. The sign is essentially arbitrary, and this does not affect any conclusions from the analysis.

It seems that PrincipalComponents is a convenience function that automatically adjusts the variables to have zero mean before applying SVD. I now imagine that the reason there is no option to get a vector of loadings from PrincipalComponents is that the transformed variables are not just a linear combination of the original variables but are related to them by a shift followed by a linear combination. It would nevertheless be good if the PrincipalComponents documentation included some discussion of this, and of how to obtain the relationship between the original and transformed variables when that is what one is interested in.
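
Both caveats are easy to check numerically. Here is a minimal sketch (the data and names are illustrative, not from my actual analysis):

data = RandomReal[{-5, 5}, {20, 3}];
centered = Standardize[data, Mean, 1 &]; (* subtract each column's mean *)
{u, s, v} = SingularValueDecomposition[centered];
scores = u . s; (* PC scores from the SVD *)
Max@Abs[Abs[scores] - Abs[PrincipalComponents[data]]]

(* ~0, i.e. the two results agree up to column signs *)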

POSTED BY: Marc Widdowson

As best I can tell, PrincipalComponents is doing the same computation as the resource function "MultidimensionalScaling", up to column signs, provided the latter is instructed to return a result with vectors of the same dimension as the input. Here is an example.

mat = RandomReal[{-10, 10}, {10, 4}];
pcoords = PrincipalComponents[mat];
Dimensions[pcoords]

(* Out[21]= {10, 4} *)

mds = ResourceFunction["MultidimensionalScaling"][mat, 4];
pcoords/mds

(* Out[25]= {{-1., -1., -1., 1.}, {-1., -1., -1., 1.}, {-1., -1., -1., 
  1.}, {-1., -1., -1., 1.}, {-1., -1., -1., 1.}, {-1., -1., -1., 
  1.}, {-1., -1., -1., 1.}, {-1., -1., -1., 1.}, {-1., -1., -1., 
  1.}, {-1., -1., -1., 1.}} *)

I am not familiar with the details of the PrincipalComponents implementation. But the code for "MultidimensionalScaling" can be accessed via ResourceObject["MultidimensionalScaling"]["DefinitionNotebook"]. Depending on what exactly you require, you might be able to alter it to return more information about the transformation applied to the input vectors.

POSTED BY: Daniel Lichtblau
Posted 10 months ago

Hi Daniel!

There are many variants of MDS, and the one implemented here is Principal Coordinates Analysis. It is exactly identical to PCA (up to the signs of the eigenvectors), but since you cannot access the variable loadings, the results are less interpretable than those from PCA. So PCA is best, except when one uses a special proximity measure (the Rao distance between distributions, ManhattanDistance between vectors, etc.). In such cases, the original data table is no longer considered, and there is no guarantee that a Euclidean structure fits the data. For instance, the Torgerson matrix may have large negative eigenvalues, which shows that the data have no Euclidean image; as an example, the Rao distance is Riemannian.

POSTED BY: Claude Mante
Posted 10 months ago

@Sangdon Lee

Thank you very much.

POSTED BY: Marc Widdowson
Posted 10 months ago

PCA is identical to SVD (Singular Value Decomposition). Therefore,

X = U S V' = T V' = PC scores * PC loadings'  (where ' denotes transpose).

The PrincipalComponentAnalysis function displays only the PC scores (T = U.S) and does not provide V. Thus, use the SingularValueDecomposition function; V contains the loadings you want to find.

For example,

  X = N[{{1, 2, 3}, {4, 5, 6}, {7, 8, 9}, {10, 11, 12}}];

  {U, S, V} = SingularValueDecomposition[X];

  U . S . Transpose[V]  (* reproduces the original matrix X *)
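
Continuing this example, the columns of V give the loadings, and the scores satisfy T = X.V = U.S. A quick check, using the variables defined above:

  loadings = Transpose[V]  (* each row expresses one PC in terms of the original variables *)

  Chop[X . V - U . S]  (* zero matrix, confirming T = X.V *)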

By the way, I wish that Wolfram would develop functions for PARAFAC.

I found Mathematica functions for Factor Analysis with Varimax rotation and for PLS (partial least squares); I don't remember the location, but search the Wolfram website. Anton Antonov developed ICA (independent component analysis): https://resources.wolframcloud.com/FunctionRepository/resources/IndependentComponentAnalysis/

POSTED BY: Sangdon Lee

I assume the original question involved

DimensionReduction[...,Method->"PrincipalComponentsAnalysis"]

and not the function PrincipalComponents. In that case this isn't correct, because, for whatever reason, the dimension reduction method does not correspond to the usual definition of PCA. The SVD usage, as shown, is however appropriate to the setting Method->"LatentSemanticAnalysis" of DimensionReduction and the like. Here is a quick example to illustrate.

mat = {{1., 2, 5.}, {3., -1., 2.}, {5, -1, 2}};
dimredPCA = DimensionReduction[mat, 2, Method -> "PrincipalComponentsAnalysis"];
dimredPCA[mat, "ReducedVectors"]

(* Out[107]= {{-2.34313, 0.0987804}, {0.83005, -0.55769}, {1.51308,  0.458909}} *)

We will not recover these using PCA.

{uu, ww, vv} = SingularValueDecomposition[mat, 2];
uu . ww

(* Out[110]= {{-4.16306, -3.55907}, {-3.55612, 1.06304}, {-5.00475,  2.20517}} *)

We do recover them with LSA though.

dimredLSA = DimensionReduction[mat, 2, Method -> "LatentSemanticAnalysis"];
dimredLSA[mat, "ReducedVectors"]

(* Out[112]= {{-4.16306, 3.55907}, {-3.55612, -1.06304}, {-5.00475, -2.20517}} *)

Let's get back to the other possibility, the function PrincipalComponents.

PrincipalComponents[mat]

(* Out[481]= {{3.45902, 0.187592, 0.}, {-1.10879, -0.877829, 0.}, {-2.35023, 0.690237, 0.}} *)

It turns out this is essentially the same as what comes from the resource function MultidimensionalScaling, bearing in mind that the resulting columns are only unique up to sign.

ResourceFunction["MultidimensionalScaling"][mat, 3]

(* Out[482]= {{-3.45902, 0.187592, 0.}, {1.10879, -0.877829, 0.}, {2.35023, 0.690237, 0.}} *)

Who knew? Certainly not the author of RF[MDS]. As a further note, this is essentially different from using Method->"MultidimensionalScaling" in DimensionReduction. Another note is that both PrincipalComponents and ResourceFunction["MultidimensionalScaling"] use what is usually termed "Principal Coordinate Analysis" (read the second word carefully). This is as seen in the Wikipedia article for MDS at

https://en.wikipedia.org/wiki/Multidimensional_scaling

but I will remark that I've seen the same distinction in other places as well.

POSTED BY: Daniel Lichtblau
Posted 10 months ago

Hi Daniel,

I believe you and I are discussing different aspects. Your comment pertains to the PC scores (T = U.S) (e.g., uu.ww in your comments), while my comment is about the PC loadings (V). The initial question was regarding the PrincipalComponentAnalysis function, which only displays T and does not provide V. The V shows how the original variables are transformed (each component is a linear combination of the original variables) and may be interpreted afterward. The T values represent the outcomes of the transformation (the reduced vectors): T = XV, where X = TV' and XV = TV'V = T, since V'V = I.

X = USV' = left singular vectors * singular values * right singular vectors' = TV' = PC scores * loadings'.

To clarify the terminology, SVD is related to EVD (eigendecomposition) as follows:

A = X'X = (USV')'(USV') = V S^2 V' = eigenvectors * eigenvalues * eigenvectors', because U'U = I and V'V = I.
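
A quick numerical check of this relation (the matrix here is illustrative):

X = RandomReal[{-1, 1}, {6, 3}];
{U, S, V} = SingularValueDecomposition[X];
Chop[Sort[Eigenvalues[Transpose[X] . X]] - Sort[Diagonal[S]^2]]

(* {0, 0, 0}: the eigenvalues of X'X are the squared singular values of X *)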

By the way, many dimension reduction methods apply SVD after deriving various "similarity" matrices. In PCA, X'X represents a covariance or correlation matrix, depending on the normalization or standardization applied to the column variables. Applications of PCA, MDS, and CA have been reported to produce similar results.

  • PCA: X --> covariance or correlation matrix (X'X) --> apply SVD to the covariance matrix.
  • Multidimensional scaling (MDS): X --> distance matrix (various distance matrices) --> apply SVD to the distance matrix.
  • Correspondence analysis (CA): X --> profile matrix --> apply SVD to the profile matrix.

By the way, thanks for the detailed information about the DimensionReduction function. I am not very familiar with it, and your comments are useful to me.

POSTED BY: Sangdon Lee

All this looks about right, except there is no WL PrincipalComponentAnalysis (PCA) function per se. And PrincipalComponents appears to implement what's called Principal Coordinate Analysis (which is not quite the same thing), and this is at the heart of the vanilla form of multidimensional scaling (MDS). What you indicate as PCA agrees with the Wikipedia definition of Principal Components Analysis. In WL it is given by the "LatentSemanticAnalysis" (LSA) method of DimensionReduction. It is unclear what is being done by the "PrincipalComponentsAnalysis" method. About the only clue from the documentation is that it is equivalent to LSA after standardizing the input.
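
One could probe that documentation hint along the following lines; this is only an experiment to try, not a verified equivalence (using the same matrix as my earlier example):

mat = {{1., 2, 5.}, {3., -1., 2.}, {5, -1, 2}};
drPCA = DimensionReduction[mat, 2, Method -> "PrincipalComponentsAnalysis"];
drLSA = DimensionReduction[Standardize[mat], 2, Method -> "LatentSemanticAnalysis"];
drPCA[mat, "ReducedVectors"]/drLSA[Standardize[mat], "ReducedVectors"]

(* columns of constant ±1 would indicate agreement up to sign *)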

I've obtained good results both with MDS and LSA/PCA. Agreed, they do have similarities in terms of quality.

POSTED BY: Daniel Lichtblau
Posted 10 months ago

Hi Daniel,

I posted another question on PCA after reading your comments carefully, because I consider PCA to be one of the most important methods.

https://community.wolfram.com/groups/-/m/t/2949003?p_p_auth=6pSCpXdg

POSTED BY: Sangdon Lee

Sangdon,

I upvoted your new post and also upvoted this one (which I should have done days ago). I agree that the naming conventions, the lack of full documentation, and the lack of means to work with DimensionReduction results are all weaknesses.

POSTED BY: Daniel Lichtblau
Posted 10 months ago

Yes, mea culpa. In the original post, I referred to a PrincipalComponentsAnalysis function when I should have said the PrincipalComponents function. Sorry for the confusion. It does seem that this is something of an omission in WL. When you're interested in doing a PCA with Mathematica, you naturally reach for the PrincipalComponents function, and yet it won't give you the loadings, which is very often what is of most interest, i.e. identifying which of your original variables is responsible for most of the variation in the data. While you can, it seems, extract that information with other methods as described by @Sangdon Lee, you don't really want to be reading through MSE or Wolfram Community threads trying to understand exactly how PCA works so that you can apply other more general-purpose functions to your problem.

POSTED BY: Marc Widdowson

Marc, do you have a simple example of input and desired output? It's still not clear to me what you want, and whether it pertains to the usual definition of Principal Components Analysis or to the actual implementation of PrincipalComponents.

POSTED BY: Daniel Lichtblau
Posted 10 months ago

Again, I apologise for the confusion caused by using the wrong name for the PrincipalComponents function.

Here is the situation.

The data characterises a range of societies in terms of 15 sociological attributes, each of which can have the value either +1, 0, or -1. So each society is a 15-dimensional point. It is suspected that some of these attributes or dimensions are correlated and this is the kind of thing that PCA ought to be able to pick up.

There are thousands of datapoints, and just for illustration the first five are as follows: {{0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., -1., 0., 0., 0.}, {0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., -1., 0., 0., 0.}, {0., 0., 0., -1., 0., 0., 0., 0., 0., 0., 0., -1., 0., 0., 0.}, {0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., -1., 0., 0., 0.}, {0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., -1., 0., 0., 0.}}

When I supply the full dataset to PrincipalComponents, I get back a list of new 15-dimensional points in the transformed dimensions, which appear in order of the amount of variance they contain. Here are the first five points of the transformed data: {{0.628139, 0.080849, 0.141136, 0.215542, 0.123124, 0.475692, 0.250839, 0.0970342, 0.369091, 0.187105, 0.232599, 0.16679, 0.123272, 0.0682626, 0.096034}, {0.628139, 0.080849, 0.141136, 0.215542, 0.123124, 0.475692, 0.250839, 0.0970342, 0.369091, 0.187105, 0.232599, 0.16679, 0.123272, 0.0682626, 0.096034}, {0.326903, -0.0281672, 0.16276, 0.355153, 0.102935, 0.592857, 0.242404, -0.149108, 0.177386, 0.461905, -0.269042, -0.0351931, 0.341595, -0.0184339, 0.681469}, {0.628139, 0.080849, 0.141136, 0.215542, 0.123124, 0.475692, 0.250839, 0.0970342, 0.369091, 0.187105, 0.232599, 0.16679, 0.123272, 0.0682626, 0.096034}, {0.628139, 0.080849, 0.141136, 0.215542, 0.123124, 0.475692, 0.250839, 0.0970342, 0.369091, 0.187105, 0.232599, 0.16679, 0.123272, 0.0682626, 0.096034}}

I can calculate the variance along each dimension just by applying Variance to the list returned by PrincipalComponents, and I find that the first dimension (first principal component) has about 65% of the total variance.
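
In code, that calculation is just the following (assuming data holds the full dataset; the names are mine):

pcs = PrincipalComponents[data];
Variance[pcs]/Total[Variance[pcs]] (* fraction of total variance per component *)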

Now I know that each dimension of the transformed data is a linear combination of the original variables / dimensions / attributes. So if we write pc1 for the value on the first principal component (0.628139 in the case of the first datapoint above) and v1...v15 for my original variable values ({0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., -1., 0., 0., 0.} in the case of the first datapoint above), I know that

pc1 = a1 v1 + a2 v2 + a3 v3 + ... + a15 v15

where a1...a15 are coefficients describing the transformation. It is these coefficients that I want to find. This is because, if the coefficient is large for a particular variable, it means that that variable contributes strongly to the first principal component, and that is of interest because it means that that variable is particularly good at capturing the distinctive character of each society.

Of course, there are also coefficients mapping the variables onto the other principal components, e.g.

pc2 = b1 v1 + b2 v2 + b3 v3 + ... + b15 v15

pc3 = c1 v1 + c2 v2 + c3 v3 + ... + c15 v15

etc.

which are of subsidiary interest.
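
For reference, here is a minimal sketch of extracting these coefficients via @Sangdon Lee's recipe (assuming data holds the full 15-column matrix; note that the coefficients apply to the mean-centred values):

centered = Standardize[data, Mean, 1 &]; (* zero-mean each attribute *)
{u, s, v} = SingularValueDecomposition[centered];
a = v[[All, 1]] (* a1 ... a15: coefficients for the first principal component *)
b = v[[All, 2]] (* b1 ... b15: coefficients for the second *)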

In my particular case, because my original data is so sparse and the values are restricted to 1, 0, and -1, I can pretty much work out the coefficients with pencil and paper. However, I felt that this is the sort of thing that Mathematica ought to be able to return very easily, to check my own working, and I thought it would just be a case of supplying some option to PrincipalComponents. It was when I couldn't find any way to do that, and the MSE threads did not seem to give any simple, definitive answer, that I decided to post here.

I believe @Sangdon Lee has understood what I was asking for and given me the solution.

Thank you very much for your interest and assistance.

POSTED BY: Marc Widdowson

Marc,

I think Sangdon Lee may have provided a bit more than what was originally requested. Per my prior notes, it turns out that PrincipalComponents is really implementing something else under the hood. So his recipe not only gives the other values of interest but also shows how to do the PCA you actually want.

From your description I think you want the singular values and perhaps also the conversion matrix one would apply to new data to attain the same projection.

POSTED BY: Daniel Lichtblau
Posted 10 months ago

You're right. Thank you very much.

POSTED BY: Marc Widdowson

Thanks for referring to my implementation of Independent Component Analysis (ICA)!

I recently "pacletized" the ICA and Non-Negative Matrix Factorization (NNMF) into the paclet "DimensionReducers".

(So, I can have a more modular implementation of my LSA monad.)

POSTED BY: Anton Antonov
Posted 10 months ago

Thanks a lot for sharing! I frequently use your nice MonadicQuantileRegression workflow.

POSTED BY: Claude Mante

Good to hear! It inspires me to "finish" the documentation of the LSA monad...

POSTED BY: Anton Antonov
Posted 10 months ago

Thank you very much. I have seen that post and some similar ones. Unfortunately, that post illustrates the problem in that it talks about manually reproducing the effect of the PrincipalComponentAnalysis function rather than pointing to any built-in Mathematica functionality, and in any case it doesn't answer my specific question about getting the coefficients. I know that I can work out the coefficients from first principles. However, I would have thought that there ought to be a function in Mathematica for doing this. The closest solution I have seen is in this MSE post, which suggests using the KarhunenLoeveDecomposition function because that does return the transformed axes as vectors expressed in the original basis, though it also requires some manual manipulation of the data to get it zero-centred before applying the function.
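
For anyone reading later, here is a sketch of that KarhunenLoeveDecomposition approach as I understand its conventions (the data here is illustrative): it treats each row as a variable, so the usual observations-by-variables matrix must be transposed, and the zero-centring must be done by hand.

data = RandomReal[{-1, 1}, {100, 3}];
centered = Standardize[data, Mean, 1 &]; (* zero-centre each variable *)
{transformed, m} = KarhunenLoeveDecomposition[Transpose[centered]];
m (* rows give each principal axis as a combination of the original variables *)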

I became interested in this because a friend was doing a PCA on some data in R and I thought I would like to reproduce what he did in Mathematica. At first I thought great, when I saw that there is a PrincipalComponentAnalysis function, but then I could not find any simple way to get the coefficients, which my friend gets easily and automatically in R. I am surprised that my options are either to use a different, more general purpose function, KarhunenLoeveDecomposition, which is not ideal for this application, or to program it manually from first principles. Nevertheless, it seems that must be the case.

POSTED BY: Marc Widdowson

There is this MSE post that might be of use.

POSTED BY: Daniel Lichtblau