Again, I apologise for the confusion I caused by using the wrong name for the function PrincipalComponents.
Here is the situation.
The data characterises a range of societies in terms of 15 sociological attributes, each of which can have the value either +1, 0, or -1. So each society is a 15-dimensional point. It is suspected that some of these attributes or dimensions are correlated and this is the kind of thing that PCA ought to be able to pick up.
There are thousands of datapoints; just for illustration, the first five are as follows:
{{0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., -1., 0., 0., 0.},
 {0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., -1., 0., 0., 0.},
 {0., 0., 0., -1., 0., 0., 0., 0., 0., 0., 0., -1., 0., 0., 0.},
 {0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., -1., 0., 0., 0.},
 {0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., -1., 0., 0., 0.}}
When I supply the full dataset to PrincipalComponents, I get back a list of new 15-dimensional points in the transformed dimensions, which appear in order of the amount of variance they contain. Here are the first five points of the transformed data:
{{0.628139, 0.080849, 0.141136, 0.215542, 0.123124, 0.475692, 0.250839, 0.0970342, 0.369091, 0.187105, 0.232599, 0.16679, 0.123272, 0.0682626, 0.096034},
 {0.628139, 0.080849, 0.141136, 0.215542, 0.123124, 0.475692, 0.250839, 0.0970342, 0.369091, 0.187105, 0.232599, 0.16679, 0.123272, 0.0682626, 0.096034},
 {0.326903, -0.0281672, 0.16276, 0.355153, 0.102935, 0.592857, 0.242404, -0.149108, 0.177386, 0.461905, -0.269042, -0.0351931, 0.341595, -0.0184339, 0.681469},
 {0.628139, 0.080849, 0.141136, 0.215542, 0.123124, 0.475692, 0.250839, 0.0970342, 0.369091, 0.187105, 0.232599, 0.16679, 0.123272, 0.0682626, 0.096034},
 {0.628139, 0.080849, 0.141136, 0.215542, 0.123124, 0.475692, 0.250839, 0.0970342, 0.369091, 0.187105, 0.232599, 0.16679, 0.123272, 0.0682626, 0.096034}}
I can calculate the variance along each dimension just by applying Variance to the list returned by PrincipalComponents, and I find that the first dimension (the first principal component) accounts for about 65% of the total variance.
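For anyone following along, the variance calculation I describe can be sketched roughly as below. The small matrix toy is a made-up stand-in for my real dataset (3 attributes instead of 15), and the names toy, pcs, vars and share are just illustrative.

```mathematica
(* Hypothetical toy data: 6 "societies", 3 attributes, values in {-1, 0, 1} *)
toy = N[{{0, 0, -1}, {1, 0, -1}, {0, 1, 0}, {-1, 0, 1}, {1, 1, -1}, {0, -1, 0}}];

(* Transformed data: one row per datapoint, one column per principal component *)
pcs = PrincipalComponents[toy];

(* Variance applied to a matrix gives one variance per column *)
vars = Variance[pcs];

(* Fraction of the total variance carried by each component *)
share = vars/Total[vars]
```

The first entry of share is the figure I quote as "about 65%" for the real data.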
Now I know that each dimension of the transformed data is a linear combination of the original variables / dimensions / attributes. So if we write pc1 for the value on the first principal component (0.628139 in the case of the first datapoint above) and v1...v15 for my original variable values ({0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., -1., 0., 0., 0.} in the case of the first datapoint above), I know that
pc1 = a1 v1 + a2 v2 + a3 v3 + ... + a15 v15
where a1...a15 are coefficients describing the transformation. It is these coefficients that I want to find. This is because, if the coefficient is large for a particular variable, it means that that variable contributes strongly to the first principal component, and that is of interest because it means that that variable is particularly good at capturing the distinctive character of each society.
Of course, there are also coefficients mapping the variables onto the other principal components, e.g.
pc2 = b1 v1 + b2 v2 + b3 v3 + ... + b15 v15
pc3 = c1 v1 + c2 v2 + c3 v3 + ... + c15 v15
etc.
which are of subsidiary interest.
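One way to get at these coefficients, as I understand it, is to take the eigenvectors of the covariance matrix of the data. The sketch below uses a made-up toy matrix rather than my real data, and it rests on two assumptions that I have not confirmed against the documentation: that PrincipalComponents uses the covariance method by default, and that it subtracts the column means before transforming (in which case the v's in the formulas above are really v minus its mean). The names toy, vals, vecs, centered and scores are illustrative only.

```mathematica
(* Hypothetical toy data: 6 "societies", 3 attributes, values in {-1, 0, 1} *)
toy = N[{{0, 0, -1}, {1, 0, -1}, {0, 1, 0}, {-1, 0, 1}, {1, 1, -1}, {0, -1, 0}}];

(* Eigenvalues (largest first) and unit eigenvectors of the covariance matrix.
   Row k of vecs holds the coefficients for the k-th principal component:
   vecs[[1]] plays the role of {a1, a2, ...}, vecs[[2]] of {b1, b2, ...}, etc. *)
{vals, vecs} = Eigensystem[Covariance[toy]];

(* Subtract the column means from every datapoint (assumed centering step) *)
centered = (# - Mean[toy]) & /@ toy;

(* Apply the coefficients: each row of scores is {pc1, pc2, pc3} for one datapoint *)
scores = centered . Transpose[vecs];

(* Compare with the built-in result. Eigenvector signs are arbitrary, so compare
   absolute values; this should be all zeros if the assumptions above hold *)
Chop[Abs[scores] - Abs[PrincipalComponents[toy]]]
```

If the comparison does come out as zeros, then vals should also match the variances obtained by applying Variance to the transformed data, and the largest entries of vecs[[1]] in absolute value pick out the variables that contribute most strongly to the first principal component.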
In my particular case, because my original data is so sparse and the values are restricted to 1, 0, and -1, I can pretty much work out the coefficients with pencil and paper. However, I felt that this is the sort of thing that Mathematica ought to be able to return very easily, to check my own working, and I thought it would just be a case of supplying some option to PrincipalComponents. I posted here because I couldn't find any way to do that, and because the MSE threads did not seem to give any simple, definitive answer.
I believe @Sangdon Lee has understood what I was asking for and given me the solution.
Thank you very much for your interest and assistance.