How to perform a correspondence analysis?

Posted 11 years ago

In the social sciences, P. Bourdieu is regarded as the "father of correspondence analysis" (1970s). Outside the francophone world this method appears to be neglected. I would like to perform correspondence analyses in the field of social sciences, but I have only come across an Excel add-in (which I couldn't get working) and an R library.

I can hardly believe that the beautiful Mathematica language doesn't support this method in some way. Any good ideas out there are greatly appreciated.

Many thanks in advance, Leo

POSTED BY: Leo Hamminger
11 Replies

Here is a small correspondence analysis program that I wrote with Mathematica 9. André Dauphiné, Geographer, University of Nice Sophia-Antipolis

(*Correspondence analysis*)
(*Import the data and strip the variable names*)
(*the row names are kept in nomligne, the column names in nomcol*)

ClearAll["Global`*"]
d = Import[SystemDialogInput["FileOpen"], "XLS"];
d = Flatten[d, 1];
nomcol = Take[d, 1];
nomcol = Drop[Flatten[nomcol], 1];
d1 = Transpose[d];
nomligne = Take[d1, 1];
nomligne = Drop[Flatten[nomligne], 1];
d2 = Drop[d, 1, 1];
touttot = Total[Flatten[d2]];
totcol = Total[d2]/touttot;
totlig = Total[Transpose[d2]]/touttot;
ncol = Length[totcol];
nlig = Length[totlig];
d3 = d2/touttot;
d4 = Table[d3[[i, All]]/totlig[[i]], {i, 1, nlig}];
d5 = Table[d4[[All, i]]*(1/Sqrt[totcol[[i]]]), {i, 1, ncol}];
d5 = Transpose[d5];
dcent = Table[d5[[All, i]] - Sqrt[totcol[[i]]], {i, 1, ncol}];
dcor = Covariance[Transpose[dcent]];
Print["Covariance matrix"]
dcotmat = dcor;
Grid[dcotmat, Frame -> All, Alignment -> "."]
{vals, vecs} = Eigensystem[N[dcor]];
recvals = Sqrt[vals];
trac = Total[vals];
Print["Percentage of variance for each component"]
pcvals = (vals/trac)*100;
Column[pcvals, Frame -> All, Alignment -> "."]
ListPlot[pcvals, Filling -> Axis]
stur = recvals*vecs;
Print["Component loadings of the variables"]
saturations = Transpose[stur];
sturt = N[saturations];
Grid[sturt, Frame -> All, Alignment -> "."]
proj = vecs.dcent;
projt = Transpose[proj];
Print["Component loadings of the objects"]
projc = N[projt];
Grid[projc, Frame -> All, Alignment -> "."]
ny = DialogInput[
  DynamicModule[{name = ""}, Column[{"Choose component 1",
     InputField[Dynamic[name], String],
     ChoiceButtons[{DialogReturn[name], DialogReturn[]}]}]]]
ny = ToExpression[ny];
nx = DialogInput[
  DynamicModule[{name = ""}, Column[{"Choose component 2",
     InputField[Dynamic[name], String],
     ChoiceButtons[{DialogReturn[name], DialogReturn[]}]}]]]
nx = ToExpression[nx];
Print[]
Print["Plot of the component loadings of the variables"]
dcomp = Partition[
   Riffle[saturations[[All, ny]], saturations[[All, nx]]], 2];
dcompespace = Partition[Riffle[projt[[All, ny]], projt[[All, nx]]], 2];
a1 = ListPlot[dcomp, PlotStyle -> Directive[PointSize[Large], Red], 
  AxesOrigin -> {0, 0}, AxesLabel -> {ny, nx}]
Print[]
Print["Plot of the component loadings of the objects"]
a2 = ListPlot[dcompespace, 
  PlotStyle -> Directive[PointSize[Large], Blue], 
  AxesOrigin -> {0, 0}, AxesLabel -> {ny, nx}]
Print[]
Print["Plot of the component loadings of the variables and the objects"]
Show[a1, a2]
POSTED BY: André Dauphiné
Posted 10 years ago

Hi Anton,

many thanks for your thoughts, very much appreciated. I definitely have homework to do and chances are that I will get back.

Best wishes, Leo

POSTED BY: Leo Hamminger

"[...] I am sure it is, but I doubt "these steps are easy to program and apply" otherwise someone would have done it already, [...]"

Yes, you are correct. The TMJ article referenced above ("An Introduction to Correspondence Analysis") has these steps. See the definition and application of the function chisqd and the application of SingularValueDecomposition.

"[...] or am I wrong? Certainly I would not know how to start .."

I think a good start is to download and interactively read the notebook of the article "An Introduction to Correspondence Analysis": http://www.mathematica-journal.com/data/uploads/2010/09/Yelland.nb .

"[...] in correspondence analysis, creating the output is just half the work. Interpreting the output is difficult, to say the least."

This probably means that correspondence analysis as a technique is not mature enough and / or a different paradigm should be used.

Several suggestions come to mind.

  1. Use Non-Negative Matrix Factorization (NNMF) instead of SVD (in correspondence analysis). SVD produces orthogonal basis vectors with coordinates that have mixed signs, and hence are hard to interpret if SVD is applied over contingency matrices of categorical data. The basis vectors of NNMF have non-negative coordinates and can be easily interpreted, but they are not orthogonal.

  2. If you can phrase your problem as a classification problem, then the applications of Decision Trees and Naive Bayesian Classifiers might give nice, easy to interpret insights into your data. (For example, see Mathematica's function Classify.) These classifiers have several methods and techniques to evaluate their prediction strength. (Hence you can evaluate your insight derived by using them.)

  3. Abandon the vector space representation and use probabilistic models, like Random walks, N-grams, Markov chains, Hidden Markov models. (All these apply to the text classification problem.)
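As an illustration of suggestion 1, here is a minimal sketch of NNMF using the standard multiplicative update rules (Lee and Seung). It is not taken from any package mentioned in this thread; the function name `nnmf`, the iteration count, and the small table `counts` are assumptions made up for the example.

```mathematica
(* Hypothetical contingency table, for illustration only *)
counts = {{20, 10, 5}, {10, 15, 10}, {5, 10, 25}, {8, 12, 6}};

(* Factor v (m x n, non-negative) as w.h with w (m x k) and h (k x n) non-negative *)
nnmf[v_?MatrixQ, k_Integer, iterations_Integer : 500] := Module[
  {m, n, w, h, eps = 10.^-9},
  {m, n} = Dimensions[v];
  SeedRandom[1];                      (* fixed seed so the sketch is reproducible *)
  w = RandomReal[{0.1, 1}, {m, k}];
  h = RandomReal[{0.1, 1}, {k, n}];
  Do[
   (* element-wise multiplicative updates; eps guards against division by zero *)
   h = h*(Transpose[w] . v)/(Transpose[w] . w . h + eps);
   w = w*(v . Transpose[h])/(w . h . Transpose[h] + eps),
   {iterations}];
  {w, h}]

{w, h} = nnmf[N[counts], 2];
relativeError = Norm[counts - w . h, "Frobenius"]/Norm[N[counts], "Frobenius"];
```

Each row of `h` is a non-negative "profile" over the columns, and each row of `w` says how strongly a row category uses each profile, which is what makes the factors easier to read than signed SVD vectors.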

POSTED BY: Anton Antonov
Posted 10 years ago

Thank you for your response, Anton.

Please bear in mind that my academic home is the social sciences. Not that we know nothing about maths and statistics, however, we are certainly no mathematicians. As a result, you may find my understanding of your responses somewhat disappointing:

  1. "Correspondence analysis is just SVD applied ... with entries modified by Chi^2 related formulas" - I am sure it is, but I doubt "these steps are easy to program and apply"; otherwise someone would have done it already, or am I wrong? Certainly I would not know how to start ...

  2. Good tip, thanks!

  3. What I meant with fig 5 "doesn't do Mathematica justice" in the Mathematica Journal http://www.mathematica-journal.com/2010/09/an-introduction-to-correspondence-analysis/ is this: in correspondence analysis, creating the output is just half the work. Interpreting the output is difficult, to say the least. Look at the question we want to answer: There are two text fragments; who is their likely author? Fig 5 is supposed to give us the answer. The proximity of "MT2" and "MT3" to TextX1 allows us to speculate that its author is Mark Twain. For TextX2 Thomas Hobbes may be a reasonable assumption, but it could also be Rene Descartes.

Fig 5 shows a very simple example and my understanding is that interpretation of the output requires, in addition to proximity, also information regarding the variances explained. Often there are more dimensions.

Now compare fig 5 to a demonstration project like http://demonstrations.wolfram.com/CelestialNavigation/ or http://demonstrations.wolfram.com/Tetration/

I very much understand that these projects are nowhere near solutions to what I am looking for. I feel, however, that they could point to directions of how the information contained in fig 5 could be presented such that interpretation becomes more intuitive. High explained variances could be colour coded green, low ones red, etc.
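A rough sketch of the colour-coding idea, assuming the coordinates, the quality of representation of each point, and the inertia percentages have already been computed. All numbers and labels below are hypothetical and are not taken from the article's Figure 5.

```mathematica
(* Hypothetical values, for illustration only *)
labels = {"A", "B", "C", "D"};
coords = {{0.42, 0.10}, {-0.15, 0.31}, {-0.38, -0.22}, {0.05, -0.04}};
quality = {0.95, 0.70, 0.88, 0.20};  (* share of each point's inertia shown in the plane, 0 to 1 *)
inertia = {62.5, 27.1};              (* percentage of inertia explained by each plotted axis *)

plot = Graphics[
  MapThread[
   {Blend[{Red, Green}, #3], PointSize[Large],   (* red = poorly represented, green = well represented *)
     Tooltip[Point[#1], Row[{#2, ": quality ", #3}]],
     Black, Text[#2, #1, {-1.5, -1.5}]} &,
   {coords, labels, quality}],
  Axes -> True, AxesOrigin -> {0, 0},
  AxesLabel -> {
    "Dim 1 (" <> ToString[inertia[[1]]] <> "%)",
    "Dim 2 (" <> ToString[inertia[[2]]] <> "%)"},
  PlotRangePadding -> Scaled[0.1]]
```

Hovering over a point shows its label and quality value, so a red point near another point is a warning that the apparent proximity may not be reliable.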

I hope this makes sense, somehow.

POSTED BY: Leo Hamminger
Posted 10 years ago

Very many thanks, excellent link. More burning of midnight oil ...

POSTED BY: Leo Hamminger

Hi Leo,

First and foremost, thank you for bringing this topic to this forum!

Below are responses to some of your questions and remarks.

  1. Correspondence analysis is just SVD applied over contingency matrices with entries modified by Chi^2 related formulas. So yes, Mathematica supports correspondence analysis "in some way" because these steps are easy to program and apply.

  2. I read the referenced article from The Mathematica Journal. You can download the notebook and use MatrixPlot over the contingency matrices to get something more visually appealing than the print outs with TableForm.

  3. I find your statement that Figure 5 in that article does not do Mathematica justice meaningless. What is plotted corresponds to the background explanations of the plot. What do you mean by that remark?

  4. I understand that you want an out of the box function or package that does correspondence analysis. Take a look at Mathematica's function PrincipalComponents. Its functionality, guide, and demonstrations are probably very close to what you are asking for.
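To make point 1 concrete, here is a minimal sketch of those steps: form the correspondence matrix, compute the chi-square standardized residuals, and apply SingularValueDecomposition. The function name `correspondenceAnalysis`, the result keys, and the table `counts` are assumptions for illustration; this is not the code from the journal article, and the Association syntax assumes Mathematica 10 or later.

```mathematica
(* Hypothetical contingency table, for illustration only *)
counts = {{20, 10, 5}, {10, 15, 10}, {5, 10, 25}, {8, 12, 6}};

correspondenceAnalysis[n_?MatrixQ] := Module[
  {p, r, c, e, s, u, w, v, k, sv},
  p = N[n/Total[n, 2]];            (* correspondence matrix: relative frequencies *)
  r = Total[p, {2}];               (* row masses *)
  c = Total[p];                    (* column masses *)
  e = Outer[Times, r, c];          (* expected frequencies under independence *)
  s = (p - e)/Sqrt[e];             (* standardized (chi-square) residuals *)
  {u, w, v} = SingularValueDecomposition[s];
  k = Min[Dimensions[n]] - 1;      (* number of non-trivial dimensions *)
  sv = Diagonal[w][[;; k]];
  <|
   "SingularValues" -> sv,
   "InertiaPercent" -> 100 sv^2/Total[sv^2],
   (* principal coordinates: rescale singular vectors by masses and singular values *)
   "RowCoordinates" -> (u[[All, ;; k]]/Sqrt[r]) . DiagonalMatrix[sv],
   "ColumnCoordinates" -> (v[[All, ;; k]]/Sqrt[c]) . DiagonalMatrix[sv]
   |>]

ca = correspondenceAnalysis[counts];
```

`ca["InertiaPercent"]` plays the role of the "variance explained" per dimension, and the first two columns of `ca["RowCoordinates"]` and `ca["ColumnCoordinates"]` can be passed to ListPlot for the usual map.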

POSTED BY: Anton Antonov
Posted 10 years ago

Hello,

Please have a look at "Correspondence Analysis and Data Coding with Java and R" by F. Murtagh, published in 2005. Prof. Murtagh did his PhD on CA with Prof. J.-P. Benzécri (indeed a prominent figure in the French school of data analysis), who added mathematical elegance to the theory using Einstein's tensor notation while at Princeton. The book has a companion site, http://www.correspondances.info, which has the source code and data sets used in the book. The ISBN is 1-58488-528-9.

Indeed, the only Mathematica treatment I was able to find was that in the paper cited above. I hope this helps you accomplish your goal.

POSTED BY: Mohsen Farid
Posted 11 years ago

Hi David,

I do appreciate your effort and help.

As a matter of fact, I came across this article, but look at the output on Figure 5 (just before 8. Conclusion): it doesn't do Mathematica justice, I am afraid. (I know I am difficult ...)

It does appear as if there is a white spot on the map, somewhere.

POSTED BY: Leo Hamminger

Hi Leo,

A bit of a google search comes up with this which may be of help:

http://www.mathematica-journal.com/2010/09/an-introduction-to-correspondence-analysis/

(There is a link in the above article that allows you to download it in Mathematica Notebook form so you have direct access to the code.)

POSTED BY: David Reiss
Posted 11 years ago

Many thanks for the swift reply, David, and thank you for the R links. I may have to resort to R, after all. However, my maths / stats expertise, including programming, is limited to that of the average social science postgraduate.

Basically, correspondence analysis is used to determine correlation (better: contingency) between categorical variables. Presenting the result is the tricky part. I have discovered some dated Mathematica solutions (e.g., for version 5.0) that use line-printer output.

The author of the text in the link below (Greenacre, who appears to be the present-day guru in this field) gives an example on pages 26ff, with sample output on p. 29 produced with an Excel add-in. Surely Mathematica should offer better solutions.

http://statmath.wu.ac.at/courses/CAandRelMeth/CARME1.pdf

The following link illustrates why the presentation of output is so important (e.g., p. 14): interpreting correspondence analysis results is difficult. My hope is to create interactive output, as available in Mathematica, to be able to inspect 3-d graphs from several angles, etc.

http://www.skeptron.uu.se/broady/sec/p-gda-0609-III.pdf
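As a small sketch of the interactive 3-d idea: a Graphics3D object can be rotated with the mouse in a notebook, so plotting the first three dimensions already allows inspection from several angles. The coordinates and labels below are hypothetical placeholders for the output of a correspondence analysis.

```mathematica
(* Hypothetical three-dimensional coordinates, for illustration only *)
labels = {"A", "B", "C", "D"};
coords3D = {{0.42, 0.10, -0.05}, {-0.15, 0.31, 0.12},
   {-0.38, -0.22, 0.03}, {0.05, -0.04, -0.20}};

plot3D = Graphics3D[
  {PointSize[Large],
   MapThread[{Point[#1], Text[#2, #1, {-1.5, 0}]} &, {coords3D, labels}]},
  Axes -> True,
  AxesLabel -> {"Dim 1", "Dim 2", "Dim 3"},
  BoxRatios -> {1, 1, 1}]
```

Dragging the plot rotates it; wrapping the expression in Manipulate could add controls, for example to choose which three dimensions are shown.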

POSTED BY: Leo Hamminger

Though I do not know anything about correspondence analysis (could you give a link just out of curiosity?), you can possibly make use of the R-library from Mathematica using RLink:

http://reference.wolfram.com/language/RLink/tutorial/Reference.html

http://reference.wolfram.com/language/RLink/guide/RLink.html
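A rough sketch of what calling R through RLink could look like. It assumes a working R installation that RLink can find (depending on the Mathematica version, InstallR may need options pointing at an external R), and it uses only base R rather than a dedicated correspondence analysis package. The table `counts` is a hypothetical example.

```mathematica
Needs["RLink`"]
InstallR[];   (* assumption: the default R setup works on this machine *)

(* Hypothetical contingency table, for illustration only *)
counts = {{20, 10, 5}, {10, 15, 10}, {5, 10, 25}, {8, 12, 6}};
RSet["tab", counts];   (* copy the matrix into the R variable tab *)

(* Base-R computation: singular values of the standardized residuals *)
singularValues = REvaluate["{
    P <- tab / sum(tab)
    r <- rowSums(P)
    cm <- colSums(P)
    S <- (P - r %o% cm) / sqrt(r %o% cm)
    svd(S)$d
  }"]
```

If an R correspondence analysis library is installed in that R, the same pattern (RSet to send data, REvaluate to run R code) should apply, with the R code replaced by a call to that library.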

POSTED BY: David Reiss