# How to perform a correspondence analysis?

Posted 10 years ago
 In the social sciences, P. Bourdieu is regarded as the "father of correspondence analysis" (1970s). Outside the francophone world this method appears to be neglected. I would like to perform correspondence analyses in the social sciences, but I could only find an Excel add-in (which I couldn't get working) and an R library. I can hardly believe that the beautiful Mathematica language doesn't support this method in some way. Any good ideas are greatly appreciated. Many thanks in advance, Leo
11 Replies
Posted 10 years ago
Here is a small correspondence analysis program that I wrote with Mathematica 9.

André Dauphiné, Geographer, University of Nice Sophia-Antipolis

    (* Correspondence analysis *)
    (* Import the data and strip the variable names *)
    (* The row names are stored in nomligne *)
    ClearAll["Global`*"]
    d = Import[SystemDialogInput["FileOpen"], "XLS"];
    d = Flatten[d, 1];
    nomcol = Take[d, 1];
    nomcol = Drop[Flatten[nomcol], 1];
    d1 = Transpose[d];
    nomligne = Take[d1, 1];
    nomligne = Drop[Flatten[nomligne], 1];
    d2 = Drop[d, 1, 1];

    (* Grand total and marginal proportions *)
    touttot = Total[Flatten[d2]];
    totcol = Total[d2]/touttot;
    totlig = Total[Transpose[d2]]/touttot;
    ncol = Length[totcol];
    nlig = Length[totlig];

    (* Row profiles, scaled by the column proportions and centred *)
    d3 = d2/touttot;
    d4 = Table[d3[[i, All]]/totlig[[i]], {i, 1, nlig}];
    d5 = Table[d4[[All, i]]*(1/Sqrt[totcol[[i]]]), {i, 1, ncol}];
    d5 = Transpose[d5];
    dcent = Table[d5[[All, i]] - Sqrt[totcol[[i]]], {i, 1, ncol}];
    dcor = Covariance[Transpose[dcent]];
    Print["Covariance matrix"]
    dcotmat = dcor;
    Grid[dcotmat, Frame -> All, Alignment -> "."]

    (* Eigen-decomposition and share of variance per component *)
    {vals, vecs} = Eigensystem[N[dcor]];
    recvals = Sqrt[vals];
    trac = Total[vals];
    Print["Percentage of variance for each component"]
    pcvals = (vals/trac)*100;
    Column[pcvals, Frame -> All, Alignment -> "."]
    ListPlot[pcvals, Filling -> Axis]

    (* Loadings of the variables (columns) *)
    stur = recvals*vecs;
    Print["Component loadings of the variables"]
    saturations = Transpose[stur];
    sturt = N[saturations];
    Grid[sturt, Frame -> All, Alignment -> "."]

    (* Projections of the objects (rows) *)
    proj = vecs.dcent;
    projt = Transpose[proj];
    Print["Component loadings of the objects"]
    projc = N[projt];
    Grid[projc, Frame -> All, Alignment -> "."]

    (* Ask which two components to plot *)
    ny = DialogInput[
       DynamicModule[{name = ""},
        Column[{"Choose component 1", InputField[Dynamic[name], String],
          ChoiceButtons[{DialogReturn[name], DialogReturn[]}]}]]]
    ny = ToExpression[ny];
    nx = DialogInput[
       DynamicModule[{name = ""},
        Column[{"Choose component 2", InputField[Dynamic[name], String],
          ChoiceButtons[{DialogReturn[name], DialogReturn[]}]}]]]
    nx = ToExpression[nx];

    Print[]
    Print["Plot of the component loadings of the variables"]
    dcomp = Partition[
       Riffle[saturations[[All, ny]], saturations[[All, nx]]], 2];
    dcompespace = Partition[Riffle[projt[[All, ny]], projt[[All, nx]]], 2];
    a1 = ListPlot[dcomp, PlotStyle -> Directive[PointSize[Large], Red],
       AxesOrigin -> {0, 0}, AxesLabel -> {ny, nx}]
    Print[]
    Print["Plot of the component loadings of the objects"]
    a2 = ListPlot[dcompespace, PlotStyle -> Directive[PointSize[Large], Blue],
       AxesOrigin -> {0, 0}, AxesLabel -> {ny, nx}]
    Print[]
    Print["Plot of the component loadings of the variables and of the objects"]
    Show[a1, a2]
Posted 10 years ago
 Hi Anton, many thanks for your thoughts, very much appreciated. I definitely have homework to do and chances are that I will get back. Best wishes, Leo
Posted 10 years ago
Hi Leo,

First and foremost, thank you for bringing this topic to this forum! Below are responses to some of your questions and remarks.

Correspondence analysis is just SVD applied over contingency matrices with entries modified by Chi^2-related formulas. So yes, Mathematica supports correspondence analysis "in some way", because these steps are easy to program and apply.

I read the referenced article from The Mathematica Journal. You can download the notebook and use MatrixPlot over the contingency matrices to get something more visually appealing than the printouts with TableForm.

I find your remark that Figure 5 in that article does not do Mathematica justice meaningless: what is plotted corresponds to the background explanations of the plot. What do you mean by that remark?

I understand that you want an out-of-the-box function or package that does correspondence analysis. Take a look at Mathematica's function PrincipalComponents. Its functionality, guide, and demonstrations are probably very close to what you are asking for.
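To make the "SVD over a Chi^2-modified contingency matrix" description concrete, here is a minimal sketch of the usual textbook formulation. It is not the code from the TMJ article; the table `counts` is invented for illustration, and all variable names are my own assumptions.

```mathematica
(* Hypothetical 3 x 3 contingency table, invented for illustration only *)
counts = {{20, 10, 5}, {10, 15, 10}, {5, 10, 25}};

n = Total[counts, 2];     (* grand total *)
p = N[counts/n];          (* correspondence matrix of relative frequencies *)
r = Total[p, {2}];        (* row masses *)
c = Total[p];             (* column masses *)

(* Standardized residuals: (p_ij - r_i c_j)/Sqrt[r_i c_j] *)
expected = Outer[Times, r, c];
s = (p - expected)/Sqrt[expected];

(* SVD of the residual matrix: s == u.w.Transpose[v] *)
{u, w, v} = SingularValueDecomposition[s];
sv = Diagonal[w];

(* Principal inertias and their percentage share of the total inertia *)
inertia = sv^2;
share = 100 inertia/Total[inertia];

(* Principal coordinates: rows of u (v) divided by the square roots
   of the masses, then scaled by the singular values *)
rowCoords = (u/Sqrt[r]).w;
colCoords = (v/Sqrt[c]).Transpose[w];

(* Joint map of rows (red) and columns (blue) on the first two axes *)
ListPlot[{rowCoords[[All, {1, 2}]], colCoords[[All, {1, 2}]]},
 PlotStyle -> {Red, Blue}, PlotLegends -> {"rows", "columns"},
 AxesOrigin -> {0, 0}, AxesLabel -> {"Axis 1", "Axis 2"}]
```

The total of `inertia` equals the chi-squared statistic of the table divided by `n`, which is a handy sanity check. The last singular value is numerically zero because centring removes one dimension.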
Posted 10 years ago
Thank you for your response, Anton. Please bear in mind that my academic home is the social sciences. Not that we know nothing about maths and statistics; however, we are certainly no mathematicians. As a result, you may find my understanding of your responses somewhat disappointing:

"Correspondence analysis is just SVD applied ... with entries modified by Chi^2 related formulas" - I am sure it is, but I doubt "these steps are easy to program and apply", otherwise someone would have done it already, or am I wrong? Certainly I would not know how to start ...

Good tip, thanks!

What I meant by saying that fig 5 in the Mathematica Journal article http://www.mathematica-journal.com/2010/09/an-introduction-to-correspondence-analysis/ "doesn't do Mathematica justice" is this: in correspondence analysis, creating the output is just half the work. Interpreting the output is difficult, to say the least. Look at the question we want to answer: there are two text fragments; who is their likely author? Fig 5 is supposed to give us the answer. The proximity of "MT2" and "MT3" to TextX1 allows us to speculate that its author is Mark Twain. For TextX2, Thomas Hobbes may be a reasonable assumption, but it could also be Rene Descartes. Fig 5 shows a very simple example, and my understanding is that interpreting the output requires, in addition to proximity, information about the variances explained. Often there are more dimensions.

Now compare fig 5 to a demonstration project like http://demonstrations.wolfram.com/CelestialNavigation/ or http://demonstrations.wolfram.com/Tetration/

I very much understand that these projects are nowhere near solutions to what I am looking for. I feel, however, that they could point to ways of presenting the information contained in fig 5 such that interpretation becomes more intuitive. High explained variances could be colour coded green, low ones red, etc.

I hope this makes sense, somehow.
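One possible reading of the colour-coding idea, as a rough sketch rather than a finished visualisation: colour each row point by how much of its squared distance from the origin is captured by the two plotted axes (often called the quality of representation), from red (poorly represented) to green (well represented). The table `counts` and the labels are invented, and the names are assumptions of this sketch, not taken from the TMJ notebook.

```mathematica
(* Hypothetical 4 x 4 contingency table and row labels, for illustration only *)
counts = {{20, 10, 5, 3}, {10, 15, 10, 6}, {5, 10, 25, 8}, {4, 6, 9, 20}};
labels = {"A", "B", "C", "D"};

n = Total[counts, 2];
p = N[counts/n];
r = Total[p, {2}];   (* row masses *)
c = Total[p];        (* column masses *)

(* Standardized residuals and their SVD *)
expected = Outer[Times, r, c];
s = (p - expected)/Sqrt[expected];
{u, w, v} = SingularValueDecomposition[s];

(* Row principal coordinates on all axes *)
rowCoords = (u/Sqrt[r]).w;

(* Quality: share of each point's squared distance from the origin
   that lies in the plane of axes 1 and 2 (between 0 and 1) *)
quality = Total[#[[;; 2]]^2]/Total[#^2] & /@ rowCoords;

(* Red = poorly represented in the plane, green = well represented *)
Graphics[
 {PointSize[Large],
  MapThread[
   {Blend[{Red, Green}, #2], Point[#1], Black, Text[#3, #1, {0, -1.5}]} &,
   {rowCoords[[All, ;; 2]], quality, labels}]},
 Axes -> True, AxesLabel -> {"Axis 1", "Axis 2"}, AspectRatio -> 1]
```

A red point sitting close to another point in the map should be read with caution, since most of its position lies in dimensions that are not shown.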
Posted 10 years ago
"[...] I am sure it is, but I doubt "these steps are easy to program and apply" otherwise someone would have done it already, [...]"

Yes, you are correct. The TMJ article referenced above ("An Introduction to Correspondence Analysis") has these steps. See the definition and application of the function chisqd and the application of SingularValueDecomposition.

"[...] or am I wrong? Certainly I would not know how to start .."

I think a good start is to download and interactively read the notebook of the article "An Introduction to Correspondence Analysis": http://www.mathematica-journal.com/data/uploads/2010/09/Yelland.nb .

"[...] in correspondence analysis, creating the output is just half the work. Interpreting the output is difficult, to say the least."

This probably means that correspondence analysis as a technique is not mature enough and/or that a different paradigm should be used.

Several suggestions come to mind.

Use Non-Negative Matrix Factorization (NNMF) instead of SVD (in correspondence analysis). SVD produces orthogonal basis vectors with coordinates that have mixed signs, and hence are hard to interpret if SVD is applied over contingency matrices of categorical data. The basis vectors of NNMF have non-negative coordinates and can be easily interpreted, but they are not orthogonal.

If you can phrase your problem as a classification problem, then Decision Trees and Naive Bayesian Classifiers might give nice, easy-to-interpret insights into your data. (For example, see Mathematica's function Classify.) These classifiers have several methods and techniques to evaluate their prediction strength. (Hence you can evaluate the insight derived by using them.)

Abandon the vector space representation and use probabilistic models, such as random walks, N-grams, Markov chains, and hidden Markov models. (All of these apply to the text classification problem.)
Posted 10 years ago
Hello,

Please have a look at "Correspondence Analysis and Data Coding with Java and R" by F. Murtagh, published 2005. Prof. Murtagh did his PhD on CA with Prof. J.-P. Benzécri (indeed a prominent figure in the French school of data analysis), who added the mathematical elegance to the theory using Einstein's tensor notation while in Princeton. The book has a companion site, http://www.correspondances.info, which has the source code and data sets used in the book. The ISBN is 1-58488-528-9.

Indeed, the only Mathematica treatment I was able to find was that in the paper cited above. I hope this helps you accomplish your goal.
Posted 10 years ago
 Very many thanks, excellent link. More burning of midnight oil ...
Posted 10 years ago
Hi Leo,

A bit of a Google search comes up with this, which may be of help:

http://www.mathematica-journal.com/2010/09/an-introduction-to-correspondence-analysis/

(There is a link in the above article that allows you to download it in Mathematica Notebook form so you have direct access to the code.)
Posted 10 years ago
Hi David,

I do appreciate your effort and help. As a matter of fact, I came across this article, but look at the output in Figure 5 (just before 8. Conclusion): it doesn't do Mathematica justice, I am afraid. (I know I am difficult ...)

It does appear as if there is a white spot on the map, somewhere.