Mosaic plots for data visualization

Posted 11 years ago
I just published a blog post announcing the implementation of the function MosaicPlot, which gives a visual representation of the contingencies of categorical variables in a list of records. The blog post has examples and explanations:
http://mathematicaforprediction.wordpress.com/2014/03/17/mosaic-plots-for-data-visualization/

If we consider the census income data set known as the "adult data set" that is summarized in this table:



we can visualize the co-occurrence of (categorical variable) values with mosaic plots like this one:



By comparing the sizes of the rectangles corresponding to values “Bachelors”, “Doctorate”, “Masters”, and “Some-college” on the “sex vs. education” mosaic plot we can see that the fraction of men that have finished college is larger than the fraction of women that have finished college.
We can further subdivide the rectangles according to the co-occurrence frequencies with a third categorical variable. We are going to choose that third variable to be “income”, the values of which can be seen as outcomes or consequents of the values of the first two variables of the mosaic plot.
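
The rectangle sizes in a mosaic plot come directly from co-occurrence frequencies. Here is a minimal sketch of that idea with built-in functions only. It is not the MosaicPlot implementation, and the records are a small made-up list of {sex, education} pairs, not the census data. The strip widths are the marginal frequencies of the first variable, and the heights inside each strip are the conditional frequencies of the second variable given the first:

```mathematica
(* Hypothetical toy records {sex, education}; NOT the census data. *)
records = {
   {"Male", "Bachelors"}, {"Male", "Bachelors"}, {"Male", "HS-grad"},
   {"Male", "Masters"}, {"Female", "Bachelors"}, {"Female", "HS-grad"},
   {"Female", "HS-grad"}, {"Female", "Masters"}};

(* Marginal frequencies of the first variable: widths of the vertical strips. *)
widths = N[Counts[records[[All, 1]]]/Length[records]];

(* Conditional frequencies of the second variable within each value of the
   first variable: heights of the rectangles inside each strip. *)
heights = GroupBy[records, First -> Last, N[Counts[#]/Length[#]] &];

(* The area of a rectangle is the joint frequency of the two values. *)
widths["Male"]*heights["Male"]["Bachelors"]  (* 0.25, i.e. 2 of the 8 toy records *)
```

Subdividing by a third variable repeats the same step inside each rectangle, with the conditional frequencies of the third variable given the first two.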



From the mosaic plot "sex vs. education vs. income" we can make the following observations.
1. Approximately 75% of the males with doctorate degrees or with a professional school degree earn more than $50000 per year.
2. Approximately 60% of the females with a doctorate degree earn more than $50000 per year.
3. Approximately 45% of the females with a professional school degree earn more than $50000 per year.
4. Across all education types, females are (much) less likely than males to earn more than $50000 per year.
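
Percentages like these are conditional frequencies, so they can also be computed directly from the records instead of being read off the plot. A minimal sketch, assuming records of the form {sex, education, income}; the toy records and the variable name highIncomeFraction are made up for illustration and do not reproduce the census numbers:

```mathematica
(* Hypothetical toy records {sex, education, income}; NOT the census data. *)
records3 = {
   {"Male", "Doctorate", ">50K"}, {"Male", "Doctorate", ">50K"},
   {"Male", "Doctorate", "<=50K"},
   {"Female", "Doctorate", ">50K"}, {"Female", "Doctorate", "<=50K"},
   {"Female", "Prof-school", ">50K"}, {"Female", "Prof-school", "<=50K"},
   {"Female", "Prof-school", "<=50K"}};

(* Group by the first two variables, keep only the income value of each
   record, and take the fraction of ">50K" within each group. *)
highIncomeFraction =
  GroupBy[records3, (#[[{1, 2}]] &) -> Last,
   N[Count[#, ">50K"]/Length[#]] &];

(* Fraction for one {sex, education} combination of the toy data. *)
highIncomeFraction[{"Female", "Doctorate"}]
```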
POSTED BY: Anton Antonov
6 Replies

Here is the corresponding "MosaicPlot" paclet:


POSTED BY: Anton Antonov
Here is a new blog post of mine that further analyzes the census income data:
"Classification and association rules for census income data",
http://mathematicaforprediction.wordpress.com/2014/03/30/classification-and-association-rules-for-census-income-data/.

I found using MosaicPlot with Manipulate and Tooltip very useful:

POSTED BY: Anton Antonov
I just published a blog post describing the enhancements to MosaicPlot that I implemented this week: http://mathematicaforprediction.wordpress.com/2014/03/24/enhancements-of-mosaicplot/

The functionality that took the most design and implementation effort was the coloring of the rectangles. I chose an approach that makes the plots easier to read. Here is a grid of examples:


I also updated my previous posts in this discussion with color plots.
POSTED BY: Anton Antonov
Thanks, Mark!

Here is the code for the function RecordsSummary that can be used together with Grid to make summary tables:
Clear[DataColumnsSummary]
Options[DataColumnsSummary] = {"MaxTallies" -> 7, "NumberedColumns" -> True};
(* No column names given: use "column 1", "column 2", ... *)
DataColumnsSummary[dataColumns_, opts : OptionsPattern[]] :=
  DataColumnsSummary[dataColumns,
   Table["column " <> ToString[i], {i, 1, Length[dataColumns]}], opts];
(* One summary Column per data column, each headed by the column name. *)
DataColumnsSummary[dataColumns_, columnNamesArg_, opts : OptionsPattern[]] :=
  Block[{columnTypes, columnNames = columnNamesArg,
     maxTallies = OptionValue[DataColumnsSummary, "MaxTallies"],
     numberedColumnsQ =
      TrueQ[OptionValue[DataColumnsSummary, "NumberedColumns"]]},
    (* Optionally prefix each column name with its index. *)
    If[numberedColumnsQ,
     columnNames =
      MapIndexed[ToString[#2[[1]]] <> " " <> #1 &, columnNames]
     ];
    (* The first element of each column decides numeric vs. categorical. *)
    columnTypes =
     Map[If[NumberQ[#], Number, Symbol] &, dataColumns[[All, 1]]];
    MapThread[
     Column[{
        Style[#1, Blue, FontFamily -> "Times"],
        If[TrueQ[#2 === Number],
         Grid[NumericVectorSummary[#3], Alignment -> Left],
         Grid[CategoricalVectorSummary[#3, maxTallies],
          Alignment -> Left]
         ]}] &, {columnNames, columnTypes, dataColumns}, 1]
    ] /; Length[dataColumns] == Length[columnNamesArg];
(* RecordsSummary takes a list of records (rows) and summarizes its columns. *)
Clear[RecordsSummary];
RecordsSummary[dataRecords_, opts : OptionsPattern[]] :=
  DataColumnsSummary[Transpose[dataRecords], opts];
RecordsSummary[dataRecords_, columnNames_, opts : OptionsPattern[]] :=
  DataColumnsSummary[Transpose[dataRecords], columnNames, opts];
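
The code above calls NumericVectorSummary and CategoricalVectorSummary, which are not defined in this post. The following stand-ins are only a guess at what they could look like, written so that the code above has something to call: each returns a list of rows that Grid can display. The choice of statistics and the "(Other)" label are assumptions, not the original definitions.

```mathematica
(* Assumed stand-ins for the two helpers used by DataColumnsSummary;
   the original definitions are not shown in this post. *)
Clear[NumericVectorSummary, CategoricalVectorSummary];

(* Rows of {label, value} for a numeric vector. *)
NumericVectorSummary[vec_?(VectorQ[#, NumericQ] &)] :=
  Transpose[{
    {"Min", "1st Qu", "Median", "Mean", "3rd Qu", "Max"},
    N[{Min[vec], Quantile[vec, 1/4], Median[vec], Mean[vec],
      Quantile[vec, 3/4], Max[vec]}]}];

(* Rows of {value, count}, most frequent first. If there are more than
   maxTallies distinct values, the least frequent ones are pooled
   into a single "(Other)" row. *)
CategoricalVectorSummary[vec_?VectorQ, maxTallies_Integer : 7] :=
  Module[{tallies = SortBy[Tally[vec], -Last[#] &]},
   If[Length[tallies] <= maxTallies,
    tallies,
    Append[tallies[[;; maxTallies - 1]],
     {"(Other)", Total[tallies[[maxTallies ;;, 2]]]}]]];
```

With these loaded together with the code above, a summary table for a small made-up list of records can be laid out with something like Grid[{RecordsSummary[{{1, "a"}, {2, "b"}, {3, "a"}}, {"x", "label"}]}, Alignment -> Top].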
POSTED BY: Anton Antonov
Posted 11 years ago
Thank you for this post and for linking to your blog, especially mosaic plots with data summaries and quantile regression. I look forward to exploring these.
POSTED BY: Mark Dooris
I updated the implementation of the function MosaicPlot to have an interactive feature using Tooltip that gives a table with the exact co-occurrence (contingency) values when hovering with the mouse over the rectangles.

Here is an example:
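
Independently of the MosaicPlot implementation, the tooltip idea itself can be sketched with built-in functions only: wrap each rectangle in Tooltip and use the co-occurrence count as the label. The function name toyMosaic and the records below are hypothetical, and the layout is deliberately simplified (two variables, no gaps, no axis labels).

```mathematica
(* Hypothetical toy records {sex, education}; NOT the census data. *)
toyRecords = {
   {"Male", "Bachelors"}, {"Male", "Bachelors"}, {"Male", "HS-grad"},
   {"Male", "Masters"}, {"Female", "Bachelors"}, {"Female", "HS-grad"},
   {"Female", "HS-grad"}, {"Female", "Masters"}};

(* A simplified two-variable mosaic: one Tooltip-wrapped rectangle per
   co-occurring pair of values. Hovering shows the pair and its count. *)
toyMosaic[records_] :=
  Module[{n = Length[records], groups, lefts},
   (* Counts of the second variable within each value of the first. *)
   groups = GroupBy[records, First -> Last, Counts];
   (* Left edges of the vertical strips: cumulative marginal frequencies. *)
   lefts = Most[
      Accumulate[Prepend[Total[Values[#]] & /@ Values[groups], 0]]]/n;
   Graphics[{EdgeForm[White], FaceForm[LightBlue],
     MapThread[
      Function[{key, counts, x0},
       With[{w = Total[Values[counts]]/n,
         ys = Accumulate[Prepend[Values[counts], 0]]/Total[Values[counts]]},
        MapThread[
         Tooltip[Rectangle[{x0, #2}, {x0 + w, #3}],
           Row[{key, ", ", #1, ": ", counts[#1]}]] &,
         {Keys[counts], Most[ys], Rest[ys]}]]],
      {Keys[groups], Values[groups], lefts}]},
    Frame -> True, AspectRatio -> 1]];

toyMosaic[toyRecords]
```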
POSTED BY: Anton Antonov