# Visualize Machine Learning Data: From Python to Wolfram Language

Posted 4 years ago

I was reading an online article about using the Pandas package of Python, and I thought it would be fun to see how Wolfram Language handles the same tasks in a manageable amount of code. Let's begin:

(*code source*)
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data"


# Import Data

### Python

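import matplotlib.pyplot as plt
import pandas
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data"
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = pandas.read_csv(url, names=names)  # the file has no header row, so supply column names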
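pandas turns each CSV row into a DataFrame row, and the Wolfram Language version of this step turns each row into an association keyed by column name. As a rough plain-Python illustration of that row-wise mapping, here is a sketch that assumes the rows are already parsed into lists of numbers; the helper `rows_to_records` and the two sample rows are made up for this example and are not part of the original article.

```python
names = ["preg", "plas", "pres", "skin", "test", "mass", "pedi", "age", "class"]


def rows_to_records(rows, names=names):
    """Turn each row (a list of values) into a dict keyed by column name."""
    records = []
    for row in rows:
        if len(row) != len(names):
            raise ValueError(f"expected {len(names)} fields, got {len(row)}")
        # zip pairs each column name with the value in the same position.
        records.append(dict(zip(names, row)))
    return records


# Made-up rows for illustration only (not taken from the diabetes dataset).
sample_rows = [
    [1, 100, 70, 20, 0, 25.0, 0.5, 30, 0],
    [2, 120, 80, 30, 90, 30.5, 0.3, 40, 1],
]
records = rows_to_records(sample_rows)
print(records[0]["plas"])  # 100
```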


### Wolfram Language

(* $Version 11.1.0 for Mac OS X x86 (64-bit) (March 16, 2017) *)
data = Import[url];
names = {"preg", "plas", "pres", "skin", "test", "mass", "pedi", "age", "class"};
dataset = Dataset[Map[Association @@ Thread[names -> #] &, data]] (* turns {6, 148, ...} into <|"preg" -> 6, "plas" -> 148, ...|>, row by row *)

# Univariate Plots

## Histogram

### Python

data.hist()
plt.show()
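The Wolfram Language version of this step sets the bin width per column: a specific definition for "class" and a catch-all `Automatic` default for every other column. The same "override one column, default the rest" idea can be sketched in plain Python by counting values into equal-width bins. The `BIN_WIDTHS` table and `histogram_counts` helper are hypothetical. The 10-bin fallback only mirrors the `bins=10` default of pandas' `DataFrame.hist`; Mathematica's `Automatic` binning follows its own rule.

```python
import math
from collections import Counter

# Hypothetical per-column overrides; any column not listed uses the default.
BIN_WIDTHS = {"class": 0.1}
DEFAULT_BIN_COUNT = 10  # pandas' DataFrame.hist defaults to bins=10


def histogram_counts(column, values):
    """Return {left_edge: count} for the non-empty bins of `values`."""
    lo, hi = min(values), max(values)
    if column in BIN_WIDTHS:
        width, max_index = BIN_WIDTHS[column], None
    else:
        # Split the data range into equal-width bins; guard against constant data.
        width = (hi - lo) / DEFAULT_BIN_COUNT or 1.0
        max_index = DEFAULT_BIN_COUNT - 1  # the maximum value belongs to the last bin
    counts = Counter()
    for v in values:
        k = math.floor((v - lo) / width)
        if max_index is not None:
            k = min(k, max_index)
        counts[k] += 1
    # Round the edges to hide floating-point noise such as 0.30000000000000004.
    return {round(lo + k * width, 10): n for k, n in sorted(counts.items())}


# A 0/1 column with narrow bins gives two thin bars instead of wide blocks.
print(histogram_counts("class", [0, 1, 1, 0, 1]))  # {0.0: 2, 1.0: 3}
```

A narrow width for the 0/1 "class" column keeps the two outcomes visually separate, which is the point the replies below discuss.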


### Wolfram Language

width["class"] = {0.1}; (* bin widths are easy to adjust per column: 0.1 for "class" *)
width[item_] := Automatic; (* every other column uses automatic binning *)
Histogram[dataset[All, #], width[#], PlotLabel -> #] & /@ names

## Density Plot

Density plots look like an abstracted histogram: a smooth curve drawn through the top of each bin, much as your eye tries to do when reading the histograms above.

In Wolfram Language this is done with SmoothHistogram, which uses an automatic smoothing kernel. To be clear, DensityHistogram in Mathematica means something different: it is a 2D density plot of bivariate data with a discrete color scale.

### Python

data.plot(kind='density', subplots=True, layout=(3,3), sharex=False)
plt.show()
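Both `kind='density'` and SmoothHistogram estimate a smooth density by placing a kernel on each observation and summing. As a rough illustration of that general technique (not necessarily the kernel or bandwidth rule either library uses), here is a Gaussian kernel density estimate in plain Python with the normal-reference rule of thumb h = 1.06·σ·n^(-1/5). The `gaussian_kde` function name and the sample ages are assumptions made for this sketch.

```python
import math
import statistics


def gaussian_kde(values, bandwidth=None):
    """Return a function x -> estimated density, using a Gaussian kernel."""
    n = len(values)
    if bandwidth is None:
        # Normal-reference rule of thumb: h = 1.06 * sigma * n^(-1/5).
        bandwidth = 1.06 * statistics.stdev(values) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum of Gaussian bumps, one centered on each observation.
        return norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

    return density


ages = [21, 25, 31, 33, 40, 45, 50, 58]  # made-up values for illustration
f = gaussian_kde(ages)

# Riemann-sum check over [-50, 150): a density should integrate to about 1.
step = 0.1
area = sum(f(i * step) * step for i in range(-500, 1500))
print(round(area, 3))
```

A larger bandwidth gives a smoother, flatter curve; a smaller one follows individual points more closely, much like shrinking the bin width of a histogram.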


### Wolfram Language

SmoothHistogram[dataset[All, #], PlotRange -> Full, PlotLabel -> #] & /@ names

## Box Whisker Plot

### Python

data.plot(kind='box', subplots=True, layout=(3,3), sharex=False, sharey=False)
plt.show()
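A box plot summarizes each column by its quartiles. In matplotlib (which pandas uses here), the box spans the first to third quartile and the whiskers reach the most extreme points within 1.5 × IQR of the box (the default `whis=1.5`). Anything beyond that is drawn as an outlier. The following plain-Python sketch reproduces those numbers for one column; it follows the matplotlib convention and is not claimed to match BoxWhiskerChart's defaults. The `box_stats` name and the sample values are hypothetical.

```python
import statistics


def box_stats(values, whis=1.5):
    """Quartiles plus Tukey-style whiskers and outliers for one column."""
    # method="inclusive" interpolates linearly between order statistics.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = [v for v in values if lo_fence <= v <= hi_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),  # most extreme points within the fences
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < lo_fence or v > hi_fence),
    }


print(box_stats([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```

Here the value 100 lies far above Q3 + 1.5 × IQR, so it is reported as an outlier and the upper whisker stops at 8.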


### Wolfram Language

BoxWhiskerChart[dataset[All, #], PlotLabel -> #] & /@ names

# Multivariate Plot

## Correlation Matrix Plot

### Python

import numpy
correlations = data.corr()
# plot correlation matrix
fig = plt.figure()
ax = fig.add_subplot(111)  # the axes that matshow draws on
cax = ax.matshow(correlations, vmin=-1, vmax=1)
fig.colorbar(cax)
ticks = numpy.arange(0,9,1)
ax.set_xticks(ticks)
ax.set_yticks(ticks)
ax.set_xticklabels(names)
ax.set_yticklabels(names)
plt.show()
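`data.corr()` (like `Correlation` in Wolfram Language) computes Pearson's correlation coefficient for every pair of columns, which is why the color scale above is pinned to `vmin=-1, vmax=1`. A minimal plain-Python sketch of that computation, using toy columns rather than the diabetes data; the function names `pearson` and `correlation_matrix` are hypothetical:

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """columns: dict name -> values. Returns a list of rows, like data.corr()."""
    names = list(columns)
    return [[pearson(columns[a], columns[b]) for b in names] for a in names]


cols = {"x": [1, 2, 3, 4], "y": [2, 4, 6, 8], "z": [4, 3, 2, 1]}  # toy data
for row in correlation_matrix(cols):
    print([round(r, 3) for r in row])
```

The diagonal is always 1 and the matrix is symmetric, so in practice only the upper or lower triangle carries new information.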


### Wolfram Language

Here you build the correlation matrix yourself with a few lines of simple code, which gives you more control:

n = Length[names]
(* correlation between the two columns named in tuple = {name1, name2} *)
corr[dataset_, tuple_] := Correlation[
N@Normal@dataset[All, tuple[[1]]],
N@Normal@dataset[All, tuple[[2]]]
]


Generate all pairs of names, arranged as an n-by-n grid:

grid = Partition[Tuples[names, 2], n]
(*{{{preg,preg},{preg,plas},{preg,pres},{preg,skin},{preg,test} ... }}}*)
res = Map[corr[dataset, #] &, grid, {2}]
(* create correlation matrix : {{1., 0.129459, 0.141282, -0.0816718,...},...} *)
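For readers coming from Python, `Tuples[names, 2]` followed by `Partition[..., n]` corresponds roughly to `itertools.product` plus chunking into rows. A minimal sketch of the same grid of name pairs:

```python
from itertools import product

names = ["preg", "plas", "pres", "skin", "test", "mass", "pedi", "age", "class"]
n = len(names)

# All ordered pairs, like Tuples[names, 2]; the first element varies slowest.
pairs = list(product(names, repeat=2))
# Chunk into n rows of n pairs, like Partition[..., n].
grid = [pairs[i:i + n] for i in range(0, len(pairs), n)]

print(grid[0][:2])  # [('preg', 'preg'), ('preg', 'plas')]
```

Cell `grid[i][j]` holds `(names[i], names[j])`, so mapping a pairwise function over it at depth 2 yields the matrix directly.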


Control the label display yourself:

 xLabel=yLabel=Transpose[{Range[n],names}]
(* {{1,preg},{2,plas},{3,pres},{4,skin},{5,test},{6,mass},{7,pedi},{8,age},{9,class}} *)


Use MatrixPlot to create a color grid:

MatrixPlot[res,
FrameTicks -> {{yLabel, None}, {None, xLabel}},
FrameStyle -> Directive[14, Italic]
]

## Scatterplot Matrix

### Python

from pandas.plotting import scatter_matrix
scatter_matrix(data)  # default settings
plt.show()


### Wolfram Language

Following the Python article I was reading, I break the plot down into a combination of ListPlot (scatter plots off the diagonal) and Histogram (on the diagonal). For each pair of factors, I use the following function to create the data points to plot against each other:

pair[dataset_, tuple_] := Transpose[{
N@Normal@dataset[All, tuple[[1]]],
N@Normal@dataset[All, tuple[[2]]]
}]

ListPlot[pair[dataset, {"age", "skin"}], PlotStyle -> PointSize[0.02], AspectRatio -> 1] (*customized settings*)

Next, I create a plot function that wraps pair and has the same signature as pair and corr:

plotfun[dataset_, tuple_] := If[tuple[[1]] === tuple[[2]],
Histogram[dataset[All, tuple[[1]]], Ticks -> None],(*histogram on diagonal*)
ListPlot[pair[dataset, tuple], PlotStyle -> PointSize[0.03], AspectRatio -> 1, Ticks -> None, Axes -> None, ImageSize -> {80, 80}]
]
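The dispatch in plotfun (a histogram when both names in the pair are equal, paired points otherwise) can be sketched without any plotting library. The function name `scatter_matrix_cells` and its tagged-tuple output are hypothetical; a real implementation would hand each cell to a plotting call instead of returning data.

```python
def scatter_matrix_cells(columns, names):
    """Describe each scatter-matrix cell: ("hist", values) on the diagonal,
    ("scatter", [(x, y), ...]) elsewhere, mirroring plotfun and pair."""
    cells = []
    for row_name in names:
        row = []
        for col_name in names:
            if row_name == col_name:
                row.append(("hist", list(columns[row_name])))
            else:
                # Pair the two columns point by point, like Transpose[{x, y}].
                points = list(zip(columns[row_name], columns[col_name]))
                row.append(("scatter", points))
        cells.append(row)
    return cells


toy = {"age": [21, 35, 50], "skin": [20, 30, 25]}  # made-up values
cells = scatter_matrix_cells(toy, ["age", "skin"])
print(cells[0][1])  # ('scatter', [(21, 20), (35, 30), (50, 25)])
```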


The ListPlot call is somewhat longer than the Python plot function because I wanted to show some styling settings in Mathematica that give the plot a professional look. Python users are welcome to point out how to do this with matplotlib in the comments. The code that applies the plot function over the whole grid is a one-liner:

Grid[Map[plotfun[dataset, #] &, grid, {2}]] // Rasterize

The working notebook is attached; please download it and play around. The test data is also provided in testdata.m in the form of Dataset[...], in case the URL above is dead. You can import this file with data=Import["<path>/<to>/testdata.m"], and the dataset is ready to use in Mathematica.
6 Replies
Posted 4 years ago
I think your solutions are nice, but definitely not the easiest. For example, the histogram example can be written succinctly as: Transpose[dataset][All, Histogram]
Posted 4 years ago
This is great! I always learn something new from the community. For the histogram of the "class" field, though, it seems Mathematica makes the bins so wide that the result is misleading. I am not sure whether options can be passed in directly with your style.
Posted 4 years ago
I'm not sure I entirely see what you are saying, but if you want to pass arguments to Histogram, such as a bin specification, you can do something like this: Transpose[dataset][All, Histogram[#, {0.1}]&], where {0.1} is just an arbitrary bin specification.
Posted 4 years ago
If you do Transpose[dataset][All, f] and click the "class" row, you see the result for that column (screenshot not preserved). However, f does not receive Key["class"] as a parameter, so you can't customize the output based on the key. In particular, I need a bin width of 0.1 for the class row only and automatic widths for the rest.
Posted 4 years ago - Congratulations! This post is now a Staff Pick! Thank you for your wonderful contributions. Please, keep them coming!
Posted 3 years ago
I appreciated this post for practicing visual representations in Mathematica. I think I have a simpler way to make the MatrixPlot, however: MatrixPlot[Correlation[Normal@dataset[Values]]]