GROUPS: Staff Picks | Education | Statistics and Probability | Machine Learning | Wolfram Summer School

[WSS20] A Computational Introduction to Statistical Modelling
Farai Fredric Mlambo, University of the Witwatersrand
Posted 6 months ago | 1280 Views | 1 Reply | 5 Total Likes
Introduction
The Wolfram Language presents an unparalleled functional, symbolic and numerical architecture for the fitting and analysis of statistical models. Statistical modelling functions in the Wolfram Language create symbolic fitted model objects from which various properties can be queried for reporting and/or further analysis. Coupled with the Wolfram Language's top-level superfunctions and system coherence, these symbolic fitted model objects are accessible from a wide range of the Wolfram Language's print-ready visualization tools.
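As a minimal sketch of this workflow (on toy data of my own choosing, not from the text), one can fit a linear model and query the resulting symbolic object for its properties:

data = {{1, 1.2}, {2, 1.9}, {3, 3.1}, {4, 4.2}, {5, 4.8}};  (* toy data *)
lm = LinearModelFit[data, x, x];   (* a symbolic fitted model object *)
lm["BestFit"]                      (* query the fitted formula *)
lm["RSquared"]                     (* query a goodness-of-fit property *)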
With a computational language such as the Wolfram Language, statistical modelling and machine learning have never been easier. What is a computational language?
According to Stephen Wolfram -
“It’s a language for expressing things in a computational way—and for capturing computational ways of thinking about things. It’s not just a language for telling computers what to do. It’s a language that both computers and humans can use to represent computational ways of thinking about things. It’s a language that puts into concrete form a computational view of everything. It’s a language that lets one use the computational paradigm as a framework for formulating and organizing one’s thoughts”
- Stephen Wolfram’s Blog post:
What We've Built Is a Computational Language (and That's Very Important!)
Although the Wolfram Language is such a powerful computational and programming language, and is ubiquitous (usually with a campus-wide license) at most top universities abroad and here in South Africa, students and researchers do not seem to know that they have such a tool. Moreover, those that know about the Wolfram Language seem to think that "Mathematica", as it is known, is only for mathematicians and not for students and researchers in the Natural Sciences, Engineering, Commerce, Arts and Humanities.
Being an educator (both high school and university) in the Mathematical and Statistical Sciences for about 15 years, and having used Mathematica for about 10 years, I felt the need to compile a text showcasing how one can perform statistical modelling and machine learning in the Wolfram Language. Compiling sample chapters for this text became my project for the Educational Innovation Track of the 2020 Wolfram Summer School. The proposed title of this text is:
A Computational Introduction to Statistical Modelling.
This text is intended to exhibit the art of computational and symbolic statistical model fitting and analysis in the Wolfram Language. The text also introduces the reader to the principles of machine learning in the Wolfram Language, as an extension of statistical model fitting and analysis. Based on the coherence of the Wolfram Language, the principles of symbolic model fitting and analysis remain consistent in the machine learning paradigm. This is made clear in the text, with examples from classical machine learning problems involving predictions of quantitative outcomes and classification problems involving categorical responses.
Considering that this book is intended for students and researchers from non-statistical fields, there will be a preface, which will serve as an introduction and/or primer to the Wolfram Language. Although the reader can get a quick, high-level tour of the principles of the Wolfram Language in this section, it is recommended that the reader also use the following book by Stephen Wolfram:
An Elementary Introduction to the Wolfram Language.
This text will have a preface then the body. The preface is intended to be an introduction to the Wolfram Language. We will highlight the key principles of the language, also introducing the reader to the various resources available, as will be used in the text.
Preface: A High Level Introduction to the Wolfram Language
In this section, the reader will be introduced to the following concepts:
◼ Licencing and Product Installation
  ◼ Desktop Mathematica
  ◼ Cloud Mathematica
  ◼ Students and Academic Staff
  ◼ Industry
◼ What is the Wolfram Language?
  ◼ A bit of history
  ◼ Mathematica
  ◼ A Computational Language
◼ Working with Notebooks
  ◼ What are Notebooks?
  ◼ Why Notebooks?
  ◼ Computational Essays
◼ Working with Wolfram Documentation
  ◼ What is the Wolfram Documentation?
  ◼ Guides, Tutorials, TechNotes, etc.
  ◼ Desktop and Web Documentation
◼ Symbols, Objects and Expressions
  ◼ Why Symbolic?
  ◼ Everything is an Expression
  ◼ Computational Language
◼ Built-In Knowledge in the Wolfram Language
  ◼ Why so much Knowledge?
  ◼ How to use the Built-In Knowledge
◼ Data Types in the Wolfram Language
  ◼ Lists
  ◼ Associations
  ◼ Sparse Arrays
◼ Data Sources in the Wolfram Language
  ◼ Example Data
  ◼ The Wolfram Data Repository
◼ Functional Programming in the Wolfram Language
  ◼ Programming Paradigms
  ◼ Why Functional Programming?
  ◼ Good Examples
◼ Wolfram|Alpha
  ◼ What is Wolfram|Alpha?
  ◼ More than a Search Engine
  ◼ Wolfram|Alpha for Students and Educators
  ◼ Computable Results from Wolfram|Alpha
◼ The Wolfram Function Repository
  ◼ Why a Function Repository?
  ◼ How to use the Function Repository
  ◼ How to Contribute to the Function Repository
◼ The Wolfram Neural Network Repository
  ◼ What is the Wolfram Neural Network Repository?
  ◼ Advantages of using the Wolfram Neural Net Repository
  ◼ Computational Language
◼ The Wolfram Demonstrations Site: Bringing Life to Science, Education and Research
  ◼ What is the Wolfram Demonstrations Site?
  ◼ How students and academics can use the Wolfram Demonstrations Site
  ◼ How to Contribute to the Wolfram Demonstrations Site
The body of this book will be divided into four main parts. These parts are further divided into various chapters of common interest.
PART 1: DESCRIPTIVE STATISTICS, STATISTICAL DISTRIBUTIONS AND HYPOTHESIS TESTING IN THE WOLFRAM LANGUAGE
This part serves as a primer and/or introduction to the principles of Statistical Modelling in the Wolfram Language. Considering that both the classical (frequentist) statistical and Bayesian statistical models to be discussed have some distributional assumptions about the residuals, it is essential that we establish a foundation for how the principles of mathematical statistics are presented in the Wolfram Language. Starting from the very basics of descriptive statistical tabulation and visualization, this part introduces the reader to principles of exploratory analysis in the Wolfram Language.
We maintain the term Wolfram Language, considering that the reader will also be introduced to Wolfram|Alpha (web and notebook versions) as a source of curated, computable data and fast exploratory data analysis. The reader will also be introduced to the Wolfram Data Repository with the examples used in this part and the rest of the text.
Chapter 1. Exploratory and Descriptive Statistics in the Wolfram Language
◼ Importing and Exporting Data
◼ Symbolic Univariate Graphing and Data Summaries
◼ Symbolic Multivariate Graphing and Data Summaries
◼ Finding Clusters in Data
Chapter 2. Statistical Distributions in the Wolfram Language
◼ Symbolic Graphing and Data Summaries
◼ Probabilities and Expectations
◼ Symbolic Simulation of Random Variates
◼ Parameter Estimation
◼ Derived and Formula Distributions
◼ Mixture Distributions
◼ Parametric Mixtures
◼ Multivariate Derived Distributions
◼ Copulas
◼ Nonparametric Distributions
◼ Survival Distributions
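As a taste of what Chapter 2 covers, here is a minimal sketch (using a normal distribution and simulated data of my own choosing) of symbolic probabilities, simulation of random variates, and parameter estimation:

Probability[x > 1, x \[Distributed] NormalDistribution[0, 1]]   (* symbolic probability *)
SeedRandom[1];
sample = RandomVariate[NormalDistribution[10, 2], 500];          (* simulated variates *)
EstimatedDistribution[sample, NormalDistribution[mu, sigma]]     (* estimate mu and sigma *)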
Chapter 3. Hypothesis Testing in the Wolfram Language
◼ Introduction to Symbolic Hypothesis Testing
◼ Tests of Location
◼ Distributional Goodness-of-Fit
◼ Dependency Tests
◼ Variance Tests
◼ Testing with Estimated Parameters
◼ Time Series Tests
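For a flavor of Chapter 3, here is a minimal sketch (on simulated data of my own choosing) of a location test and a distributional goodness-of-fit test:

SeedRandom[2];
sample = RandomVariate[NormalDistribution[5.2, 1], 100];
LocationTest[sample, 5, "PValue"]   (* test the hypothesis that the mean is 5 *)
DistributionFitTest[sample]         (* p-value for a test of normality *)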
PART 2: INTRODUCTION TO STATISTICAL MODELS IN THE WOLFRAM LANGUAGE
This part presents the core of statistical and machine learning model building and assessment as introduced in this book. Starting with the classical linear regression model, all the way to resampling and cross-validation methods, this part introduces the reader to the art of statistical model fitting and assessment in the Wolfram Language. This part also illustrates how generalized linear models (Gaussian, Binomial, Poisson, Inverse Gaussian and Gamma families) are fitted in the Wolfram Language. Non-linear models are also presented, with examples from science and economics, where non-linearities are ubiquitous.
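As a minimal sketch of the generalized linear model workflow (on simulated count data of my own choosing, using a Poisson family):

SeedRandom[3];
xs = RandomReal[{0, 2}, 200];
ys = RandomVariate[PoissonDistribution[Exp[0.5 + 1.2 #]]] & /@ xs;  (* Poisson responses *)
glm = GeneralizedLinearModelFit[Transpose[{xs, ys}], x, x, ExponentialFamily -> "Poisson"];
glm["BestFit"]   (* the fitted linear predictor *)
glm["AIC"]       (* a queried model-assessment property *)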
Resampling and cross-validation techniques serve as a bridge from statistical models to reality. Considering the potential limitations of drawing more samples from a population, computationally intensive algorithms such as the bootstrap and cross-validation have proved to be practical tools in modern statistical analysis. This part introduces the reader to resampling techniques, as implemented in the Wolfram Language. These will be applied to the various standard linear models, generalized linear models and non-linear models presented. In this part, the reader will also be introduced to the Wolfram Function Repository, where "consumption ready" Wolfram Language functions for statistical and machine learning models are contributed by a host of Wolfram users.
Chapter 4. Linear Regression Models in the Wolfram Language
◼ A Mathematical Introduction to Statistical Modelling
◼ Fitting Linear Regression Models
◼ Assessment of Linear Regression Models
◼ Bayesian Linear Regression Models
◼ General Discussion on the Limitations of Linear Regression Models
Chapter 5. Generalized Linear Regression Models in the Wolfram Language
◼ The Mathematical Statistics of Generalized Linear Models
◼ Binomial Families
◼ Gaussian Families
◼ Poisson Families
◼ Inverse Gaussian Families
◼ Gamma Families
Chapter 6. Non-Linear Regression Models in the Wolfram Language
◼ The Mathematics and Statistics of Non-Linear Models
◼ Introduction to the Non-Linear Model Fitting Super-Function
◼ Wavelet Smoothing
◼ Cubic Splines
◼ Multi-Nonlinear Models
◼ Practical Examples from Classical Science, Economics, Engineering and Finance
Chapter 7. Resampling Techniques and Cross-Validation in the Wolfram Language
◼ The Mathematics and Statistics of Resampling and Cross-Validation
◼ Validation
◼ Leave-One-Out Cross-Validation
◼ k-fold Cross-Validation
◼ Introduction to the Bootstrap
◼ Model Selection Case Studies: Examples Using n-Degree Polynomials on Real Data
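Ahead of Chapter 7, here is a minimal sketch of the bootstrap idea (on simulated data of my own choosing): resample with replacement and study the sampling distribution of a statistic:

SeedRandom[4];
data = RandomVariate[GammaDistribution[2, 3], 80];
bootMeans = Table[Mean[RandomChoice[data, Length[data]]], {1000}];  (* bootstrap replicates *)
Quantile[bootMeans, {0.025, 0.975}]   (* a 95% bootstrap interval for the mean *)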
PART 3: INTRODUCTION TO MACHINE LEARNING IN THE WOLFRAM LANGUAGE
The symbolic architecture of the Wolfram Language, coupled with its robust optimization packages developed over more than three decades, forms a solid substratum for the training of the latest machine learning algorithms. This part introduces the reader to the principles of machine learning in the Wolfram Language, starting with the high-level machine learning super-functions for prediction (quantitative responses) and classification (categorical outcomes). The reader explores a wide array of innovative machine learning techniques for non-linear regression and classification using the Wolfram Language.
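A minimal sketch of the two super-functions (on toy data of my own choosing):

cl = Classify[{1.0 -> "small", 1.2 -> "small", 3.8 -> "big", 4.1 -> "big"}];  (* categorical outcomes *)
cl[3.9]

pr = Predict[{1 -> 1.3, 2 -> 2.1, 3 -> 2.9, 4 -> 4.2}];  (* quantitative outcomes *)
pr[2.5]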
In this part, the reader is introduced to the art of training neural networks "from first principles". This chapter also showcases the various layers of neural networks, with their function equivalents in the core Wolfram Language. Finally, the reader is also introduced to the Wolfram Neural Network Repository, where pre-trained, computable neural networks are available for modification and use in the Wolfram Language.
Chapter 8. Introduction to Machine Learning in the Wolfram Language
◼ Supervised vs. Unsupervised Learning, a High Level Introduction
◼ Two Super-Functions for Machine Learning: Classify and Predict
◼ Symbolic Machine Learning Model Assessment for Wolfram Language Super-Functions: Classifier Measurements and Predictor Measurements
◼ Cross-Validation and Super-Functions: Classify and Predict
Chapter 9. Neural Networks in the Wolfram Language
◼ What are Artificial Neural Networks? An In-Depth Tour from Biology to Computational Mathematics, and Finally to Computational Statistics
◼ Building Blocks of Artificial Neural Networks in the Wolfram Language:
  ◼ Layers
  ◼ Chains
  ◼ Graphs
  ◼ Initialization
◼ Training of Neural Networks in the Wolfram Language (CPUs and GPUs)
◼ Symbolic Neural Network Model Assessment
◼ Cross-Validation and Neural Networks
◼ Using the Neural Network Repository
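As a minimal sketch of the layer and chain building blocks of Chapter 9 (a tiny regression network on synthetic data of my own choosing):

net = NetChain[{LinearLayer[16], Ramp, LinearLayer[1]}, "Input" -> 1];  (* layers composed into a chain *)
data = Table[{x} -> {Sin[x]}, {x, 0., 6., 0.05}];                       (* synthetic training pairs *)
trained = NetTrain[net, data, MaxTrainingRounds -> 200];
trained[{1.5}]   (* should approximate Sin[1.5] *)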
PART 4: RANDOM PROCESSES IN TIME
A random process models the progression of a system over time, where the evolution is random rather than deterministic. The key point is that observations that are close in time are dependent, and this can be used to model, simulate, and predict the behavior of the process. Random processes are used in a variety of fields including economics, finance, engineering, physics, and biology.
Building on its strong capabilities for distributions, the Wolfram Language provides cohesive and comprehensive random process support. Using a symbolic representation of a process makes it easy to simulate its behavior, estimate parameters from data, and compute state probabilities at different times. There is additional functionality for special classes of random processes such as Markov chains, queues, time series, and stochastic differential equations.
Chapter 10. Time Series Models in the Wolfram Language
Time series refers to a sequence of observations following each other in time, where adjacent observations are correlated. This can be used to model, simulate, and forecast behavior for a system. Time series models are frequently used in fields such as economics, finance, biology, and engineering.
The Wolfram Language provides a full suite of time series functionality, including standard models such as MA, AR, and ARMA, as well as several extensions. Time series models can be simulated, estimated from data, and used to produce forecasts of future behavior.
The following models will be implemented (a short sketch follows the list):
◼ Moving Average (MA) Processes
◼ Autoregressive (AR) Processes
◼ Autoregressive Moving-Average (ARMA) Processes
◼ Seasonal Autoregressive Moving-Average (SARMA) Processes
◼ Autoregressive Integrated Moving-Average (ARIMA) Processes
◼ Seasonal Autoregressive Integrated Moving-Average (SARIMA) Processes
◼ Fractional Autoregressive Integrated Moving-Average (FARIMA) Processes
◼ Autoregressive Conditionally Heteroscedastic (ARCH) Processes
◼ Generalized Autoregressive Conditionally Heteroscedastic (GARCH) Processes
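As a minimal sketch (with coefficients of my own choosing) of the simulate-estimate-forecast cycle for an ARMA process:

SeedRandom[42];
proc = ARMAProcess[{0.6}, {0.3}, 1];            (* AR coefficients, MA coefficients, noise variance *)
path = RandomFunction[proc, {1, 200}];          (* one simulated sample path *)
est = EstimatedProcess[path["Values"], ARMAProcess[{a}, {b}, v]];
TimeSeriesForecast[est, path["Values"], {10}]   (* forecast the next 10 steps *)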
Chapter 11. Parametric Stochastic Processes in the Wolfram Language
Parametric random processes are processes specified using a few parameters. They provide standard models for a variety of areas, including finance (interest rates, …), insurance (claims processes, …), and physics (radioactive decay, …).
The Wolfram Language provides complete support for working with parametric random processes. The symbolic representation of a process makes it easy to simulate its behavior, estimate parameters from data, and compute state probabilities at different times. There is a full suite of standard properties, including state distributions at different times as well as mean and covariance functions. The following processes will be implemented (a short sketch follows the list):
◼ Random Walk Process
◼ Bernoulli Process
◼ White Noise Process
◼ Binomial Process
◼ Renewal Process
◼ Compound Renewal Process
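A minimal sketch (with parameters of my own choosing) of simulating and querying a symbolic parametric process:

SeedRandom[1];
walk = RandomWalkProcess[0.5];               (* fair Bernoulli random walk *)
paths = RandomFunction[walk, {0, 100}, 5];   (* five simulated paths *)
ListLinePlot[paths]
Mean[walk[t]]                                (* symbolic mean of the state at time t *)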
Chapter 12. Stochastic Differential Equation Processes in the Wolfram Language
Stochastic differential equations (SDEs) occur where a system described by differential equations is influenced by random noise. Stochastic differential equations are used in finance (interest rates, stock prices, …), biology (population, epidemics, …), physics (particles in fluids, thermal noise, …), and control and signal processing (controllers, filtering, …).
The Wolfram Language provides common special SDEs specified by a few parameters as well as general Ito and Stratonovich SDEs and systems specified by their differential equations. The symbolic representation of SDE processes allows a uniform way to compute a variety of properties, from simulation and mean and covariance functions to full state distributions at different times.
The following SDE processes will be implemented (a short sketch follows the list):
◼ Wiener Process
◼ Ornstein-Uhlenbeck Process
◼ Brownian Bridge Process
◼ Geometric Brownian Motion Process
◼ Cox-Ingersoll-Ross Process
◼ Ito Process
◼ Stratonovich Process
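As a minimal sketch (with parameters of my own choosing) of a special SDE process, and the same dynamics written as a general Ito process:

SeedRandom[7];
gbm = GeometricBrownianMotionProcess[0.05, 0.2, 1];   (* drift, volatility, initial value *)
ListLinePlot[RandomFunction[gbm, {0, 1, 0.01}, 3]]    (* three simulated paths *)
Mean[gbm[t]]                                          (* symbolic mean at time t *)

(* the same dynamics as a general Ito SDE: dx = 0.05 x dt + 0.2 x dw *)
ito = ItoProcess[\[DifferentialD]x[t] == 0.05 x[t] \[DifferentialD]t + 0.2 x[t] \[DifferentialD]w[t],
  x[t], {x, 1}, t, w \[Distributed] WienerProcess[]];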
EXAMPLES:
We now turn to some examples from the text. Three examples will be presented: two from Chapter 1 (Exploratory and Descriptive Statistics in the Wolfram Language) and one from Chapter 8 (Introduction to Machine Learning in the Wolfram Language).
Example 1: Bivariate Exploratory Data Analysis (from Chapter 1)
Consider the Wolfram Language's built-in Old Faithful geyser data, which includes durations and waiting times for geyser eruptions. To obtain the data, we use the function ExampleData and specify the class of the data to be "Statistics" and the name of the data to be "OldFaithful".
We use the function Short to suppress some of the data and allow some of the observations to be seen.
In[]:= Short[ofdata = ExampleData[{"Statistics", "OldFaithful"}]]

Out[]//Short= {{3.6, 79}, {1.8, 54}, {3.333, 74}, <<266>>, {4.417, 90}, {1.817, 46}, {4.467, 74}}
Let us visualise the data:
In[]:= ListPlot[ofdata]

Out[]= (scatter plot of eruption duration against waiting time)
What are the properties of this data?
In[]:= ExampleData[{"Statistics", "OldFaithful"}, "Properties"]

Out[]= {ApplicationAreas, ColumnDescriptions, ColumnHeadings, ColumnTypes, DataElements, DataType, Description, Dimensions, EventData, EventSeries, LongDescription, Name, ObservationCount, Source, TimeSeries}
Let us use the property "LongDescription" to get a detailed description of the data:
In[]:= ExampleData[{"Statistics", "OldFaithful"}, "LongDescription"]

Out[]= Data on eruptions of Old Faithful geyser, October 1980. Variables are the duration in seconds of the current eruption, and the time in minutes to the next eruption. Collected by volunteers, and supplied by the Yellowstone National Park Geologist. Data was not collected between approximately midnight and 6 AM.
Clearly the data is bivariate. We can use the property "ColumnDescriptions" to get descriptions of the columns:
In[]:= ExampleData[{"Statistics", "OldFaithful"}, "ColumnDescriptions"]

Out[]= {Eruption time in minutes, Waiting time to next eruption in minutes}
There are numerous ways to visualize bivariate data, including pointwise plots, density projections, and histograms based on binning or kernel density estimation. The function DensityHistogram plots the density histogram for bivariate data, while the function Histogram3D generates a 3D histogram for bivariate data and the function SmoothHistogram3D plots the 3D kernel histogram for bivariate data.
Here is a plot of the bivariate density using a density histogram:
In[]:= DensityHistogram[ofdata, 20, "PDF"]

Out[]= (density histogram of the bivariate data)
We can also view the smooth density histogram, using the function SmoothDensityHistogram.
In[]:= SmoothDensityHistogram[ofdata]

Out[]= (smooth density histogram of the data)
One can see that the data is clustered into two groups. We will get back to this point.
Let us visualize the densities in a 3D histogram:
In[]:= Histogram3D[ofdata, 20, "PDF", AxesLabel -> {"Duration", "WaitingTime", "Probability"}]

Out[]= (3D histogram of the bivariate densities)
Let us draw a smooth histogram:
In[]:= SmoothHistogram3D[ofdata, AxesLabel -> {"Duration", "WaitingTime", "Probability"}]

Out[]= (smooth 3D histogram of the data)
We now plot the four figures and arrange them in a grid:
In[]:= GraphicsGrid[{{Histogram3D[ofdata, 20, "PDF"], SmoothHistogram3D[ofdata]}, {DensityHistogram[ofdata, 20, "PDF"], SmoothDensityHistogram[ofdata]}}, Frame -> All, ImageSize -> Large]

Out[]= (grid of the four density plots)
Let us calculate the mean, median, variance, standard deviation, skewness and kurtosis of the eruption durations and waiting times.
In[]:= Table[f[N[ofdata]], {f, {Mean, Median, Variance, StandardDeviation, Skewness, Kurtosis}}]

Out[]= {{3.48778, 70.8971}, {4., 76.}, {1.30273, 184.823}, {1.14137, 13.595}, {-0.415841, -0.416319}, {1.4994, 1.85737}}
We can tabulate the 10th, 20th, …, 90th percentiles of the Old Faithful data, together with the smallest and largest values, using the function Table:
In[]:= quantiles = Table[Quantile[ofdata, p], {p, 0, 1, 0.1}]

Out[]= {{1.6, 43}, {1.85, 51}, {2., 55}, {2.3, 60}, {3.6, 71}, {4., 76}, {4.167, 78}, {4.367, 81}, {4.533, 83}, {4.7, 86}, {5.1, 96}}
Let us plot the percentiles, together with the smallest and largest values:
In[]:= ListPlot[quantiles, AxesLabel -> {"Duration", "WaitingTime"}]

Out[]= (plot of the percentiles)
Example 2: Finding Clusters in Bivariate Data (from Chapter 1)
Another look at the Old Faithful data:
In[]:= ListPlot[ofdata, AxesLabel -> {"Duration", "WaitingTime"}]

Out[]= (scatter plot showing two clusters)
One can see that the data seem to fall into two clusters: one with shorter eruption durations and shorter waiting times, and another with longer durations and longer waiting times.
Let us visualise the first variable - Eruption time in minutes.
In[]:= histDuration = Histogram[ofdata[[All, 1]], 30, "PDF", AxesLabel -> {"Duration", "Probability"}]

Out[]= (histogram of eruption durations)
We can also visualize the second variable - Waiting time to next eruption in minutes.
In[]:= histWaitingTime = Histogram[ofdata[[All, 2]], 30, "PDF", AxesLabel -> {"WaitingTime", "Probability"}]

Out[]= (histogram of waiting times)
We can use the function FindClusters to divide the data into clusters.
In[]:= clusters = FindClusters[ofdata];
Let us visualize the clusters. We use the graphics option PlotStyle to indicate the clusters using different colors.
In[]:= ListPlot[clusters, PlotStyle -> {Blue, Red}, AxesLabel -> {"Duration", "WaitingTime"}]

Out[]= (scatter plot with the two clusters in blue and red)
Let us plot a histogram of the Eruption time in minutes for the first cluster, with 15 bins.
In[]:= histDuration1 = Histogram[clusters[[1, All, 1]], 15, "PDF", AxesLabel -> {"Duration", "Probability"}]

Out[]= (histogram of durations for the first cluster)
Let us plot a histogram of the Eruption time in minutes for the second cluster, with 15 bins.
In[]:= histDuration2 = Histogram[clusters[[2, All, 1]], 15, "PDF", AxesLabel -> {"Duration", "Probability"}]

Out[]= (histogram of durations for the second cluster)
We can also plot a histogram of the Waiting time to next eruption in minutes for the first cluster, with 15 bins.
In[]:= histWaitingTime1 = Histogram[clusters[[1, All, 2]], 15, "PDF", AxesLabel -> {"WaitingTime", "Probability"}]

Out[]= (histogram of waiting times for the first cluster)
Then we plot a histogram of the Waiting time to next eruption in minutes for the second cluster, with 15 bins.
In[]:= histWaitingTime2 = Histogram[clusters[[2, All, 2]], 15, "PDF", AxesLabel -> {"WaitingTime", "Probability"}]

Out[]= (histogram of waiting times for the second cluster)
In[]:= GraphicsGrid[{{histDuration1, histDuration2}, {histWaitingTime1, histWaitingTime2}}, Frame -> All]

Out[]= (grid of the four cluster histograms)
We can see the two clusters using 3D histograms:
In[]:= Histogram3D[clusters, 20, "PDF", ChartStyle -> {Lighter[Blue], Lighter[Red]}]

Out[]= (3D histogram of the two clusters)
Let us draw a box and whisker chart of the waiting time to next eruption, for the two clusters.
In[]:= BoxWhiskerChart[{clusters[[1, All, 2]], clusters[[2, All, 2]]}]

Out[]= (box-and-whisker chart of waiting times by cluster)
The two clusters are clearly apparent.
Let us plot the histograms of the waiting time together, using PairedHistogram:
In[]:= PairedHistogram[clusters[[1, All, 2]], clusters[[2, All, 2]]]

Out[]= (paired histogram of waiting times by cluster)
In[]:= PairedSmoothHistogram[clusters[[1, All, 2]], clusters[[2, All, 2]]]

Out[]= (paired smooth histogram of waiting times by cluster)
Cumulative distribution function of the Waiting time:
In[]:= PairedHistogram[clusters[[1, All, 2]], clusters[[2, All, 2]], Automatic, "CDF"]

Out[]= (paired CDF histogram of waiting times by cluster)
Example 3: Prediction (from Chapter 8)
When the response variable is continuous in nature and is measured on a ratio scale, the problem of assigning values to the dependent variable, based on the observed values of the independent variables, can be viewed as a prediction problem. Estimating one's income based on their level of study, geographical region and other independent variables can be seen as a prediction problem. Likewise, calculating the best estimate for tomorrow's rainfall, based on today's temperature, humidity and so on, can also be viewed as a regression problem.
Various statistical and/or machine learning techniques can be used for prediction problems. These include:
◼ Linear Regression
◼ Neural Networks
◼ Gradient Boosted Trees
◼ Random Forest
◼ Decision Tree
◼ Gaussian Process
Prediction problems consist of numerical responses.
Consider the Boston Homes data from the Wolfram Data Repository. This data consists of home values for 506 Boston suburbs with potential influential factors (such as the crime rate, number of rooms, distance to employment centers, etc.).
In[]:= boston = ResourceData["Sample Data: Boston Homes"];
How big is this data?
In[]:= Length[boston]

Out[]= 506
Let us visualise a random sample of 10 entries of this data:
In[]:= SeedRandom[1234];
       RandomSample[boston, 10]

Out[]= (Dataset of 10 randomly sampled rows)
We can also look at the data as an association:
In[]:= boston[[1 ;; 5]] // Normal

Out[]= {
 <|CRIM -> 0.00632, ZN -> 18, INDUS -> 2.31, CHAS -> tract does not bound Charles river, NOX -> 0.538 ppm, RM -> 6.575, AGE -> 65.2, DIS -> 4.09, RAD -> 1, TAX -> 296, PTRATIO -> 15.3, BLACK -> 396.9, LSTAT -> 4.98%, MEDV -> 24|>,
 <|CRIM -> 0.02731, ZN -> 0, INDUS -> 7.07, CHAS -> tract does not bound Charles river, NOX -> 0.469 ppm, RM -> 6.421, AGE -> 78.9, DIS -> 4.9671, RAD -> 2, TAX -> 242, PTRATIO -> 17.8, BLACK -> 396.9, LSTAT -> 9.14%, MEDV -> 21.6|>,
 <|CRIM -> 0.02729, ZN -> 0, INDUS -> 7.07, CHAS -> tract does not bound Charles river, NOX -> 0.469 ppm, RM -> 7.185, AGE -> 61.1, DIS -> 4.9671, RAD -> 2, TAX -> 242, PTRATIO -> 17.8, BLACK -> 392.83, LSTAT -> 4.03%, MEDV -> 34.7|>,
 <|CRIM -> 0.03237, ZN -> 0, INDUS -> 2.18, CHAS -> tract does not bound Charles river, NOX -> 0.458 ppm, RM -> 6.998, AGE -> 45.8, DIS -> 6.0622, RAD -> 3, TAX -> 222, PTRATIO -> 18.7, BLACK -> 394.63, LSTAT -> 2.94%, MEDV -> 33.4|>,
 <|CRIM -> 0.06905, ZN -> 0, INDUS -> 2.18, CHAS -> tract does not bound Charles river, NOX -> 0.458 ppm, RM -> 7.147, AGE -> 54.2, DIS -> 6.0622, RAD -> 3, TAX -> 222, PTRATIO -> 18.7, BLACK -> 396.9, LSTAT -> 5.33%, MEDV -> 36.2|>}
The test and training sets were obtained by creating a random sample of 30% of the data for the test set, using the rest for training.
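The repository provides this split ready-made (used below); a hypothetical manual version of the same 70/30 split (variable names of my own choosing) could look like:

SeedRandom[1234];
n = Length[boston];
testIdx = RandomSample[Range[n], Round[0.3 n]];            (* 30% of rows for testing *)
myTest = Normal[boston][[testIdx]];
myTrain = Normal[boston][[Complement[Range[n], testIdx]]];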
Let us assign a variable name to the training set:
In[]:= bostonTrain = ResourceData["Sample Data: Boston Homes", "TrainingData"];
How big is this training data?
In[]:= Length[bostonTrain]

Out[]= 338
Let us also assign a variable name to the test data:
In[]:= bostonTest = ResourceData["Sample Data: Boston Homes", "TestData"];
What is the length of the test data?
In[]:= Length[bostonTest]

Out[]= 168
The idea is to train a predictor to predict the median value of owner-occupied homes in $1000s, using the predictor variables given.
The function Predict trains a predictor for numerical outcomes:
In[]:= bostonPredict = Predict[bostonTrain]

Out[]= PredictorFunction[Input type: Mixed (number: 13), Method: LinearRegression]
We have produced a predictor object. The next thing is to measure how good this predictor is on data it has not seen. The function PredictorMeasurements does exactly this.
In[]:= bostonPredictorMeasurements = PredictorMeasurements[bostonPredict, bostonTest]

Out[]= PredictorMeasurementsObject[Predictor: LinearRegression, Number of test examples: 168]
We have produced a predictor measurement object.
What are the properties associated with this object?
In[]:= propertiesBoston = bostonPredictorMeasurements["Properties"]

Out[]= {BatchEvaluationTime, BestPredictedExamples, ComparisonPlot, EvaluationTime, Examples, FractionVarianceUnexplained, GeometricMeanProbabilityDensity, LeastCertainExamples, Likelihood, LogLikelihood, MeanCrossEntropy, MeanDeviation, MeanSquare, MostCertainExamples, Perplexity, PredictorFunction, ProbabilityDensities, ProbabilityDensityHistogram, Properties, RejectionRate, Report, ResidualHistogram, ResidualPlot, Residuals, RSquared, StandardDeviation, StandardDeviationBaseline, TotalSquare, WorstPredictedExamples}
Let us get the comparison plot:
In[]:= bostonPredictorMeasurements["ComparisonPlot"]

Out[]= (comparison plot of predicted vs. actual values)
We could also tabulate all the properties as a Dataset:
In[]:= Dataset[AssociationMap[bostonPredictorMeasurements[#] &, bostonPredictorMeasurements["Properties"]]]

Out[]= (Dataset of all measurement properties)
We can specify the method via the Method option of the function Predict. For example, we can specify the method to be "NeuralNetwork".
In[]:= Predict[bostonTrain, Method -> "NeuralNetwork"]

Out[]= PredictorFunction[Input type: Mixed (number: 13), Method: NeuralNetwork]
We can directly obtain a single property, such as "RSquared", for a predictor trained on the training set and evaluated on the test data:
In[]:= PredictorMeasurements[Predict[bostonTrain, Method -> "NeuralNetwork"], bostonTest, "RSquared"]

Out[]= 0.810974
Let us now collect all the available prediction methods in a list and name the list:
In[]:= predictMethods = {"DecisionTree", "GradientBoostedTrees", "LinearRegression", "NearestNeighbors", "NeuralNetwork", "RandomForest", "GaussianProcess"};
Using AssociationMap, we can create an association listing a certain property for all the available methods:
In[]:= Dataset[AssociationMap[PredictorMeasurements[Predict[bostonTrain, Method -> #], bostonTest, "RSquared"] &, predictMethods]]

Out[]= (Dataset of RSquared values by method)
We can also look at the comparison plots for all the methods:
In[]:= comparisonPlot = Dataset[AssociationMap[PredictorMeasurements[Predict[bostonTrain, Method -> #], bostonTest, "ComparisonPlot"] &, predictMethods]];
       Normal @ comparisonPlot

Out[]= (association of method names to comparison plots)
We can also view the comparison plots individually:
Comparison Plot: Decision Tree
In[]:= comparisonPlot["DecisionTree"]

Out[]= (comparison plot for the decision tree predictor)
Comparison Plot: Gradient Boosted Trees
In[]:= comparisonPlot["GradientBoostedTrees"]

Out[]= (comparison plot for the gradient boosted trees predictor)
Comparison Plot: Linear Regression
In[]:= comparisonPlot["LinearRegression"]

Out[]= (comparison plot for the linear regression predictor)
Comparison Plot: Nearest Neighbors
In[]:= comparisonPlot["NearestNeighbors"]

Out[]= (comparison plot for the nearest neighbors predictor)
Comparison Plot: Neural Network
In[]:= comparisonPlot["NeuralNetwork"]

Out[]= (comparison plot for the neural network predictor)
Comparison Plot: Random Forest
In[]:= comparisonPlot["RandomForest"]

Out[]= (comparison plot for the random forest predictor)