Message Boards Message Boards


Bayesian linear regression in Mathematica

Posted 9 days ago
0 Replies
4 Total Likes

Recently I updated my Bayesian inference reporitory with a new function called BayesianLinearRegression to provide a Bayesian alternative to Mathematica's LinearModelFit. I also submitted the code for this function to the Wolfram function repository to make it easier to access, but as of the time of writing the function hasn't gone up yet.

The example notebook in the repository provides several examples of how the function can be used, some of which I will reproduce below. Please refer to the GitHub file (which is displayed on the homepage of the repository) for instructions about installing the BayesianInference package. The code is in this file, in case you're interested in taking a look under the hood.

Fitting polynomials

First generate some test data:

data = RandomVariate[
  MultinormalDistribution[{{1, 0.7}, {0.7, 1}}],

enter image description here

The usage of BayesianLinearRegression is similar to that of LinearModelFit with one significant difference: I decided that the Rule-based data specification used by Predict and NetTrain is often more convenient than the matrix input used by the Fit family, so I decided that both types of data specifications should work. The main reason for this is that BayesianLinearRegression also supports regression of vector outputs (an example of that is given in the repository example notebook), in which case the Rule-based format is often easier to understand.

When you fit a model, BayesianLinearRegression returns an Association containing all relevant information about the fit. Fit a model of the form y == a + b x + randomness where randomness is distributed as NormalDistribution[0, sigma]. Here, a, b and sigma are unknowns that need to be fitted to the data:

In[20]:= Clear[x];
model =  BayesianLinearRegression[data, {1, x}, x];

Below is a more detailed description of the information return by BayesianLinearRegression, but let's first concentrate on the main fit result, which is the posterior predictive distribution :

In[61]:= model["Posterior", "PredictiveDistribution"]

Out[61]= StudentTDistribution[-0.267818 + 0.986121 x, 
 0.726105 Sqrt[1.05016 + 0.00692863 x + 0.0635094 x^2], 

As you can see, the result is returned is an x-dependent StudentTDistribution. Visualize the predictions:

  predictiveDist = model["Posterior", "PredictiveDistribution"],
  bands = {95, 50, 5}
   Evaluate@InverseCDF[predictiveDist, bands/100],
   {x, -4, 4}, Filling -> {1 -> {2}, 3 -> {2}}, PlotLegends -> bands
  PlotRange -> All

enter image description here

In the BayesianInference package on GitHub, I included a function called regressionPlot1D which makes it a bit easier to make this plot:

 regressionPlot1D[model["Posterior", "PredictiveDistribution"], {x, -4, 4}, {95, 50, 5}],
(* same result*)

Model comparison and model mixing

Often you'll want to compare multiple models against a data set to see which one works best. For example, here is some data where it's a bit unclear if you should use a first or second order model:

data = {{-1.5`,-1.375`},{-1.34375`,-2.375`},{1.5`,0.21875`},{1.03125`,0.6875`},{-0.5`,-0.59375`},

enter image description here

Fit the data with polynomials up to fourth degree; rank the log-evidences and inspect them. As you can see, the first and second order fits are almost equally likely:

In[152]:= models = AssociationMap[
   BayesianLinearRegression[Rule @@@ data, x^Range[0, #], x] &,
   Range[0, 4]
ReverseSort@models[[All, "LogEvidence"]]

Out[153]= <|1 -> -30.0072, 2 -> -30.1774, 3 -> -34.4292, 4 -> -38.7037, 0 -> -38.787|>

Show the prediction bands:

 regressionPlot1D[ models[1, "Posterior", "PredictiveDistribution"], {x, -3, 3}],
 regressionPlot1D[models[2, "Posterior", "PredictiveDistribution"], {x, -3, 3},  PlotStyle -> Dashed, PlotLegends -> None],

Comparison of the first (solid) and second order (dashed) prediction bands

Instead of picking just one of the models, we can define a mixture over all of them. First calculate the weights for each fit:

In[94]:= weights = Normalize[
   (* subtract the max to reduce rounding error *)
   Exp[models[[All, "LogEvidence"]] - Max[models[[All, "LogEvidence"]]]],

Out[95]= <|1 -> 0.516002, 2 -> 0.47702, 3 -> 0.0068122, 4 -> 0.0000982526, 
 0 -> 0.0000667854|>

Define a mixture of posterior predictive distributions and show the predictions:

In[96]:= mixDist = MixtureDistribution[
   Values@models[[All, "Posterior", "PredictiveDistribution"]]
 regressionPlot1D[mixDist, {x, -3, 3}],

Prediction bands of a mixture of polynomial models up to degree 4

As you can see, if one model is not clearly the best, you can just split the difference between them.

Another interesting thing you can do with BayesianLinearRegression is Bayesian updating. Basically, this means that if you've fitted some data in the past and are now finding new data, you can incorporate the new data by updating your fit without having to re-fit all of the data (old and new) in one go. An example of this is given in the example code notebook on GitHub (in the section about the "PriorParameters" options of the function).

Detailed explanation about the returned values

BayesianLinearRegression returns an Association with the following keys:


Out[21]= {"LogEvidence", "PriorParameters", "PosteriorParameters", "Posterior", "Prior", "Basis", "IndependentVariables"}
  • "LogEvidence": In a Bayesian setting, the evidence (also called marginal likelihood) measures how well the model fits the data (with a higher evidence indicating a better fit). The evidence has the virtue that it naturally penalizes models for their complexity and therefore does not suffer from over-fitting in the way that measures like the sum-of-squares or likelihood do.
  • "Basis", "IndependentVariables": Simply the basis functions and independent variable specified by the user.
  • "Posterior", "Prior": These two keys each hold an association with 4 distributions:
  1. "PredictiveDistribution": A distribution that depends on the independent variables (x in the example above). By filling in a value for x, you get a distribution that tells you where you could expect to find future y values. This distribution accounts for all relevant uncertainties in the model: model variance caused by the term randomness; uncertainty in the values of a and b; and uncertainty in sigma.
  2. "UnderlyingValueDistribution": Similar to "PredictiveDistribution", but this distribution give the possible values of a + b x without the randomness error term.
  3. "RegressionCoefficientDistribution": The join distribution over a and b.
  4. "ErrorDistribution": The distribution of the variance sigma^2.
  • "PriorParameters", "PosteriorParameters": These parameters are not immediately important most of the time, but they contain all of the relevant information about the fit.


The formulas underlying BayesianLinearRegression are based mainly on the following Wikipedia articles. The names of the hyper parameters returned by the function are based on the article about multivariate linear regression.

Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
or Discard

Group Abstract Group Abstract