# Bayesian linear regression in Mathematica

Posted 1 year ago | 4583 Views | 8 Replies | 18 Total Likes

Recently I updated my Bayesian inference repository with a new function called BayesianLinearRegression, which provides a Bayesian alternative to Mathematica's LinearModelFit. I also submitted the code for this function to the Wolfram Function Repository to make it easier to access, so the function can also be used as ResourceFunction["BayesianLinearRegression"].

The example notebook in the repository provides several examples of how the function can be used, some of which I will reproduce below. Please refer to the GitHub README.md file (which is displayed on the homepage of the repository) for instructions about installing the BayesianInference package. The code is in this file, in case you're interested in taking a look under the hood.

## Fitting polynomials

First generate some test data:

data = RandomVariate[
MultinormalDistribution[{{1, 0.7}, {0.7, 1}}],
20
];
ListPlot[data]

The usage of BayesianLinearRegression is similar to that of LinearModelFit, with one significant difference: I find the Rule-based data specification used by Predict and NetTrain often more convenient than the matrix input used by the Fit family, so I made both types of data specification work. The main reason is that BayesianLinearRegression also supports regression of vector outputs (an example is given in the repository's example notebook), in which case the Rule-based format is often easier to understand.
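Both data specifications carry the same information. A minimal sketch of converting between them with built-in functions only (the variable names are illustrative and not part of the package):

```mathematica
(* matrix form, as used by LinearModelFit: each row is {x, y} *)
matrixData = {{0.1, 0.3}, {0.5, 0.2}, {1.2, 1.1}};

(* Rule-based form, as used by Predict and NetTrain: one x -> y rule per point *)
ruleData = Rule @@@ matrixData
(* {0.1 -> 0.3, 0.5 -> 0.2, 1.2 -> 1.1} *)

(* several inputs per row: everything but the last column becomes a vector input *)
matrix3 = {{1., 2., 3.5}, {0., 1., 0.7}};
rules3 = Most[#] -> Last[#] & /@ matrix3
(* {{1., 2.} -> 3.5, {0., 1.} -> 0.7} *)

(* and back from rules to matrix rows *)
Flatten[List @@ #] & /@ rules3
(* {{1., 2., 3.5}, {0., 1., 0.7}} *)
```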

When you fit a model, BayesianLinearRegression returns an Association containing all relevant information about the fit. Fit a model of the form y == a + b x + randomness where randomness is distributed as NormalDistribution[0, sigma]. Here, a, b and sigma are unknowns that need to be fitted to the data:

In:= Clear[x];
model =  BayesianLinearRegression[data, {1, x}, x];


Below is a more detailed description of the information returned by BayesianLinearRegression, but let's first concentrate on the main fit result, which is the posterior predictive distribution:

In:= model["Posterior", "PredictiveDistribution"]

Out= StudentTDistribution[-0.267818 + 0.986121 x,
0.726105 Sqrt[1.05016 + 0.00692863 x + 0.0635094 x^2],
2001/100
]
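The Student-t form of this output is what the standard conjugate analysis gives. As a sketch, assuming a normal-inverse-gamma prior with hyperparameters $\mu_0, \Lambda_0, a_0, b_0$ (the textbook setup of the Wikipedia article on Bayesian linear regression; the package's own parameter names and default prior may differ), a design matrix $\Phi$ whose rows are the basis vectors $\varphi(x_i) = (1, x_i)$, and $n$ outputs $y$:

$$
\begin{aligned}
\Lambda_n &= \Phi^\top \Phi + \Lambda_0, &
\mu_n &= \Lambda_n^{-1}\left(\Lambda_0 \mu_0 + \Phi^\top y\right),\\
a_n &= a_0 + \tfrac{n}{2}, &
b_n &= b_0 + \tfrac{1}{2}\left(y^\top y + \mu_0^\top \Lambda_0 \mu_0 - \mu_n^\top \Lambda_n \mu_n\right),
\end{aligned}
$$

$$
y \mid x \sim \mathrm{StudentT}\!\left(\varphi(x)^\top \mu_n,\ \sqrt{\tfrac{b_n}{a_n}\left(1 + \varphi(x)^\top \Lambda_n^{-1}\varphi(x)\right)},\ 2a_n\right)
$$

in the (location, scale, degrees of freedom) parameterization of StudentTDistribution. Under these formulas, the square root of a quadratic in x in the output corresponds to the factor $1 + \varphi(x)^\top \Lambda_n^{-1}\varphi(x)$, and the degrees of freedom $2a_n = 2a_0 + n$ with $n = 20$ points are consistent with the value 2001/100.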


As you can see, the result is returned as an x-dependent StudentTDistribution. Visualize the predictions:

With[{
predictiveDist = model["Posterior", "PredictiveDistribution"],
bands = {95, 50, 5}
},
Show[
Plot[
Evaluate@InverseCDF[predictiveDist, bands/100],
{x, -4, 4}, Filling -> {1 -> {2}, 3 -> {2}}, PlotLegends -> bands
],
ListPlot[data],
PlotRange -> All
]
]

In the BayesianInference package on GitHub, I included a function called regressionPlot1D, which makes it a bit easier to make this plot:

Show[
regressionPlot1D[model["Posterior", "PredictiveDistribution"], {x, -4, 4}, {95, 50, 5}],
ListPlot[data]
]
(* same result*)
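For comparison, here is a minimal sketch of the corresponding frequentist fit with the built-in LinearModelFit. Its "SinglePredictionBands" property gives bands for a new observation that play a role similar to the 5% and 95% quantiles of the posterior predictive distribution, although they are confidence bands rather than posterior quantiles. The snippet only regenerates test data if data is not already defined as a matrix.

```mathematica
(* regenerate test data of the same kind as above only if it is missing *)
If[! MatrixQ[data],
  data = RandomVariate[MultinormalDistribution[{{1, 0.7}, {0.7, 1}}], 20]];
Clear[x];

(* ordinary least-squares fit of y == a + b x *)
lm = LinearModelFit[data, x, x];
lm["BestFit"]

(* 90% single-prediction bands, comparable to the 5% and 95% predictive quantiles *)
bands90 = lm["SinglePredictionBands", ConfidenceLevel -> 0.90];

(* overlay the bands and the best fit on the data *)
Show[
 Plot[Evaluate@Flatten[{bands90, lm["BestFit"]}], {x, -4, 4}, PlotRange -> All],
 ListPlot[data]
]
```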


## Model comparison and model mixing

Often you'll want to compare multiple models against a data set to see which one works best. For example, here is some data where it's a bit unclear if you should use a first or second order model:

data = {{-1.5,-1.375},{-1.34375,-2.375},{1.5,0.21875},{1.03125,0.6875},{-0.5,-0.59375},
{-1.875,-2.59375},{1.625,1.1875},{-2.0625,-1.875},{1.0625,0.5},{-0.4375,-0.28125},{-0.75,-0.75},{2.125,0.375},{0.4375,0.6875},{-1.3125,-0.75},{-1.125,-0.21875},
{0.625,0.40625},{-0.25,0.59375},{-1.875,-1.625},{-1.,-0.8125},{0.4375,-0.09375}};
ListPlot[data]

Fit the data with polynomials up to fourth degree, then rank the log-evidences and inspect them. As you can see, the first and second order fits are almost equally likely:

In:= models = AssociationMap[
BayesianLinearRegression[Rule @@@ data, x^Range[0, #], x] &,
Range[0, 4]
];
ReverseSort@models[[All, "LogEvidence"]]

Out= <|1 -> -30.0072, 2 -> -30.1774, 3 -> -34.4292, 4 -> -38.7037, 0 -> -38.787|>
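Under the standard conjugate assumptions (a normal-inverse-gamma prior with hyperparameters $\mu_0, \Lambda_0, a_0, b_0$, updated by $n$ data points to $\mu_n, \Lambda_n, a_n, b_n$; the package's exact parameterization may differ), the evidence has a closed form:

$$
\log p(y) = -\tfrac{n}{2}\log(2\pi) + \tfrac{1}{2}\log\frac{\det\Lambda_0}{\det\Lambda_n} + a_0\log b_0 - a_n\log b_n + \log\Gamma(a_n) - \log\Gamma(a_0)
$$

Differences of log-evidences are log Bayes factors. For the top two models in the output above:

$$
\frac{p(y \mid \text{degree } 1)}{p(y \mid \text{degree } 2)} = e^{-30.0072 - (-30.1774)} = e^{0.1702} \approx 1.19,
$$

whereas the first-order fit is favored over the constant (degree 0) model by a factor of $e^{8.7798} \approx 6.5 \times 10^{3}$.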


Show the prediction bands:

Show[
regressionPlot1D[ models[1, "Posterior", "PredictiveDistribution"], {x, -3, 3}],
regressionPlot1D[models[2, "Posterior", "PredictiveDistribution"], {x, -3, 3},  PlotStyle -> Dashed, PlotLegends -> None],
ListPlot[data]
]

Instead of picking just one of the models, we can define a mixture over all of them. First calculate the weights for each fit:

In:= weights = Normalize[
(* subtract the max to reduce rounding error *)
Exp[models[[All, "LogEvidence"]] - Max[models[[All, "LogEvidence"]]]],
Total
];
ReverseSort[weights]

Out= <|1 -> 0.516002, 2 -> 0.47702, 3 -> 0.0068122, 4 -> 0.0000982526,
0 -> 0.0000667854|>


Define a mixture of posterior predictive distributions and show the predictions:

In:= mixDist = MixtureDistribution[
Values[weights],
Values@models[[All, "Posterior", "PredictiveDistribution"]]
];
Show[
regressionPlot1D[mixDist, {x, -3, 3}],
ListPlot[data]
]

As you can see, if one model is not clearly the best, you can just split the difference between them.

Another interesting thing you can do with BayesianLinearRegression is Bayesian updating: if you've fitted some data in the past and now have new data, you can incorporate the new data by updating your fit, without having to re-fit all of the data (old and new) in one go. An example of this is given in the example notebook on GitHub (in the section about the "PriorParameters" option of the function).
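Under a standard conjugate prior, Bayesian updating amounts to using an earlier posterior as the prior for the new data. The sketch below uses a hypothetical helper, posteriorParameters, that is not part of the package (the package does this through its "PriorParameters" option); it assumes the textbook normal-inverse-gamma update and checks on simulated data that fitting in two batches gives the same posterior as one fit of all the data.

```mathematica
(* Hypothetical helper (not part of the BayesianInference package): one conjugate
   normal-inverse-gamma update of the parameters {mu, Lambda, a, b}, given a design
   matrix phi (one row of basis-function values per data point) and outputs y *)
posteriorParameters[{mu0_, lambda0_, a0_, b0_}, phi_?MatrixQ, y_?VectorQ] :=
 Module[{lambdaN, muN},
  lambdaN = Transpose[phi] . phi + lambda0;
  muN = LinearSolve[lambdaN, lambda0 . mu0 + Transpose[phi] . y];
  {muN, lambdaN, a0 + Length[y]/2,
   b0 + (y . y + mu0 . lambda0 . mu0 - muN . lambdaN . muN)/2}
 ]

(* simulated data from y = 0.5 + 2 x + noise, with the basis {1, x} *)
SeedRandom[42];
xs = RandomReal[{-3, 3}, 30];
ys = 0.5 + 2 xs + RandomVariate[NormalDistribution[0, 0.5], 30];
design = Transpose[{ConstantArray[1., 30], xs}];

(* weak prior: zero mean, small precision, vague inverse-gamma parameters *)
prior = {{0., 0.}, 0.01 IdentityMatrix[2], 0.01, 0.01};

(* fit all 30 points in one go ... *)
batch = posteriorParameters[prior, design, ys];

(* ... or fit the first 20 points and use that posterior as the prior for the last 10 *)
old = posteriorParameters[prior, design[[;; 20]], ys[[;; 20]]];
updated = posteriorParameters[old, design[[21 ;;]], ys[[21 ;;]]];

(* both routes give the same posterior, up to rounding *)
Max[Abs[Flatten[batch - updated]]]
```

The equivalence holds because the update only accumulates sums ($\Phi^\top\Phi$, $\Phi^\top y$, $y^\top y$ and $n$), so the order in which the data arrive does not matter.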

## Detailed explanation about the returned values

BayesianLinearRegression returns an Association with the following keys:

Keys[model]

Out= {"LogEvidence", "PriorParameters", "PosteriorParameters", "Posterior", "Prior", "Basis", "IndependentVariables"}

• "LogEvidence": In a Bayesian setting, the evidence (also called marginal likelihood) measures how well the model fits the data (with a higher evidence indicating a better fit). The evidence has the virtue that it naturally penalizes models for their complexity and therefore does not suffer from over-fitting in the way that measures like the sum-of-squares or likelihood do.
• "Basis", "IndependentVariables": Simply the basis functions and independent variables specified by the user.
• "Posterior", "Prior": These two keys each hold an association with 4 distributions:
1. "PredictiveDistribution": A distribution that depends on the independent variables (x in the example above). By filling in a value for x, you get a distribution that tells you where you could expect to find future y values. This distribution accounts for all relevant uncertainties in the model: model variance caused by the term randomness; uncertainty in the values of a and b; and uncertainty in sigma.
2. "UnderlyingValueDistribution": Similar to "PredictiveDistribution", but this distribution gives the possible values of a + b x without the randomness error term.
3. "RegressionCoefficientDistribution": The joint distribution over a and b.
4. "ErrorDistribution": The distribution of the variance sigma^2.
• "PriorParameters", "PosteriorParameters": These parameters are not immediately important most of the time, but they contain all of the relevant information about the fit.
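Under the standard conjugate (normal-inverse-gamma) setup, all four distributions and the log evidence follow from the prior and posterior parameters. The sketch below uses hypothetical helper names and an assumed parameter layout {μ, Λ, a, b}, following the textbook formulas (e.g. the Wikipedia article on Bayesian linear regression); it is not the package's implementation, whose parameterization and default prior may differ.

```mathematica
(* Hypothetical helpers (not the BayesianInference package's API) sketching the
   textbook conjugate formulas; parameters are packed as {mu, Lambda, a, b} *)

(* prior parameters -> posterior parameters, given design matrix phi and outputs y *)
posteriorParameters[{mu0_, lambda0_, a0_, b0_}, phi_?MatrixQ, y_?VectorQ] :=
 Module[{lambdaN, muN},
  lambdaN = Transpose[phi] . phi + lambda0;
  muN = LinearSolve[lambdaN, lambda0 . mu0 + Transpose[phi] . y];
  {muN, lambdaN, a0 + Length[y]/2,
   b0 + (y . y + mu0 . lambda0 . mu0 - muN . lambdaN . muN)/2}
 ]

(* log evidence: log p(y), with the coefficients and sigma^2 integrated out *)
logEvidence[{mu0_, lambda0_, a0_, b0_}, {muN_, lambdaN_, aN_, bN_}, n_Integer] :=
 -n/2 Log[2. Pi] + (Log[Det[lambda0]] - Log[Det[lambdaN]])/2 +
  a0 Log[b0] - aN Log[bN] + LogGamma[aN] - LogGamma[a0]

(* predictive distribution at a basis vector phiX, e.g. {1, x0} for the basis {1, x} *)
predictiveDistribution[{muN_, lambdaN_, aN_, bN_}, phiX_] :=
 StudentTDistribution[phiX . muN,
  Sqrt[bN/aN (1 + phiX . LinearSolve[lambdaN, phiX])], 2 aN]

(* same, but without the noise term: uncertainty in a + b x only *)
underlyingValueDistribution[{muN_, lambdaN_, aN_, bN_}, phiX_] :=
 StudentTDistribution[phiX . muN,
  Sqrt[bN/aN phiX . LinearSolve[lambdaN, phiX]], 2 aN]

(* joint distribution of the coefficients (symmetrized to absorb rounding) *)
coefficientDistribution[{muN_, lambdaN_, aN_, bN_}] :=
 With[{s = Inverse[lambdaN]},
  MultivariateTDistribution[muN, bN/aN (s + Transpose[s])/2, 2 aN]]

(* distribution of the noise variance sigma^2 *)
errorDistribution[{muN_, lambdaN_, aN_, bN_}] := InverseGammaDistribution[aN, bN]

(* usage: simulated data from y = 0.5 + 2 x + noise, basis {1, x}, weak prior *)
SeedRandom[42];
xs = RandomReal[{-3, 3}, 30];
ys = 0.5 + 2 xs + RandomVariate[NormalDistribution[0, 0.5], 30];
design = Transpose[{ConstantArray[1., 30], xs}];
prior = {{0., 0.}, 0.01 IdentityMatrix[2], 0.01, 0.01};
post = posteriorParameters[prior, design, ys];

logEvidence[prior, post, Length[ys]]
predictiveDistribution[post, {1, 1.5}] (* prediction at x = 1.5 *)
```

Under these assumptions, the prior and posterior parameter sets are indeed all that is needed: every other returned quantity is a function of them.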

## Sources

The formulas underlying BayesianLinearRegression are based mainly on Wikipedia articles. The names of the hyperparameters returned by the function are based on the article about multivariate linear regression.
## 8 Replies
Posted 1 year ago

Congratulations! This post is now featured in our Staff Pick column as distinguished by a badge on your profile of a Featured Contributor! Thank you, keep it coming, and consider contributing your work to The Notebook Archive!
Posted 1 year ago
Hi Sjoerd,

This is all really interesting material, thanks for posting it. Working through the BayesianInference examples, I ran into a problem with the "Logistic classification with multi dimensional input and multiple classes" example: the defineInferenceProblem evaluation shows $Failed for "GeneratingDistribution". I tried to figure out what might be amiss, but without success. Any suggestions? I am using the package updated in July and the corresponding examples notebook, running on version 12.0.

-Steve

Posted 1 year ago

Hi Steve,

Yes, I can see what you ran into. I will admit that that particular example isn't worked out all that well: I more or less worked it out for myself and plonked it in for future reference. I assume you're referring to the fact that the tooltip of the inferenceObject shows the "GeneratingDistribution" key as $Failed? From what I can tell, that's just a bug in the code that generates the tooltip: if you evaluate obj["GeneratingDistribution"], it should still return the ProbabilityDistribution that was used in the construction of the object. I'll try to patch it out in the next push to master.

It's a continuing work in progress, though my time and attention span to work on it are not super consistent. Glad to hear that you're giving it a try, though!
Posted 1 year ago
Hi Sjoerd,

I was actually working backwards from errors thrown by parallelNestedSampling (CholeskyDecomposition error messages) and saw, from the tooltip of the inference object, that there seemed to be an issue with the generating distribution. I'd looked at the distribution and couldn't see anything wrong with it (although I did wonder how the various symbols would be resolved by the sampling function). I had a quick look at the source code but didn't get anywhere (there's a lot of code).

In any case, I've been finding the examples you've provided very helpful in trying to gain a better understanding of the analysis reasoning. I look forward to the next update once you've had a chance to work on it.

-Steve
Posted 1 year ago
Yes, I saw those errors as well. I don't actually know where they come from; probably some part of the internal MCMC sampler I'm using. It may be fairly harmless, but to be honest I don't really know. The thing with MCMC and related methods is that they hardly ever work right out of the box without tweaking some options and settings. If you let the sampler run for a while (a couple of minutes or so), it should return a result.
Posted 1 year ago
Sjoerd,

Very happy to have stumbled across this, as it seems to be the basis for exactly what I needed. I am wondering if it is possible to do this with multivariate polynomial regression, i.e. would something along the following lines work:

BayesianLinearRegression[data, {1, x1, x2, x1^2, x1*x2, x2^2}, {x1, x2}]
Posted 1 year ago
Hi Daniel,

Yes, that should be no problem at all. data should be either an n by 3 matrix or a list of rules of the form {{x11, x21} -> y1, ...}. From there it should work without issue, though you may need to re-scale the data if your xi or yi are very large or very small.

Oh, and before I forget: the function is now also available as a ResourceFunction, so you don't need to install the GitHub package if you don't want to. Just use ResourceFunction["BayesianLinearRegression"] instead of BayesianLinearRegression.
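A minimal sketch of that setup, with simulated data standing in for real data (an assumption): it builds rules of the form {x1, x2} -> y, optionally rescales the inputs with the built-in Standardize, and calls the resource function with the quadratic basis from the question. Fetching the ResourceFunction requires an internet connection the first time.

```mathematica
(* simulated data with two inputs; replace with your own inputs and outputs *)
SeedRandom[1];
inputs = RandomReal[{-2, 2}, {50, 2}];
outputs = 1 + inputs[[All, 1]] - 0.5 inputs[[All, 1]] inputs[[All, 2]] +
   RandomVariate[NormalDistribution[0, 0.3], 50];

(* optional rescaling: give each input column mean 0 and standard deviation 1 *)
scaledInputs = Standardize[inputs];

(* rule-based specification {x1, x2} -> y, one rule per data point *)
rules = Thread[scaledInputs -> outputs];

(* quadratic basis in two variables, as in the question *)
Clear[x1, x2];
fit = ResourceFunction["BayesianLinearRegression"][
   rules, {1, x1, x2, x1^2, x1 x2, x2^2}, {x1, x2}];
fit["LogEvidence"]
```

If the inputs are standardized, new inputs must be transformed with the same column means and standard deviations before evaluating the predictive distribution.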
Posted 1 year ago
Thanks for such a prompt reply, I look forward to giving it a try now.