
Neural network regression with error bars

POSTED BY: Sjoerd Smit
9 Replies

Congratulations! This post is now a Staff Pick as distinguished by a badge on your profile! Thank you, keep it coming!

POSTED BY: EDITORIAL BOARD
POSTED BY: Sjoerd Smit

Really cool stuff!

Why does the extrapolation variance only grow to the right in one of the images? From a Gaussian-process perspective, I would expect it to increase whenever you move away from the data points.

POSTED BY: Eduardo Serna

Well, the variance does increase away from the data, just maybe not as quickly as you'd expect. From what I understood, this depends on the details of your network (the activation function, in particular) and the training parameters. Ultimately, those factors decide what kind of Gaussian process kernel you're effectively using, but the connection between the two isn't straightforward (and an active area of research, as far as I could figure out).

Also, from what I've seen, Gaussian processes do not necessarily produce arbitrarily large variance away from the data. If you take a simple example from the documentation of Predict, you can quite easily get a constant error band away from the data:

    data = {-1.2 -> 1.2, 1.4 -> 1.4, 3.1 -> 1.8, 4.5 -> 1.6};
    p = Predict[data, Method -> "GaussianProcess"];
    Show[
     Plot[
      {p[x],
       p[x] + StandardDeviation[p[x, "Distribution"]],
       p[x] - StandardDeviation[p[x, "Distribution"]]},
      {x, -5, 10},
      PlotStyle -> {Blue, Gray, Gray},
      Filling -> {2 -> {3}},
      Exclusions -> False,
      PerformanceGoal -> "Speed",
      PlotLegends -> {"Prediction", "Confidence Interval"}
      ],
     ListPlot[List @@@ data, PlotStyle -> Red, PlotLegends -> {"Data"}]
     ]

[Plot: GP prediction with a confidence band that becomes constant away from the data]

POSTED BY: Sjoerd Smit

I would have to look into it, but my implementation with an exponential covariance doesn't have a standard deviation that levels off when extrapolating. It has been a while, so it could be a bug on my side or, more likely, something conceptually different.

POSTED BY: Eduardo Serna

I think it depends on how you do your GP regression in this case and what kernel you use. If you make a point estimate of the covariance length scale, I think you end up with a constant prediction variance far from the data (since essentially none of the data points are correlated with the prediction point). If you do a proper Bayesian inference and integrate over the posterior distribution of the length scale, it will be different, since you'll have contributions from very long length scales.
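To illustrate the first case, here is a minimal sketch of the standard GP predictive variance with a squared-exponential kernel and fixed hyperparameters (sigf, sign, and ell are illustrative values, not what Predict actually infers). Far from the data the cross-covariance vector vanishes, so the variance levels off at sigf^2 + sign^2:

    (* GP predictive variance, fixed squared-exponential kernel *)
    xs = {-1.2, 1.4, 3.1, 4.5};  (* input locations of the data *)
    sigf = 1.; sign = 0.1; ell = 1.;  (* assumed hyperparameters *)
    k[a_, b_] := sigf^2 Exp[-(a - b)^2/(2 ell^2)];
    K = Outer[k, xs, xs] + sign^2 IdentityMatrix[Length[xs]];
    (* var(x*) = k(x*, x*) + sign^2 - k*.(K + sign^2 I)^-1.k* *)
    var[xstar_] := k[xstar, xstar] + sign^2 -
      With[{kv = k[xstar, #] & /@ xs}, kv . LinearSolve[K, kv]];

Evaluating var far from the data, e.g. var[100.], gives approximately sigf^2 + sign^2: a constant error band. Marginalizing over ell instead of fixing it mixes in long length scales, which changes that behaviour.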

In the blog post by Yarin, he also shows an example of GP regression with a squared-exponential kernel where the error bands become constant.

POSTED BY: Sjoerd Smit

I will need to talk to you when I get back into this.

POSTED BY: Eduardo Serna