Message Boards Message Boards

0
|
13679 Views
|
12 Replies
|
3 Total Likes
View groups...
Share
Share this post:

How does FindDistribution calculate BIC and AIC?

Posted 8 years ago

A new tool called FIndDistribution calculate (among others) BIC and AIC. But when I'm check the values using the log likelihood value with the rest of parameters I get other values. Thank you

Attachments:
POSTED BY: jl cb
12 Replies
Posted 3 years ago

Seth: You might want to post a specific question about AIC with LinearModelFit (and NonlinearModelFit) as the issue might not be user error or an error in Mathematica but rather a difference in definitions.

To use a slightly modified example from the documentation (I added an additional sample point) here is what LinearModelFit puts out:

data = {{0, 1}, {1, 0}, {3, 2}, {5, 4}, {7, 5}};
n = 5;  (* Sample size *)
p = 3;  (* Number of parameters: intercept, slope, and residual error variance *)
lm = LinearModelFit[data, x, x];
lm["AIC"]
(* 15.1332 *)

And here is what LinearModelFit is doing (in a slightly less-black box way):

aic = -2 LogLikelihood[NormalDistribution[0, lm["EstimatedVariance"]^0.5], lm["FitResiduals"]] + 2*p
(* 15.1332 *)

The issue is: Is the definition of AIC that LinearModelFit uses the definition that you want to use?

In essence restricted maximum likelihood (REML) is being performed (i.e., obtaining the unbiased estimate of variance) but rather than plugging in that estimate of variance into the log of the restricted likelihood function, that estimate is being plugged into the unrestricted log likelihood function. That is not what SAS or R uses to construct AIC. At best how LinearModelFit and NonlinearModelFit calculates AIC is not common practice.

POSTED BY: Jim Baldwin

Thank you very much, Jim. I had tried similar methods but had never been able to reproduce the Wolfram computation.

Wolfram Statistics folks: I am insufficiently expert in statistics to judge whether the Wolfram method or the R/SAS method is better. A suggestion. It might help to provide some official documentation as to how and why the computation was being done as it was. Perhaps a cite to an article or a worked example like Jim has provided. Alternatively something semi-official on this Community site would be useful.

Thanks to all.

POSTED BY: Seth Chandler
Posted 8 years ago

It does not appear that a standard definition of AIC is used in FindDistribution. The usual formula with $n$ observations and $k$ parameters is $AIC=-2 \log L+2 k$. But FindDistribution seems to use $AIC=2 \log L - 2k/(n-k-1)$.

Here's "proof" of what FindDistribution uses. (This works unless a MixtureDistribution is found with more than two distributions.)

Set the sample size and get a random sample:

n = 100;
data = RandomVariate[Exponential[1], n];

Find the top best fitting distributions and collect the AIC and LogLikelihood values:

nbest = 5;
fd = FindDistribution[data, nbest, {"AIC", "LogLikelihood"}]
(* {{ExponentialDistribution[1.07497], {-1.83436, -0.906976}},
{MixtureDistribution[{0.72248, 0.27752}, {GammaDistribution[1.2916, 0.369832], UniformDistribution[{0.0177466, 3.77159}]}], {-1.75598, -0.824797}},
{MixtureDistribution[{0.765586, 0.234414}, {GammaDistribution[1.3553, 0.342329], NormalDistribution[2.45319, 0.808439]}],{-1.78878, -0.841199}},
{LogNormalDistribution[-0.708397,1.24473], {-1.84787, -0.903319}},
{WeibullDistribution[0.92819,0.897149], {-1.85254, -0.905651}}} *)

Find $k$ (the number of parameters):

k = StringCount[Table[ToString[fd[[i, 1]]], {i, nbest}], ","] + 1;
(* {1,6,6,2,2} *)
nMixtures = StringCount[Table[ToString[fd[[i, 1]]], {i, nbest}], "MixtureDistribution"]
(* {0,1,1,0,0} *)
k = k - nMixtures
(* {1,5,5,2,2} *)

Extract the LogLikelihood and AIC values and show the AIC values from the formula used by FindDistribution:

logL = Table[fd[[i, 2, 2]], {i, nbest}]
(* {-0.9069763208709088`,-0.8247971259593575`,-0.8411993739988386`,-0.903318900970027`,-0.9056506499137531`} *)
aic = Table[fd[[i, 2, 1]], {i, nbest}]
(* {-1.8343608050071236`,-1.7559772306421193`,-1.7887817267210813`,-1.847874915342116`,-1.8525384132295681`} *)
2 logL - 2 k/(n - k - 1)
(* {-1.8343608050071238`,-1.7559772306421193`,-1.7887817267210815`,-1.847874915342116`,-1.8525384132295681`} *)

It appears that the formula used is wrong and that it seems to be a combination of $AIC$ and $AIC_c$ as $AIC_c =-2\log L + 2 k n/(n-k-1)$.

POSTED BY: Jim Baldwin
Posted 8 years ago

Thanks for the clarification. I don't remember the alternative version of AIC. For now i will use the standard version of AIC. Do you know under what criteria FindDisitribution sort from the first best fit to the last fit? Thanks again.

POSTED BY: jl cb
Posted 8 years ago

I was too mild in my assessment: the AIC definition used by FindDistribution is wrong. So I'm a assuming that it is a coding error rather than an alternative formulation of AIC. (Likely the intent was to use $AIC_c$ but the sample size ( $n$) in the numerator for the number of parameters adjustment was left out of the equation. But someone will correct me if I'm wrong.) That wrong definition way under-corrects for the number of parameters in a model so the ranking of models from FindDistribution is suspect (if the number of parameters vary among the models considered).

To make the usual statements about $AIC$: this measure allows the ranking of models and cannot be used as a measure of absolute fit. The model with the best $AIC$ might be the best model of a bunch of bad models or the best of a bunch of very good models.

While FindDistribution can get you a parsimonious description of the distribution of your data (a few parameters + a particular family of distributions) that you can easily use outside of Mathematica, if you want a more justifiable description of the data, I'd suggest using SmoothKernelDistribution. You don't get a parsimonious description of your data but it will almost certainly provide a better fit.

POSTED BY: Jim Baldwin
Posted 8 years ago

Thanks. I'll check with SmoothKernelDistribution then. Not even LogLikelihood values can be used?

POSTED BY: jl cb
Posted 8 years ago

Similarly, $AIC$ from LinearModelFit appears to be wrongly computed.

POSTED BY: Sandu Ursu
Posted 8 years ago

The "LogLikelihood" values reported by FindDistribution appear to be different than one one would likely use, too. (The reported LogLikelihood is the real LogLikelihood using the parameter estimate divided by the sample size. So it is an average log likelihood contribution for an individual observation?) Also, the estimates of the parameters does not appear to be a maximum likelihood estimates.

Below is some code to show these issues:

n = 500;
SeedRandom[12345];
data = RandomVariate[ExponentialDistribution[1], n];

fd = FindDistribution[data, 1, {"AIC", "LogLikelihood"}]
(* {{ExponentialDistribution[1.040152396547962`],{-1.9080105994525727`,-0.9519972675977724`}}} *)

(* Check on reported LogLikelihood *)
(LogLikelihood[ExponentialDistribution[\[Lambda]], 
    data] /. {\[Lambda] -> 1.040152396547962`})/n
(* -0.9519972675977716` *)

(* Maximum likelihood estimate: two different ways *)
FindDistributionParameters[data, ExponentialDistribution[\[Lambda]]]
(* {\[Lambda]\[Rule]1.049212868865103`} *)
EstimatedDistribution[data, ExponentialDistribution[\[Lambda]]]
(* ExponentialDistribution[1.049212868865103`] *)

I see that in the documentation for the experimental FindDistribution there is a RandomSeed option and "The internal information criterion uses a Bayesian information criterion together with priors over TargetFunctions." So this tells me maximum likelihood is not being used but there aren't too many details given in the documentation. Maybe with more details in the documentation everything will become clear.

POSTED BY: Jim Baldwin

Did this ever get fixed or otherwise resolved? Having some issues about the AIC using LinearModelFit, but it could well be user error rather than a bug.

POSTED BY: Seth Chandler
Posted 3 years ago

How does FindDistribution calculate BIC and AIC?

POSTED BY: Jim Baldwin
Posted 3 years ago

Not resolved as far as I know. And there is another issue I should have noticed before: the log likelihood is wrong also. The "mean" of minus the log likelihood is used (i.e., the log of the likelihood for the data and proposed model is divided by the sample size). The ranking of models isn't affected but if one attempts to apply the common threshold of "2 AIC units" suggesting a very different model, well, that won't work with the AIC values produced by FindDistribution.

I'll write up a summary and send to Wolfram, Inc., and report back. Below working example:

(* Generate data and find the 5 best fitting distributions *)
n = 100; (* Sample size *) 
SeedRandom[12345];
data = RandomVariate[ExponentialDistribution[1], n];
nbest = 5;
TableForm[(fd = FindDistribution[data, nbest, {"LogLikelihood", "AIC"}]),
 TableHeadings -> {("Distribution " <> # &) /@ (ToString[#] & /@ Range[nbest]), {"Distribution", "log(L) and AIC"}}]

Results of FindDistribution

(* Log of the likelihood as calculated by FindDistribution *)
Column[{Style["Log of the likelihood\n", 18, Bold],
  TableForm[Table[{fd[[i, 2, 1]], LogLikelihood[fd[[i, 1]], data]/n,
     LogLikelihood[fd[[i, 1]], data]}, {i, nbest}],
   TableHeadings -> {("Distribution " <> # &) /@ (ToString[#] & /@ Range[nbest]),
     {"\nFrom\nFindDistribution", "Duplicating what\nFindDistribution\ndoes",
      "What the\nlog likelihood\nshould be"}}]}]

Log of likelihood

(* AIC as calculated by FindDistribution *)
Column[{Style["AIC\n", 18, Bold], TableForm[Table[{fd[[i, 2, 2]],
     2 LogLikelihood[fd[[i, 1]], data]/n - 2 k[[i]]/(n - k[[i]] - 1),
     -2 LogLikelihood[fd[[i, 1]], data] + 2 k[[i]]}, {i, nbest}],
   TableHeadings -> {("Distribution " <> # &) /@ (ToString[#] & /@ Range[nbest]),
     {"\nFrom\nFindDistribution", "Duplicating what\nFindDistribution\ndoes",
      "\nWhat AIC\nshould be"}}]}]

AIC

(Note the sign of AIC is arbitrary. A common approach is to choose the "smaller is better" approach which is what I used for the "What AIC should be" column.)

POSTED BY: Jim Baldwin
Posted 3 years ago

How does FindDistribution calculate BIC and AIC?

POSTED BY: Jim Baldwin
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard

Group Abstract Group Abstract