Group Abstract

Message Boards

WOLFRAM COMMUNITY

17.4K Views

12 Replies

3 Total Likes

View groups...

Follow this post

Share this post:

GROUPS:

Wolfram Language Statistics and Probability

How does FindDistribution calculate BIC and AIC?

jl cb

Posted 10 years ago

A new tool called FIndDistribution calculate (among others) BIC and AIC. But when I'm check the values using the log likelihood value with the rest of parameters I get other values. Thank you Attachments:

POSTED BY: jl cb

12 Replies

Sort By:

Jim Baldwin

Posted 4 years ago

Seth: You might want to post a specific question about AIC with `LinearModelFit` (and `NonlinearModelFit`) as the issue might not be user error or an error in Mathematica but rather a difference in definitions. To use a slightly modified example from the documentation (I added an additional sample point) here is what `LinearModelFit` puts out: data = {{0, 1}, {1, 0}, {3, 2}, {5, 4}, {7, 5}}; n = 5; (* Sample size ) p = 3; ( Number of parameters: intercept, slope, and residual error variance ) lm = LinearModelFit[data, x, x]; lm["AIC"] ( 15.1332 ) And here is what `LinearModelFit` is doing (in a slightly less-black box way): aic = -2 LogLikelihood[NormalDistribution[0, lm["EstimatedVariance"]^0.5], lm["FitResiduals"]] + 2p (* 15.1332 *) The issue is: Is the definition of AIC that `LinearModelFit` uses the definition that you want to use? In essence restricted maximum likelihood (REML) is being performed (i.e., obtaining the unbiased estimate of variance) but rather than plugging in that estimate of variance into the log of the restricted likelihood function, that estimate is being plugged into the unrestricted log likelihood function. That is not what SAS or R uses to construct AIC. At best how `LinearModelFit` and `NonlinearModelFit` calculates AIC is not common practice.

POSTED BY: Jim Baldwin

Seth Chandler

Seth Chandler, University of Houston

Posted 4 years ago

Thank you very much, Jim. I had tried similar methods but had never been able to reproduce the Wolfram computation. Wolfram Statistics folks: I am insufficiently expert in statistics to judge whether the Wolfram method or the R/SAS method is better. A suggestion. It might help to provide some official documentation as to how and why the computation was being done as it was. Perhaps a cite to an article or a worked example like Jim has provided. Alternatively something semi-official on this Community site would be useful. Thanks to all.

POSTED BY: Seth Chandler

Jim Baldwin

Posted 4 years ago

How does FindDistribution calculate BIC and AIC?

POSTED BY: Jim Baldwin

Jim Baldwin

Posted 4 years ago

Not resolved as far as I know. And there is another issue I should have noticed before: the log likelihood is wrong also. The "mean" of minus the log likelihood is used (i.e., the log of the likelihood for the data and proposed model is divided by the sample size). The ranking of models isn't affected but if one attempts to apply the common threshold of "2 AIC units" suggesting a very different model, well, that won't work with the AIC values produced by `FindDistribution`. I'll write up a summary and send to Wolfram, Inc., and report back. Below working example: (* Generate data and find the 5 best fitting distributions ) n = 100; ( Sample size ) SeedRandom[12345]; data = RandomVariate[ExponentialDistribution[1], n]; nbest = 5; TableForm[(fd = FindDistribution[data, nbest, {"LogLikelihood", "AIC"}]), TableHeadings -> {("Distribution " <> # &) /@ (ToString[#] & /@ Range[nbest]), {"Distribution", "log(L) and AIC"}}] ( Log of the likelihood as calculated by FindDistribution ) Column[{Style["Log of the likelihood\n", 18, Bold], TableForm[Table[{fd[[i, 2, 1]], LogLikelihood[fd[[i, 1]], data]/n, LogLikelihood[fd[[i, 1]], data]}, {i, nbest}], TableHeadings -> {("Distribution " <> # &) /@ (ToString[#] & /@ Range[nbest]), {"\nFrom\nFindDistribution", "Duplicating what\nFindDistribution\ndoes", "What the\nlog likelihood\nshould be"}}]}] ( AIC as calculated by FindDistribution *) Column[{Style["AIC\n", 18, Bold], TableForm[Table[{fd[[i, 2, 2]], 2 LogLikelihood[fd[[i, 1]], data]/n - 2 k[[i]]/(n - k[[i]] - 1), -2 LogLikelihood[fd[[i, 1]], data] + 2 k[[i]]}, {i, nbest}], TableHeadings -> {("Distribution " <> # &) /@ (ToString[#] & /@ Range[nbest]), {"\nFrom\nFindDistribution", "Duplicating what\nFindDistribution\ndoes", "\nWhat AIC\nshould be"}}]}] (Note the sign of AIC is arbitrary. A common approach is to choose the "smaller is better" approach which is what I used for the "What AIC should be" column.)

Not resolved as far as I know. And there is another issue I should have noticed before: the log likelihood is wrong also. The "mean" of minus the log likelihood is used (i.e., the log of the likelihood for the data and proposed model is divided by the sample size). The ranking of models isn't affected but if one attempts to apply the common threshold of "2 AIC units" suggesting a very different model, well, that won't work with the AIC values produced by FindDistribution.

I'll write up a summary and send to Wolfram, Inc., and report back. Below working example:

(* Generate data and find the 5 best fitting distributions *)
n = 100; (* Sample size *) 
SeedRandom[12345];
data = RandomVariate[ExponentialDistribution[1], n];
nbest = 5;
TableForm[(fd = FindDistribution[data, nbest, {"LogLikelihood", "AIC"}]),
 TableHeadings -> {("Distribution " <> # &) /@ (ToString[#] & /@ Range[nbest]), {"Distribution", "log(L) and AIC"}}]

Results of FindDistribution

(* Log of the likelihood as calculated by FindDistribution *)
Column[{Style["Log of the likelihood\n", 18, Bold],
  TableForm[Table[{fd[[i, 2, 1]], LogLikelihood[fd[[i, 1]], data]/n,
     LogLikelihood[fd[[i, 1]], data]}, {i, nbest}],
   TableHeadings -> {("Distribution " <> # &) /@ (ToString[#] & /@ Range[nbest]),
     {"\nFrom\nFindDistribution", "Duplicating what\nFindDistribution\ndoes",
      "What the\nlog likelihood\nshould be"}}]}]

Log of likelihood

(* AIC as calculated by FindDistribution *)
Column[{Style["AIC\n", 18, Bold], TableForm[Table[{fd[[i, 2, 2]],
     2 LogLikelihood[fd[[i, 1]], data]/n - 2 k[[i]]/(n - k[[i]] - 1),
     -2 LogLikelihood[fd[[i, 1]], data] + 2 k[[i]]}, {i, nbest}],
   TableHeadings -> {("Distribution " <> # &) /@ (ToString[#] & /@ Range[nbest]),
     {"\nFrom\nFindDistribution", "Duplicating what\nFindDistribution\ndoes",
      "\nWhat AIC\nshould be"}}]}]

AIC

(Note the sign of AIC is arbitrary. A common approach is to choose the "smaller is better" approach which is what I used for the "What AIC should be" column.)

POSTED BY: Jim Baldwin

Jim Baldwin

Posted 4 years ago

How does FindDistribution calculate BIC and AIC?

POSTED BY: Jim Baldwin

Seth Chandler

Seth Chandler, University of Houston

Posted 4 years ago

Did this ever get fixed or otherwise resolved? Having some issues about the AIC using LinearModelFit, but it could well be user error rather than a bug.

POSTED BY: Seth Chandler

Jim Baldwin

Jim Baldwin, Retired

Posted 10 years ago

The "LogLikelihood" values reported by `FindDistribution` appear to be different than one one would likely use, too. (The reported LogLikelihood is the real LogLikelihood using the parameter estimate divided by the sample size. So it is an average log likelihood contribution for an individual observation?) Also, the estimates of the parameters does not appear to be a maximum likelihood estimates. Below is some code to show these issues: n = 500; SeedRandom[12345]; data = RandomVariate[ExponentialDistribution[1], n]; fd = FindDistribution[data, 1, {"AIC", "LogLikelihood"}] (* {{ExponentialDistribution[1.040152396547962`],{-1.9080105994525727`,-0.9519972675977724`}}} ) ( Check on reported LogLikelihood ) (LogLikelihood[ExponentialDistribution[\[Lambda]], data] /. {\[Lambda] -> 1.040152396547962`})/n ( -0.9519972675977716` ) ( Maximum likelihood estimate: two different ways ) FindDistributionParameters[data, ExponentialDistribution[\[Lambda]]] ( {\[Lambda]\[Rule]1.049212868865103`} ) EstimatedDistribution[data, ExponentialDistribution[\[Lambda]]] ( ExponentialDistribution[1.049212868865103`] *) I see that in the documentation for the experimental `FindDistribution` there is a RandomSeed option and "The internal information criterion uses a Bayesian information criterion together with priors over TargetFunctions." So this tells me maximum likelihood is not being used but there aren't too many details given in the documentation. Maybe with more details in the documentation everything will become clear.

POSTED BY: Jim Baldwin

Sandu Ursu

Posted 10 years ago

Similarly, $AIC$ from `LinearModelFit` appears to be wrongly computed.

POSTED BY: Sandu Ursu

jl cb

Posted 10 years ago

Thanks. I'll check with SmoothKernelDistribution then. Not even LogLikelihood values can be used?

POSTED BY: jl cb

Jim Baldwin

Jim Baldwin, Retired

Posted 10 years ago

I was too mild in my assessment: the AIC definition used by `FindDistribution` is wrong. So I'm a assuming that it is a coding error rather than an alternative formulation of AIC. (Likely the intent was to use $AIC_c$ but the sample size ( $n$) in the numerator for the number of parameters adjustment was left out of the equation. But someone will correct me if I'm wrong.) That wrong definition way under-corrects for the number of parameters in a model so the ranking of models from FindDistribution is suspect (if the number of parameters vary among the models considered). To make the usual statements about $AIC$: this measure allows the ranking of models and cannot be used as a measure of absolute fit. The model with the best $AIC$ might be the best model of a bunch of bad models or the best of a bunch of very good models. While `FindDistribution` can get you a parsimonious description of the distribution of your data (a few parameters + a particular family of distributions) that you can easily use outside of Mathematica, if you want a more justifiable description of the data, I'd suggest using `SmoothKernelDistribution`. You don't get a parsimonious description of your data but it will almost certainly provide a better fit.

POSTED BY: Jim Baldwin

jl cb

Posted 10 years ago

Thanks for the clarification. I don't remember the alternative version of AIC. For now i will use the standard version of AIC. Do you know under what criteria FindDisitribution sort from the first best fit to the last fit? Thanks again.

POSTED BY: jl cb

Jim Baldwin

Jim Baldwin, Retired

Posted 10 years ago

It does not appear that a standard definition of AIC is used in `FindDistribution`. The usual formula with $n$ observations and $k$ parameters is $AIC=-2 \log L+2 k$. But `FindDistribution` seems to use $AIC=2 \log L - 2k/(n-k-1)$. Here's "proof" of what `FindDistribution` uses. (This works unless a `MixtureDistribution` is found with more than two distributions.) Set the sample size and get a random sample: n = 100; data = RandomVariate[Exponential[1], n]; Find the top best fitting distributions and collect the `AIC` and `LogLikelihood` values: nbest = 5; fd = FindDistribution[data, nbest, {"AIC", "LogLikelihood"}] (* {{ExponentialDistribution[1.07497], {-1.83436, -0.906976}}, {MixtureDistribution[{0.72248, 0.27752}, {GammaDistribution[1.2916, 0.369832], UniformDistribution[{0.0177466, 3.77159}]}], {-1.75598, -0.824797}}, {MixtureDistribution[{0.765586, 0.234414}, {GammaDistribution[1.3553, 0.342329], NormalDistribution[2.45319, 0.808439]}],{-1.78878, -0.841199}}, {LogNormalDistribution[-0.708397,1.24473], {-1.84787, -0.903319}}, {WeibullDistribution[0.92819,0.897149], {-1.85254, -0.905651}}} ) Find $k$ (the number of parameters): k = StringCount[Table[ToString[fd[[i, 1]]], {i, nbest}], ","] + 1; ( {1,6,6,2,2} ) nMixtures = StringCount[Table[ToString[fd[[i, 1]]], {i, nbest}], "MixtureDistribution"] ( {0,1,1,0,0} ) k = k - nMixtures ( {1,5,5,2,2} ) Extract the `LogLikelihood` and `AIC` values and show the `AIC` values from the formula used by `FindDistribution`: logL = Table[fd[[i, 2, 2]], {i, nbest}] ( {-0.9069763208709088`,-0.8247971259593575`,-0.8411993739988386`,-0.903318900970027`,-0.9056506499137531`} ) aic = Table[fd[[i, 2, 1]], {i, nbest}] ( {-1.8343608050071236`,-1.7559772306421193`,-1.7887817267210813`,-1.847874915342116`,-1.8525384132295681`} ) 2 logL - 2 k/(n - k - 1) ( {-1.8343608050071238`,-1.7559772306421193`,-1.7887817267210815`,-1.847874915342116`,-1.8525384132295681`} *) It appears that the formula used is wrong and that it seems to be a combination of $AIC$ and $AIC_c$ as $AIC_c =-2\log L + 2 k n/(n-k-1)$.

It does not appear that a standard definition of AIC is used in FindDistribution. The usual formula with $n$ observations and $k$ parameters is $AIC=-2 \log L+2 k$. But FindDistribution seems to use $AIC=2 \log L - 2k/(n-k-1)$.

Here's "proof" of what FindDistribution uses. (This works unless a MixtureDistribution is found with more than two distributions.)

Set the sample size and get a random sample:

n = 100;
data = RandomVariate[Exponential[1], n];

Find the top best fitting distributions and collect the AIC and LogLikelihood values:

nbest = 5;
fd = FindDistribution[data, nbest, {"AIC", "LogLikelihood"}]
(* {{ExponentialDistribution[1.07497], {-1.83436, -0.906976}},
{MixtureDistribution[{0.72248, 0.27752}, {GammaDistribution[1.2916, 0.369832], UniformDistribution[{0.0177466, 3.77159}]}], {-1.75598, -0.824797}},
{MixtureDistribution[{0.765586, 0.234414}, {GammaDistribution[1.3553, 0.342329], NormalDistribution[2.45319, 0.808439]}],{-1.78878, -0.841199}},
{LogNormalDistribution[-0.708397,1.24473], {-1.84787, -0.903319}},
{WeibullDistribution[0.92819,0.897149], {-1.85254, -0.905651}}} *)

Find $k$ (the number of parameters):

k = StringCount[Table[ToString[fd[[i, 1]]], {i, nbest}], ","] + 1;
(* {1,6,6,2,2} *)
nMixtures = StringCount[Table[ToString[fd[[i, 1]]], {i, nbest}], "MixtureDistribution"]
(* {0,1,1,0,0} *)
k = k - nMixtures
(* {1,5,5,2,2} *)

Extract the LogLikelihood and AIC values and show the AIC values from the formula used by FindDistribution:

logL = Table[fd[[i, 2, 2]], {i, nbest}]
(* {-0.9069763208709088`,-0.8247971259593575`,-0.8411993739988386`,-0.903318900970027`,-0.9056506499137531`} *)
aic = Table[fd[[i, 2, 1]], {i, nbest}]
(* {-1.8343608050071236`,-1.7559772306421193`,-1.7887817267210813`,-1.847874915342116`,-1.8525384132295681`} *)
2 logL - 2 k/(n - k - 1)
(* {-1.8343608050071238`,-1.7559772306421193`,-1.7887817267210815`,-1.847874915342116`,-1.8525384132295681`} *)

It appears that the formula used is wrong and that it seems to be a combination of $AIC$ and $AIC_c$ as $AIC_c =-2\log L + 2 k n/(n-k-1)$.

POSTED BY: Jim Baldwin

Reply to this discussion

Reply Preview

Attachments

Remove Add a file to this post

Follow this discussion

or Discard

Feedback