How does FindDistribution calculate BIC and AIC?

Posted 10 years ago

A new tool called FindDistribution calculates (among other things) BIC and AIC. But when I check the values using the log likelihood value together with the rest of the parameters, I get different values. Thank you

POSTED BY: jl cb
12 Replies

Thank you very much, Jim. I had tried similar methods but had never been able to reproduce the Wolfram computation.

Wolfram Statistics folks: I am insufficiently expert in statistics to judge whether the Wolfram method or the R/SAS method is better. A suggestion: it might help to provide some official documentation of how and why the computation is done the way it is, perhaps a cite to an article or a worked example like the one Jim has provided. Alternatively, something semi-official on this Community site would be useful.

Thanks to all.

POSTED BY: Seth Chandler
Posted 4 years ago

Seth: You might want to post a specific question about AIC with LinearModelFit (and NonlinearModelFit) as the issue might not be user error or an error in Mathematica but rather a difference in definitions.

To use a slightly modified example from the documentation (I added an additional sample point) here is what LinearModelFit puts out:

data = {{0, 1}, {1, 0}, {3, 2}, {5, 4}, {7, 5}};
n = 5;  (* Sample size *)
p = 3;  (* Number of parameters: intercept, slope, and residual error variance *)
lm = LinearModelFit[data, x, x];
lm["AIC"]
(* 15.1332 *)

And here is what LinearModelFit is doing (in a slightly less black-box way):

aic = -2 LogLikelihood[NormalDistribution[0, lm["EstimatedVariance"]^0.5], lm["FitResiduals"]] + 2*p
(* 15.1332 *)

The issue is: Is the definition of AIC that LinearModelFit uses the definition that you want to use?

In essence, restricted maximum likelihood (REML) is being performed (i.e., obtaining the unbiased estimate of variance), but rather than plugging that estimate of variance into the log of the restricted likelihood function, that estimate is being plugged into the unrestricted log likelihood function. That is not what SAS or R uses to construct AIC. At best, how LinearModelFit and NonlinearModelFit calculate AIC is not common practice.
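
The difference can be checked directly. The following is a minimal sketch, assuming that the R/SAS convention amounts to plugging the maximum likelihood variance estimate (residual sum of squares divided by n) into the same Gaussian log likelihood. The names rss, sigmaML, and aicML are illustrative only; they are not LinearModelFit properties.

```mathematica
(* Same data and model as in the example above *)
data = {{0, 1}, {1, 0}, {3, 2}, {5, 4}, {7, 5}};
n = Length[data];  (* Sample size *)
p = 3;             (* Intercept, slope, and residual error variance *)
lm = LinearModelFit[data, x, x];

(* Maximum likelihood estimate of the residual standard deviation:
   divide the residual sum of squares by n rather than by n - 2 *)
rss = Total[lm["FitResiduals"]^2];
sigmaML = Sqrt[rss/n];

(* AIC with the ML variance estimate plugged into the unrestricted log likelihood *)
aicML = -2 LogLikelihood[NormalDistribution[0, sigmaML], lm["FitResiduals"]] + 2 p
(* approximately 14.58, compared with 15.1332 from lm["AIC"] *)
```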

POSTED BY: Jim Baldwin
Posted 4 years ago

Not resolved as far as I know. And there is another issue I should have noticed before: the log likelihood is wrong also. The "mean" of minus the log likelihood is used (i.e., the log of the likelihood for the data and proposed model is divided by the sample size). The ranking of models isn't affected but if one attempts to apply the common threshold of "2 AIC units" suggesting a very different model, well, that won't work with the AIC values produced by FindDistribution.

I'll write up a summary, send it to Wolfram, Inc., and report back. Below is a working example:

(* Generate data and find the 5 best fitting distributions *)
n = 100; (* Sample size *) 
SeedRandom[12345];
data = RandomVariate[ExponentialDistribution[1], n];
nbest = 5;
TableForm[(fd = FindDistribution[data, nbest, {"LogLikelihood", "AIC"}]),
 TableHeadings -> {("Distribution " <> # &) /@ (ToString[#] & /@ Range[nbest]), {"Distribution", "log(L) and AIC"}}]

Results of FindDistribution

(* Log of the likelihood as calculated by FindDistribution *)
Column[{Style["Log of the likelihood\n", 18, Bold],
  TableForm[Table[{fd[[i, 2, 1]], LogLikelihood[fd[[i, 1]], data]/n,
     LogLikelihood[fd[[i, 1]], data]}, {i, nbest}],
   TableHeadings -> {("Distribution " <> # &) /@ (ToString[#] & /@ Range[nbest]),
     {"\nFrom\nFindDistribution", "Duplicating what\nFindDistribution\ndoes",
      "What the\nlog likelihood\nshould be"}}]}]

Log of likelihood

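(* k: number of estimated parameters in each fitted distribution. The definition of k is not
   shown in the post; counting the arguments of each distribution is an assumption that holds
   for simple parametric families such as ExponentialDistribution or GammaDistribution. *)
k = Length /@ fd[[All, 1]];
(* AIC as calculated by FindDistribution *)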
Column[{Style["AIC\n", 18, Bold], TableForm[Table[{fd[[i, 2, 2]],
     2 LogLikelihood[fd[[i, 1]], data]/n - 2 k[[i]]/(n - k[[i]] - 1),
     -2 LogLikelihood[fd[[i, 1]], data] + 2 k[[i]]}, {i, nbest}],
   TableHeadings -> {("Distribution " <> # &) /@ (ToString[#] & /@ Range[nbest]),
     {"\nFrom\nFindDistribution", "Duplicating what\nFindDistribution\ndoes",
      "\nWhat AIC\nshould be"}}]}]

AIC

(Note the sign of AIC is arbitrary. A common approach is to choose the "smaller is better" approach which is what I used for the "What AIC should be" column.)
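
Written out, the two conventions compared in the AIC table are as follows, where $\log L$ is the log likelihood of the full sample, $n$ is the sample size, and $k$ is the number of estimated parameters. The first expression is only what the "Duplicating what FindDistribution does" column computes; it is not a documented Wolfram formula.

$$\mathrm{AIC}_{\text{FindDistribution, as reproduced}} = \frac{2}{n}\log L - \frac{2k}{n-k-1}$$

$$\mathrm{AIC}_{\text{usual}} = -2\log L + 2k$$

A bit of algebra shows that the first expression equals $-\mathrm{AICc}/n$, where

$$\mathrm{AICc} = -2\log L + 2k + \frac{2k(k+1)}{n-k-1} = -2\log L + \frac{2kn}{n-k-1}$$

is the small-sample corrected AIC. So, if the reproduction is accurate, a difference of 2 units on the AICc scale shows up as a difference of only $2/n$ in the reported values (0.02 for $n = 100$).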

POSTED BY: Jim Baldwin

Did this ever get fixed or otherwise resolved? Having some issues about the AIC using LinearModelFit, but it could well be user error rather than a bug.

POSTED BY: Seth Chandler
Posted 4 years ago

How does FindDistribution calculate BIC and AIC?

POSTED BY: Jim Baldwin
Posted 4 years ago
POSTED BY: Jim Baldwin
Posted 10 years ago

Similarly, $AIC$ from LinearModelFit appears to be wrongly computed.

POSTED BY: Sandu Ursu
Posted 10 years ago

Thanks. I'll check with SmoothKernelDistribution then. Not even LogLikelihood values can be used?

POSTED BY: jl cb
Posted 10 years ago
POSTED BY: Jim Baldwin
Posted 10 years ago
POSTED BY: Jim Baldwin
Posted 10 years ago
POSTED BY: jl cb
Posted 10 years ago
POSTED BY: Jim Baldwin