It does not appear that a standard definition of AIC is used in FindDistribution. The usual formula with $n$ observations and $k$ parameters is $AIC=-2 \log L+2 k$. But FindDistribution seems to use $AIC=2 \log L - 2k/(n-k-1)$.
Here is a "proof" of what FindDistribution uses. (The parameter counting below works unless a MixtureDistribution with more than two component distributions is found.)
Set the sample size and get a random sample:
n = 100;
data = RandomVariate[ExponentialDistribution[1], n];
Find the five best-fitting distributions and collect their AIC and LogLikelihood values:
nbest = 5;
fd = FindDistribution[data, nbest, {"AIC", "LogLikelihood"}]
(* {{ExponentialDistribution[1.07497], {-1.83436, -0.906976}},
{MixtureDistribution[{0.72248, 0.27752}, {GammaDistribution[1.2916, 0.369832], UniformDistribution[{0.0177466, 3.77159}]}], {-1.75598, -0.824797}},
{MixtureDistribution[{0.765586, 0.234414}, {GammaDistribution[1.3553, 0.342329], NormalDistribution[2.45319, 0.808439]}],{-1.78878, -0.841199}},
{LogNormalDistribution[-0.708397,1.24473], {-1.84787, -0.903319}},
{WeibullDistribution[0.92819,0.897149], {-1.85254, -0.905651}}} *)
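Find $k$ (the number of free parameters). Counting the commas in the string form and adding 1 gives the number of numeric arguments; 1 is then subtracted for each MixtureDistribution because its weights sum to 1: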
k = StringCount[Table[ToString[fd[[i, 1]]], {i, nbest}], ","] + 1;
(* {1,6,6,2,2} *)
nMixtures = StringCount[Table[ToString[fd[[i, 1]]], {i, nbest}], "MixtureDistribution"]
(* {0,1,1,0,0} *)
k = k - nMixtures
(* {1,5,5,2,2} *)
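As an alternative to counting commas in the string form, the parameters could be counted structurally. The following is only a sketch and not what FindDistribution does internally: it assumes that every numeric leaf of the distribution expression is an estimated parameter and that each MixtureDistribution loses one free parameter because its weights sum to 1. The helper name countParameters is hypothetical, and the distributions are copied from the output shown above.

```mathematica
(* Hypothetical helper (not a built-in): count the numeric leaves of a
   distribution expression, minus one per MixtureDistribution, on the
   assumption that mixture weights sum to 1 *)
countParameters[dist_] :=
  Count[dist, _?NumericQ, Infinity] -
   Count[dist, _MixtureDistribution, {0, Infinity}];

(* Distributions copied from the FindDistribution output shown above *)
dists = {
   ExponentialDistribution[1.07497],
   MixtureDistribution[{0.72248, 0.27752},
    {GammaDistribution[1.2916, 0.369832],
     UniformDistribution[{0.0177466, 3.77159}]}],
   MixtureDistribution[{0.765586, 0.234414},
    {GammaDistribution[1.3553, 0.342329],
     NormalDistribution[2.45319, 0.808439]}],
   LogNormalDistribution[-0.708397, 1.24473],
   WeibullDistribution[0.92819, 0.897149]
   };

(* Free-parameter count for each fitted distribution *)
countParameters /@ dists
```

The result can be compared with the k obtained from the comma count. Working on the expression rather than on its string form avoids depending on how ToString happens to print the distribution.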
Extract the LogLikelihood and AIC values, then compare the reported AIC values with those given by the formula that FindDistribution appears to use:
logL = Table[fd[[i, 2, 2]], {i, nbest}]
(* {-0.9069763208709088`,-0.8247971259593575`,-0.8411993739988386`,-0.903318900970027`,-0.9056506499137531`} *)
aic = Table[fd[[i, 2, 1]], {i, nbest}]
(* {-1.8343608050071236`,-1.7559772306421193`,-1.7887817267210813`,-1.847874915342116`,-1.8525384132295681`} *)
2 logL - 2 k/(n - k - 1)
(* {-1.8343608050071238`,-1.7559772306421193`,-1.7887817267210815`,-1.847874915342116`,-1.8525384132295681`} *)
The formula used appears to be wrong: it looks like a combination of $AIC$ and $AIC_c$, where $AIC_c =-2\log L + 2 k n/(n-k-1)$.
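For reference, the algebra relating these expressions can be written out. This is only a sketch: the symbol $\bar{\ell} = \log L / n$ (the per-observation log-likelihood) is introduced here and is not part of the original notation, and whether the LogLikelihood property is reported on that scale is an assumption, although values near $-0.9$ for $n = 100$ would be consistent with it.

```latex
\begin{align*}
AIC_c &= AIC + \frac{2k(k+1)}{n-k-1}
       = -2\log L + 2k + \frac{2k(k+1)}{n-k-1}
       = -2\log L + \frac{2kn}{n-k-1}, \\
-\frac{AIC_c}{n} &= 2\,\frac{\log L}{n} - \frac{2k}{n-k-1}
       = 2\bar{\ell} - \frac{2k}{n-k-1}.
\end{align*}
```

Under that assumption, the expression $2 \log L - 2k/(n-k-1)$ observed above has the same form as $-AIC_c/n$, that is, $AIC_c$ with the sign flipped and divided by the sample size.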