I came across a curious phenomenon in my work recently. I am using Mathematica to fit a parametric distribution to truncated data in which each observation carries a weight. Maximum likelihood handles this neatly, but I tried two different ways of doing it in Mathematica:
d = WeightedData[rawdat[[All, 2]], rawdat[[All, 1]]]
EstimatedDistribution[d, TruncatedDistribution[{0, 150000}, LogNormalDistribution[\[Mu], \[Sigma]]], {\[Mu], \[Sigma]}]
LogLikelihood[%[[2]], d]
Let's say this gives me: mu = 9.86564 and sigma = 0.905846, and the log-likelihood has a value of -11.1108 at the solution. However, this took 14 sec.
Or I do this:
NMaximize[{
  (* weighted average log-likelihood; the dot product already sums over the observations *)
  (1/Total[rawdat[[All, 1]]]) *
   rawdat[[All, 1]].Log[PDF[TruncatedDistribution[{0, 150000}, LogNormalDistribution[\[Mu], \[Sigma]]], #] & /@ rawdat[[All, 2]]],
  \[Sigma] > 0}, {\[Mu], \[Sigma]}]
Now the answer is: mu = 9.86567 and sigma = 0.905853, though the log-likelihood has the same value at the solution as before. Most importantly, this took only 2 sec.
What gives? Why are the solutions slightly different, and why is EstimatedDistribution[] so much slower? Is it that the WeightedData[] function is harder for Mathematica to deal with? Does anyone here know?
Markus
PS: This was a sub-sample (n = 250) of the full data that I am using. With the full data, the difference in calculation time is 5+ minutes vs. 50 seconds.
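PPS: For anyone who wants to try the comparison without the original data, here is a minimal sketch. The sample is synthetic: the seed, the weight range, and the generating parameters are made up and chosen only to resemble the fit reported above. The names model, obs, wts, t1, t2, fit1 and fit2 are illustrative, and the two-argument form of EstimatedDistribution is used. Timings on synthetic data may well differ from those on the real data.

```mathematica
(* Synthetic stand-in for rawdat: column 1 = weight, column 2 = observation. *)
(* All numbers here are assumptions for illustration, not the real data. *)
SeedRandom[1];
n = 250;
obs = RandomVariate[
   TruncatedDistribution[{0, 150000}, LogNormalDistribution[9.87, 0.91]], n];
wts = RandomReal[{0.5, 2}, n];
rawdat = Transpose[{wts, obs}];

(* Model with symbolic parameters; \[Mu] and \[Sigma] must have no values assigned. *)
model = TruncatedDistribution[{0, 150000},
   LogNormalDistribution[\[Mu], \[Sigma]]];

(* Approach 1: built-in estimator on weighted data. *)
d = WeightedData[rawdat[[All, 2]], rawdat[[All, 1]]];
{t1, fit1} = AbsoluteTiming[EstimatedDistribution[d, model]];

(* Approach 2: maximize the weighted average log-likelihood directly. *)
{t2, fit2} = AbsoluteTiming[
   NMaximize[{
     rawdat[[All, 1]].Log[PDF[model, #] & /@ rawdat[[All, 2]]]/
      Total[rawdat[[All, 1]]],
     \[Sigma] > 0}, {\[Mu], \[Sigma]}]];

(* Elapsed seconds and result for each approach. *)
{t1, fit1}
{t2, fit2}
```

AbsoluteTiming returns the elapsed wall-clock time together with the result, so the two pairs at the end can be compared directly.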