MLE using Truncated Data: NMaximize vs. EstimatedDistribution?

Posted 11 years ago
I came across a curious phenomenon in my work recently.  I am trying to fit a parametric distribution in Mathematica to truncated data in which each observation also carries a weight.  Maximum likelihood handles this pretty neatly.  However, I tried two different ways of doing it in Mathematica:

d = WeightedData[rawdat[[All, 2]], rawdat[[All, 1]]]
Timing[EstimatedDistribution[d, TruncatedDistribution[{0, 150000}, LogNormalDistribution[\[Mu], \[Sigma]]], {\[Mu], \[Sigma]}]]
LogLikelihood[%[[2]], d]
Let's say this gives me: mu = 9.86564 and sigma = 0.905846, and the log-likelihood has a value of -11.1108 at the solution.  However, this took 14 sec.
Or I do this:
Timing[NMaximize[{
  (1/Total[rawdat[[All, 1]]]) *
    Total[rawdat[[All, 1]].Log[PDF[TruncatedDistribution[{0, 150000}, LogNormalDistribution[\[Mu], \[Sigma]]], #] & /@ rawdat[[All, 2]]]],
  \[Sigma] > 0}, {\[Mu],\[Sigma]}]]
Now the answer is: mu = 9.86567 and sigma = 0.905853, though the log-likelihood has the same value at the solution as before.  Most importantly, this took only 2 sec.
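For reference, the objective in the second approach can be written out explicitly. The notation here is introduced only for illustration and does not appear in the code: w_i are the weights (rawdat[[All, 1]]), x_i the observations (rawdat[[All, 2]]), T = 150000 the upper truncation point, f and F the log-normal PDF and CDF, and \Phi the standard normal CDF. Since the log-normal CDF is zero at the lower bound 0, only F(T) appears in the normalization.

```latex
\bar{\ell}(\mu,\sigma)
  = \frac{1}{\sum_{i=1}^{n} w_i}
    \sum_{i=1}^{n} w_i \,
    \log\frac{f(x_i;\mu,\sigma)}{F(T;\mu,\sigma)},
  \qquad 0 < x_i \le T,

f(x;\mu,\sigma)
  = \frac{1}{x\,\sigma\sqrt{2\pi}}
    \exp\!\left(-\frac{(\ln x-\mu)^2}{2\sigma^2}\right),
  \qquad
F(T;\mu,\sigma)
  = \Phi\!\left(\frac{\ln T-\mu}{\sigma}\right).
```

Dividing by the total weight only rescales the objective by a positive constant, so its maximizer in (\mu, \sigma) is the same as that of the unnormalized weighted log-likelihood.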

What gives?  Why are the solutions slightly different and why is EstimatedDistribution[] so much slower?  Is it that the WeightedData[] function is harder for Mathematica to deal with?  Does anyone here know?


PS: This was a sub-sample (n = 250) of the full data that I am using.  With the full data, the difference in calculation time is 5+ minutes vs. 50 seconds.
POSTED BY: Markus S.
EstimatedDistribution is a superfunction: it chooses among several methods based on its analysis of the input you give it.  If it chose the same method you wrote out manually, I would expect it to run slower simply because it first has to do that analysis and only then run the calculation.  Alternatively, it could have chosen an entirely different way to estimate the parameters, which might have been slower for some reason.

If you are still interested in this, I would suggest running EstimatedDistribution with the Method option set to each of its different values. This would provide you with more information.
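A minimal sketch of such a comparison follows. It assumes that the relevant setting is the ParameterEstimator option of EstimatedDistribution, which selects the estimator; that mapping is an assumption about what "Method" refers to here. The data are synthetic stand-ins (vals, wts and the seed are hypothetical, not the original rawdat), and some estimators may fail or return unevaluated for a truncated distribution.

```mathematica
(* Hypothetical synthetic data standing in for rawdat: 250 values with weights *)
SeedRandom[1];
dist = TruncatedDistribution[{0, 150000}, LogNormalDistribution[\[Mu], \[Sigma]]];
vals = RandomVariate[dist /. {\[Mu] -> 9.9, \[Sigma] -> 0.9}, 250];
wts = RandomReal[{0.5, 2}, Length[vals]];
d = WeightedData[vals, wts];

(* Time the fit under each estimator; AbsoluteTiming returns {seconds, result}.
   Quiet hides convergence messages so the timings are easy to compare. *)
Table[
  est -> Quiet@AbsoluteTiming[
     EstimatedDistribution[d, dist, ParameterEstimator -> est]],
  {est, {"MaximumLikelihood", "MethodOfMoments"}}]
```

Comparing the returned timings and fitted parameters against the manual NMaximize result should indicate whether the default choice is the slow one.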

It might also be that EstimatedDistribution isn't handling the WeightedData as cleverly and is simply expanding the weighted data into a plain list of values.
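One way to probe this hypothesis is to time the same fit with and without weights. The sketch below uses synthetic data (vals, wts and the seed are hypothetical stand-ins, not the original rawdat), so the absolute numbers will differ from those in the question; only the ratio between the two timings is of interest.

```mathematica
(* Hypothetical synthetic data; not the poster's rawdat *)
SeedRandom[1];
dist = TruncatedDistribution[{0, 150000}, LogNormalDistribution[\[Mu], \[Sigma]]];
vals = RandomVariate[dist /. {\[Mu] -> 9.9, \[Sigma] -> 0.9}, 250];
wts = RandomReal[{0.5, 2}, Length[vals]];

(* Same observations, fitted once as a plain list and once as WeightedData *)
tPlain = First@AbsoluteTiming[EstimatedDistribution[vals, dist]];
tWeighted = First@AbsoluteTiming[
    EstimatedDistribution[WeightedData[vals, wts], dist]];

(* Wall-clock seconds for each fit *)
{tPlain, tWeighted}
```

If the weighted fit is much slower than the plain one on identical values, that would point to the WeightedData handling rather than the estimation method itself.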
POSTED BY: Sean Clarke