Posted 12 years ago

I came across a curious phenomenon in my work recently. I am trying to fit a parametric distribution to some truncated data that also includes a weight for each observation, using Mathematica. Maximum likelihood obviously does this pretty neatly. However, I tried two different ways of doing it in Mathematica.

The first way:

    d = WeightedData[rawdat[[All, 2]], rawdat[[All, 1]]]
    EstimatedDistribution[d,
      TruncatedDistribution[{0, 150000}, LogNormalDistribution[\[Mu], \[Sigma]]],
      {\[Mu], \[Sigma]}]]
    LogLikelihood[%[[2]], d]

Let's say this gives me mu = 9.86564 and sigma = 0.905846, and the log-likelihood has a value of -11.1108 at the solution. However, this took 14 sec.

Or I do this:

    NMaximize[{
      (1/Total[rawdat[[All, 1]]]) *
        Total[rawdat[[All, 1]].Log[
          PDF[TruncatedDistribution[{0, 150000}, LogNormalDistribution[\[Mu], \[Sigma]]], #] & /@
            rawdat[[All, 2]]]],
      \[Sigma] > 0}, {\[Mu],\[Sigma]}]]

Now the answer is mu = 9.86567 and sigma = 0.905853, though the log-likelihood has the same value at the solution as before. Most importantly, this took only 2 sec.

What gives? Why are the solutions slightly different, and why is EstimatedDistribution[] so much slower? Is it that the WeightedData[] function is harder for Mathematica to deal with? Does anyone here know?

Markus

PS: This was a sub-sample (n = 250) of the full data that I am using. With the full data, the difference in calculation time is 5+ minutes vs. 50 seconds.
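For reference, the objective passed to NMaximize above can be written out as follows. The symbols are labels introduced here for illustration, not names from the post: w_i stands for rawdat[[All, 1]], x_i for rawdat[[All, 2]], and f and F for the lognormal density and CDF.

```latex
\ell(\mu,\sigma) \;=\; \frac{1}{\sum_{i=1}^{n} w_i}\,\sum_{i=1}^{n} w_i \,\log f_T(x_i;\mu,\sigma),
\qquad
f_T(x;\mu,\sigma) \;=\; \frac{f(x;\mu,\sigma)}{F(150000;\mu,\sigma) - F(0;\mu,\sigma)},
\quad 0 < x \le 150000,

f(x;\mu,\sigma) \;=\; \frac{1}{x\,\sigma\sqrt{2\pi}}\,
\exp\!\left(-\frac{(\ln x-\mu)^2}{2\sigma^2}\right),
\qquad
F(x;\mu,\sigma) \;=\; \Phi\!\left(\frac{\ln x-\mu}{\sigma}\right),
\qquad F(0;\mu,\sigma) = 0 .
```

Dividing by the total weight only rescales the objective and does not move its maximizer. So if EstimatedDistribution also maximizes a weighted log-likelihood (an assumption, since the post does not show what it does internally), both calls target the same estimate, and differences in the fifth significant digit would be consistent with two optimizers stopping at different tolerances.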

POSTED BY: Markus S.

Posted 12 years ago

EstimatedDistribution is a superfunction: it chooses among a list of methods depending on its analysis of the input you give it. So if it chose the same method you wrote out manually, I would expect it to run slower simply because it first had to do the analysis to pick that method and then had to do the calculation itself. Alternatively, it could have chosen an entirely different way to estimate the parameters, which might have been slower for some reason. If you are still interested in this, I would suggest running EstimatedDistribution with the Method option set to its different values. This would give you more information. It might also be that EstimatedDistribution isn't handling the WeightedData as cleverly and is simply expanding the weighted data into a list of values.
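One way to gather that information is to time the variants side by side. The following is a minimal sketch, not the original poster's setup: the data are synthetic, and the names obs, wts, model, and the timing variables are made up for illustration. It assumes AbsoluteTiming is an acceptable way to measure the run time (the post does not say how the timings were taken) and lets EstimatedDistribution pick its own starting values.

```mathematica
(* Synthetic stand-in for rawdat: column 1 = weight, column 2 = observation. *)
(* The true parameters 9.9 and 0.9 are arbitrary choices for this sketch.    *)
SeedRandom[1];
n = 250;
obs = RandomVariate[
   TruncatedDistribution[{0, 150000}, LogNormalDistribution[9.9, 0.9]], n];
wts = RandomReal[{0.5, 2}, n];
rawdat = Transpose[{wts, obs}];

(* Model with unknown parameters, as in the original post. *)
model = TruncatedDistribution[{0, 150000},
   LogNormalDistribution[\[Mu], \[Sigma]]];

(* 1. EstimatedDistribution on WeightedData. *)
{tWeighted, fitWeighted} = AbsoluteTiming[
   EstimatedDistribution[WeightedData[obs, wts], model]];

(* 2. EstimatedDistribution on the plain list of observations, to see *)
(*    how much of the cost is tied to the weights.                    *)
{tPlain, fitPlain} = AbsoluteTiming[
   EstimatedDistribution[obs, model]];

(* 3. Hand-written weight-normalized log-likelihood, maximized directly. *)
objective = (wts . Log[PDF[model, #] & /@ obs])/Total[wts];
{tManual, fitManual} = AbsoluteTiming[
   NMaximize[{objective, \[Sigma] > 0}, {\[Mu], \[Sigma]}]];

(* Timings in seconds, then the three fits for comparison. *)
{tWeighted, tPlain, tManual}
{fitWeighted, fitPlain, fitManual}
```

If the first two timings are close, the weights are probably not the bottleneck. If the weighted fit is much slower than the plain one, that would point toward the WeightedData handling suggested above. The EstimatedDistribution documentation also lists a ParameterEstimator option for selecting the estimator; its settings are not exercised in this sketch.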

POSTED BY: Sean Clarke
