Avoid the "MemoryAllocationFailure" error message with NonlinearModelFit?

This code performs an exponential fit and obtains the "ParameterErrors" and "RSquared" for an exponential model with "NonlinearModelFit". The code runs correctly for almost all data. However, for the data attached in the code below, Mathematica freezes during the evaluation of "ParameterErrors", "RSquared", and "ParameterTable", and throws SystemException["MemoryAllocationFailure"]. How can I evaluate this data without freezing and errors?

ClearAll["Global`*"]
ClearSystemCache[]

sizedistcum = {{1.12996, 0.}, {1.42366, 0.}, {1.7937, 0.}, {2.25992, 
    0.}, {2.84732, 0.}, {3.5874, 0.}, {4.51984, 0.}, {5.69464, 
    0.}, {7.1748, 0.}, {9.03968, 0.}, {11.3893, 0.}, {14.3496, 
    0.}, {18.0794, 0.}, {22.7786, 0.}, {28.6992, 0.}, {36.1587, 
    0.}, {45.5572, 0.}, {57.3984, 0.00215924}, {72.3175, 
    0.0283401}, {91.1143, 0.0537112}, {114.797, 0.0753036}, {144.635, 
    0.103104}, {182.229, 0.268826}, {229.594, 0.832659}, {289.27, 
    0.964372}, {364.457, 0.993522}, {459.187, 1.}, {578.54, 
    1.}, {728.914, 1.}, {918.375, 1.}, {1157.08, 1.}, {1457.83, 
    1.}, {1836.75, 1.}, {2314.16, 1.}, {2915.66, 1.}, {3673.5, 
    1.}, {4628.32, 1.}, {5831.32, 1.}, {7347., 1.}, {9256.64, 
    1.}, {11662.6, 1.}};

RRmodel = 1 - Exp[-(x/dm)^n];

nlm = NonlinearModelFit[sizedistcum, RRmodel,  {{dm, 10}, {n, 1}}, x];

{{dm, n}} = {dm, n} /. nlm[{"BestFitParameters"}]

{{dmdev, ndev}} = nlm[{"ParameterErrors"}]

RRrsquared = nlm[{"RSquared"}]    

nlm[{"ParameterTable"}]
POSTED BY: Gustavo Dacanal

As a side note, the excessive memory usage in the original code can be avoided by using instead

RRmodel = 1. - Exp[-(x/dm)^n];

that is, replacing the exact number 1 with a machine precision 1.

Whenever a tiny number is subtracted from an exact 1, the result carries far too many digits of precision; consider, for example,

1 - Exp[-100000.]

Adding a few more zeros to the Exp argument can make any computer run out of memory.
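One quick way to see the blow-up (an illustrative check; the exact figures will vary by version):

Precision[1 - Exp[-100000.]]   (* tens of thousands of digits of precision *)
ByteCount[1 - Exp[-100000.]]   (* many kilobytes for a single real number *)
Precision[1. - Exp[-100000.]]  (* MachinePrecision *)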

This behavior may also be circumvented by telling the kernel to no longer trap machine underflow and thus avoid switching to high-precision arithmetic:

In[1]:= SetSystemOptions["CatchMachineUnderflow" -> False]; 

In[2]:= 1 - Exp[-100000.]  

Out[2]= 1.

In fact, the above may become the default (and only) behavior for machine arithmetic in a future release.

POSTED BY: Ilian Gachevski

Thank you, it is working!

POSTED BY: Gustavo Dacanal
Posted 8 years ago

If the raw data isn't available, then one can still obtain maximum likelihood estimates from the binned data (assuming the binned data can be transformed into the observed frequency counts).

sizedistcum = {{1.12996, 0.}, {1.42366, 0.}, {1.7937, 0.}, {2.25992, 
    0.}, {2.84732, 0.}, {3.5874, 0.}, {4.51984, 0.}, {5.69464, 
    0.}, {7.1748, 0.}, {9.03968, 0.}, {11.3893, 0.}, {14.3496, 
    0.}, {18.0794, 0.}, {22.7786, 0.}, {28.6992, 0.}, {36.1587, 
    0.}, {45.5572, 0.}, {57.3984, 0.00215924}, {72.3175, 
    0.0283401}, {91.1143, 0.0537112}, {114.797, 0.0753036}, {144.635, 
    0.103104}, {182.229, 0.268826}, {229.594, 0.832659}, {289.27, 
    0.964372}, {364.457, 0.993522}, {459.187, 1.}, {578.54, 
    1.}, {728.914, 1.}, {918.375, 1.}, {1157.08, 1.}, {1457.83, 
    1.}, {1836.75, 1.}, {2314.16, 1.}, {2915.66, 1.}, {3673.5, 
    1.}, {4628.32, 1.}, {5831.32, 1.}, {7347., 1.}, {9256.64, 
    1.}, {11662.6, 1.}};

(* Frequency counts for each bin *)
c = Differences[Round[3705 sizedistcum[[All, 2]]]]
(* Bin borders *)
x = N[PowerRange[1, 13000, 2^(1/3)]]

(* Cumulative distribution function *)
F[x_] := 1 - Exp[-(x/dm)^n]

(* Log of likelihood *)
logL = Sum[c[[i]] Log[F[x[[i + 1]]] - F[x[[i]]]], {i, Length[c]}];

$$\log L = \sum_{i=1}^{k} c_i \log\left(F(x_{i+1}) - F(x_i)\right)$$

(* Find maximum likelihood estimates *)
sol = FindMaximum[logL, {{n, 4}, {dm, 200}}, WorkingPrecision -> 20]
(* {-5891.8004314673819211, {n -> 4.3748213953340505200, dm -> 191.19015070319332793}} *)

(* Show results *)
Show[ContourPlot[logL, {n, 3, 6}, {dm, 150, 250}, PlotRange -> All,
  Contours -> -{6, 7, 8, 9, 10, 11, 12} 1000, 
  PlotLabel -> Style["Log likelihood surface", Black, 24],
  Frame -> True, 
  FrameLabel -> (Style[#, Bold, Italic, 18] &) /@ {"n", "dm"}],
 ListPlot[{{n, dm}} /. sol[[2]], PlotStyle -> Red]]

Log likelihood surface with maximum likelihood estimate

(* Standard errors *)
cov = -Inverse[(D[logL, {{n, dm}, 2}]) /. sol[[2]]];
sen = cov[[1, 1]]^0.5
(* 0.05702 *)
sedm = cov[[2, 2]]^0.5
(* 0.786466 *)

To repeat: this only works if one can reconstruct the counts, because the sample size is the total count rather than the number of bins (which is completely artificial).
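As a quick consistency check on the reconstruction, the recovered counts add back up to the sample size, since the differences of the cumulative counts telescope to the last value minus the first:

Total[c]
(* 3705 *)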

POSTED BY: Jim Baldwin

Thank you, Jim, your contributions about statistics are also welcome! The Mathematica error message was explained by Ilian Gachevski: it is related to machine number precision, replacing the exact number with a machine-precision one: 1 -> 1.

POSTED BY: Gustavo Dacanal

Thank you for the detailed description of the application of "WeibullDistribution" to raw data. The Rosin-Rammler model is used to describe particle-size distributions. In fact, raw data containing the sizes of individual particles can be obtained from certain experimental measurements, e.g. image analysis.
On the other hand, particle-size distributions obtained from mesh sieving and laser diffraction are reported as histogram bins. In that case, model fitting is the most commonly used tool.
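For reference, the Rosin-Rammler CDF is the same function as the CDF of the built-in WeibullDistribution[n, dm], which is why the distribution-fitting approach applies directly; a quick check:

CDF[WeibullDistribution[n, dm], x]
(* 1 - E^(-(x/dm)^n) for x > 0, i.e. the Rosin-Rammler form *)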

I would like to hear another opinion on this statistical atrocity. I'll point out, though, that my question is about the "memory error" and the freezing of the Mathematica kernel.

POSTED BY: Gustavo Dacanal
Posted 8 years ago

I owe you a more substantial demonstration as to why the "regression approach" should be avoided at all costs. Below are the results of 100 simulations, each a random sample of size 3705 (your original sample size) from a Rosin-Rammler distribution with parameters $n=4.5$ and $dm=200$. You'll see that the regression approach is extremely biased for estimating $dm$ and extremely variable for both $n$ and $dm$.

nsim = 100;  (* Number of simulations *)
mle = ConstantArray[{0, 0}, nsim];
reg = ConstantArray[{0, 0}, nsim];
mleBinned = ConstantArray[{0, 0}, nsim];
nSamples = 3705;
Do[
 x = RandomVariate[WeibullDistribution[4.5, 200], nSamples];
 bins = N[PowerRange[1, 14000, 2^(1/3)]];
 dpimean = ConstantArray[0, Length[bins] - 1];
 Do[dpimean[[i]] = (bins[[i]] + bins[[i + 1]])/2, {i, 
   Length[bins] - 1}];
 c = BinCounts[x, {bins}];
 sizedistcum = Transpose[{dpimean, 1. Accumulate[c]/nSamples}];

 (* Regression estimates of n and dm *)
 (* Select only the cumulative proportions strictly between 0 and 1 *)
 data = Select[sizedistcum, 0 < #[[2]] < 1 &];

 (* Transform the coordinates *)
 data[[All, 1]] = Log[data[[All, 1]]];
 data[[All, 2]] = Log[-Log[1 - data[[All, 2]]]];

 (* Fit a line corresponding to a distribution function and show results *)
 lm = LinearModelFit[data, z, z];

 (* Get regression estimates of n and dm *)
 {intercept, nreg} = lm["BestFitParameters"];
 dmreg = Exp[-intercept/nreg];
 reg[[i]] = {nreg, dmreg};

 (* Maximum likelihood estimates with raw data *)
 mle[[i]] = {n, dm} /. FindDistributionParameters[x, WeibullDistribution[n, dm]];

 (* Maximum likelihood estimates with binned data *)
 (* Cumulative distribution function *)
 n =.;
 dm =.;
 F[x_] := 1 - Exp[-(x/dm)^n];

 (* Log of likelihood *)
 logL = Sum[c[[j]] Log[F[bins[[j + 1]]] - F[bins[[j]]]], {j, Length[c]}];

 (* Find maximum likelihood estimates *)
 mleBinned[[i]] = {n, dm} /. (FindMaximum[{logL, n > 0 && dm > 0}, {{n, mle[[i, 1]]}, {dm, mle[[i, 2]]}}, WorkingPrecision -> 30][[2]]),

 {i, nsim}]

(* Show results *)
ListPlot[{mle, mleBinned, reg, {{4.5, 200}}}, Frame -> True, 
 FrameLabel -> (Style[#, Bold, 18, Black] &) /@ {"n", "dm"},
 PlotStyle -> {{PointSize[0.01], Blue}, {PointSize[0.01], 
    Green}, {PointSize[0.01], Orange}, {PointSize[0.025], Red}},
 PlotLegends ->  PointLegend[{Blue, Green, Orange, Red}, {"Maximum likelihood (raw data)", 
    "Maximum likelihood (binned data)",  "Regression on sample distribution function", "True values"},
   LegendMarkerSize -> 20], ImageSize -> 500, PlotRange -> All]

Results of estimating n and dm for 3 different estimation methods

POSTED BY: Jim Baldwin
Posted 8 years ago

I'm sorry I wasn't convincing in my attempt to keep you from completing a statistical atrocity. This time I'll suggest how the appropriate approach is much simpler:

First one should use the raw data and not do any histogram binning. Here a random sample is chosen from a Rosin-Rammler (Weibull) distribution and the FindDistributionParameters function is used to obtain the maximum likelihood parameter estimates:

SeedRandom[12345];
x = RandomVariate[WeibullDistribution[4.5, 200], 3705];
mle = FindDistributionParameters[x, WeibullDistribution[n, dm]]
(* {n -> 4.57036, dm -> 199.591} *)

That's all there is to it. Well, other than constructing a measure of precision for the estimates (which is essential). That only takes a few more steps. Below are the steps to obtain the standard errors for the parameter estimates:

logL = LogLikelihood[WeibullDistribution[n, dm], x];
cov = -Inverse[(D[logL, {{n, dm}, 2}]) /. mle];
sen = cov[[1, 1]]^0.5
(* 0.0584913 *)
sedm = cov[[2, 2]]^0.5
(* 0.755647 *)
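The covariance matrix above is the inverse of the observed information (the negative Hessian of the log-likelihood evaluated at the maximum likelihood estimates),

$$\widehat{\operatorname{Cov}}(\hat{n}, \widehat{dm}) = \Bigl[-\nabla^2 \log L \,\big|_{(\hat{n}, \widehat{dm})}\Bigr]^{-1},$$

and the standard errors are the square roots of its diagonal entries.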

Performing a regression is bad news because of the many assumptions that are violated: (1) this is not a regression situation but rather one of fitting a probability distribution, (2) the constant-variance assumption that NonlinearModelFit makes is violated, and (3) in this case one has 3705 observations, not the artificial sample size implied by an arbitrary binning. And there are certainly other violations. (I will admit that maybe a regression approach could be used to get starting values for the maximum likelihood estimates. But that's all it should be used for.)

POSTED BY: Jim Baldwin

Dear Jim, thank you for your contribution. Let's try to solve this error/freezing... I'll give more details about this simple code. It is a cumulative size distribution of fine powders. First, I performed "BinCounts" on the raw data. The bin step size follows a power range, as below, where "dpi" is the particle diameter in micrometers:

dpi = N[PowerRange[1, 14000, 2^(1/3)]]

Then, I obtained the mean dpi, which is used as x-axis data:

bindpi = Length[dpi] - 1;    
dpimean = ConstantArray[0, bindpi];   
Do[dpimean[[i]] = (dpi[[i]] + dpi[[i + 1]])/2, {i, 1, bindpi}]
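An equivalent one-liner for the bin midpoints is

dpimean = MovingAverage[dpi, 2]  (* pairwise means of successive bin borders *)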

The x-axis data is:

  xdata={1.12996,1.42366,1.7937,2.25992,2.84732,3.5874,4.51984,5.69464,7.1748,9.03968,11.3893,14.3496,18.0794,22.7786,28.6992,36.1587,45.5572,57.3984,72.3175,91.1143,114.797,144.635,182.229,229.594,289.27,364.457,459.187,578.54,728.914,918.375,1157.08,1457.83,1836.75,2314.16,2915.66,3673.5,4628.32,5831.32,7347.,9256.64,11662.6}

The y-axis data is (this is a different dataset, one that fits correctly):

 ydata={0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.169972,0.705382,0.926346,0.988669,1.,1.,1.,1.,1.,1.,1.,1.,1.,1.}

Then I have the {x, y} array for fitting:

sizedistcum = Transpose[{xdata, ydata}]

Finally, I used the Rosin-Rammler exponential function and the nonlinear optimization:

RRmodel = 1 - Exp[-(x/dm)^n];

nlm = NonlinearModelFit[sizedistcum, RRmodel,  {{dm, 10}, {n, 1}}, x];

{{dm, n}} = {dm, n} /. nlm[{"BestFitParameters"}]

{{dmdev, ndev}} = nlm[{"ParameterErrors"}]

RRrsquared = nlm[{"RSquared"}]

nlm[{"ParameterTable"}]

The "Memory Error" still occured with some y-data (not with all size distributions).. as example the following y-data showed this message error when try evaluated:

ydata={0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.00215924,0.0283401,0.0537112,0.0753036,0.103104,0.268826,0.832659,0.964372,0.993522,1.,1.,1.,1.,1.,1.,1.,1.,1.,1.,1.,1.,1.,1.,1.}

Thanks, Gustavo

POSTED BY: Gustavo Dacanal
Posted 8 years ago

Given that the dataset name (sizedistcum) has "dist" and "cum" in it and it appears that you have equal intervals with respect to the log of x, I suspect you have a sample cumulative distribution function formed from a binning of a random sample of size 3,705. The function that you're trying to fit is a cumulative distribution function for a Weibull distribution. If so, performing a regression is inappropriate.

The counts for the histogram seem to be

{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 97, 94, 80, 103, 614, 2089, 488, 108, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}

If so, I can show you how to estimate the parameters of the Weibull distribution from a histogram if my speculation about your data is correct. But first I want to determine if my speculation is correct.
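One way to check this yourself, assuming the sample size of 3,705 mentioned above:

Differences[Round[3705 sizedistcum[[All, 2]]]]
(* reproduces the counts above, one element shorter because of the differencing *)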

Jim

POSTED BY: Jim Baldwin