Avoid message error "MemoryAllocationFailure" with NonlinearModelFit?

This code performs an exponential fit and obtains the "ParameterErrors" and "RSquared" for an exponential model with "NonlinearModelFit". The code runs correctly for almost all of my data. However, for the data included in the code below, Mathematica freezes during the evaluation of "ParameterErrors", "RSquared", and "ParameterTable", and raises SystemException["MemoryAllocationFailure"]. How can I evaluate this data without the freezing and the error?

ClearAll["Global`*"]
ClearSystemCache[]

sizedistcum = {{1.12996, 0.}, {1.42366, 0.}, {1.7937, 0.}, {2.25992, 
    0.}, {2.84732, 0.}, {3.5874, 0.}, {4.51984, 0.}, {5.69464, 
    0.}, {7.1748, 0.}, {9.03968, 0.}, {11.3893, 0.}, {14.3496, 
    0.}, {18.0794, 0.}, {22.7786, 0.}, {28.6992, 0.}, {36.1587, 
    0.}, {45.5572, 0.}, {57.3984, 0.00215924}, {72.3175, 
    0.0283401}, {91.1143, 0.0537112}, {114.797, 0.0753036}, {144.635, 
    0.103104}, {182.229, 0.268826}, {229.594, 0.832659}, {289.27, 
    0.964372}, {364.457, 0.993522}, {459.187, 1.}, {578.54, 
    1.}, {728.914, 1.}, {918.375, 1.}, {1157.08, 1.}, {1457.83, 
    1.}, {1836.75, 1.}, {2314.16, 1.}, {2915.66, 1.}, {3673.5, 
    1.}, {4628.32, 1.}, {5831.32, 1.}, {7347., 1.}, {9256.64, 
    1.}, {11662.6, 1.}};

(* Rosin-Rammler cumulative model *)
RRmodel = 1 - Exp[-(x/dm)^n];

nlm = NonlinearModelFit[sizedistcum, RRmodel, {{dm, 10}, {n, 1}}, x];

{{dm, n}} = {dm, n} /. nlm[{"BestFitParameters"}]

(* The three evaluations below freeze and raise the memory error *)
{{dmdev, ndev}} = nlm[{"ParameterErrors"}]

RRrsquared = nlm[{"RSquared"}]

nlm[{"ParameterTable"}]
POSTED BY: Gustavo Dacanal
9 Replies

Thank you Jim, your contributions about statistics are also welcome! The Mathematica error message was explained by Gachevski: it is related to machine precision, and the fix is to replace the exact number with its machine-precision form, 1 -> 1.
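
For readers skimming the thread, a minimal sketch of the fix applied to the original model (it only replaces the exact 1 with a machine-precision 1., as explained in Ilian's reply below):

RRmodel = 1. - Exp[-(x/dm)^n];  (* machine-precision 1 instead of the exact integer 1 *)
nlm = NonlinearModelFit[sizedistcum, RRmodel, {{dm, 10}, {n, 1}}, x];
nlm["ParameterTable"]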

POSTED BY: Gustavo Dacanal

Thank you, it is working!

POSTED BY: Gustavo Dacanal
Posted 8 years ago

I owe you a more substantial demonstration as to why the "regression approach" should be avoided at all costs. Below are the results of 100 simulations of a random sample of size 3705 (your original sample size) from a Rosin-Rammler distribution with parameters $n=4.5$ and $dm=200$. You'll see that the regression approach is extremely biased for estimating $dm$ and extremely variable for both $n$ and $dm$.

nsim = 100;  (* Number of simulations *)
mle = ConstantArray[{0, 0}, nsim];
reg = ConstantArray[{0, 0}, nsim];
mleBinned = ConstantArray[{0, 0}, nsim];
nSamples = 3705;
Do[
 x = RandomVariate[WeibullDistribution[4.5, 200], nSamples];
 bins = N[PowerRange[1, 14000, 2^(1/3)]];
 dpimean = ConstantArray[0, Length[bins] - 1];
 Do[dpimean[[i]] = (bins[[i]] + bins[[i + 1]])/2, {i, 
   Length[bins] - 1}];
 c = BinCounts[x, {bins}];
 sizedistcum = Transpose[{dpimean, 1. Accumulate[c]/nSamples}];

 (* Regression estimates of n and dm *)
 (* Select only the cumulative proportions strictly between 0 and 1 *)
 data = Select[sizedistcum, 0 < #[[2]] < 1 &];

 (* Transform the coordinates *)
 data[[All, 1]] = Log[data[[All, 1]]];
 data[[All, 2]] = Log[-Log[1 - data[[All, 2]]]];

 (* Fit a line corresponding to a distribution function and show results *)
 lm = LinearModelFit[data, z, z];

 (* Get regression estimates of n and dm *)
 {intercept, nreg} = lm["BestFitParameters"];
 dmreg = Exp[-intercept/nreg];
 reg[[i]] = {nreg, dmreg};

 (* Maximum likelihood estimates with raw data *)
 mle[[i]] = {n, dm} /. FindDistributionParameters[x, WeibullDistribution[n, dm]];

 (* Maximum likelihood estimates with binned data *)
 (* Cumulative distribution function *)
 n =.;
 dm =.;
 F[x_] := 1 - Exp[-(x/dm)^n];

 (* Log of likelihood *)
 logL = Sum[c[[j]] Log[F[bins[[j + 1]]] - F[bins[[j]]]], {j, Length[c]}];

 (* Find maximum likelihood estimates *)
 mleBinned[[i]] = {n, dm} /. (FindMaximum[{logL, n > 0 && dm > 0}, {{n, mle[[i, 1]]}, {dm, mle[[i, 2]]}}, WorkingPrecision -> 30][[2]]),

 {i, nsim}]

(* Show results *)
ListPlot[{mle, mleBinned, reg, {{4.5, 200}}}, Frame -> True, 
 FrameLabel -> (Style[#, Bold, 18, Black] &) /@ {"n", "dm"},
 PlotStyle -> {{PointSize[0.01], Blue}, {PointSize[0.01], 
    Green}, {PointSize[0.01], Orange}, {PointSize[0.025], Red}},
 PlotLegends ->  PointLegend[{Blue, Green, Orange, Red}, {"Maximum likelihood (raw data)", 
    "Maximum likelihood (binned data)",  "Regression on sample distribution function", "True values"},
   LegendMarkerSize -> 20], ImageSize -> 500, PlotRange -> All]

Results of estimating n and dm for 3 different estimation methods

POSTED BY: Jim Baldwin

As a side note, the excessive memory usage in the original code can be avoided by using instead

RRmodel = 1. - Exp[-(x/dm)^n];

that is, replacing the exact number 1 with a machine precision 1.

Whenever a tiny number is subtracted from an exact 1, the result ends up with far too many digits of precision. Consider, for example

1 - Exp[-100000.]

Adding a few more zeros to the Exp argument can make any computer run out of memory.
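
A sketch of how to inspect the blow-up, assuming the kernel still traps machine underflow (the default behavior at the time of this thread):

tiny = Exp[-100000.];   (* underflows and comes back as an arbitrary-precision number *)
Precision[tiny]         (* a numeric precision, no longer the symbol MachinePrecision *)
Precision[1 - tiny]     (* the result carries tens of thousands of digits *)
ByteCount[1 - tiny]     (* correspondingly large storage for a single number *)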

This behavior may also be circumvented by telling the kernel to no longer trap machine underflow and thus avoid switching to high-precision arithmetic:

In[1]:= SetSystemOptions["CatchMachineUnderflow" -> False]; 

In[2]:= 1 - Exp[-100000.]  

Out[2]= 1.

In fact, the above may become the default (and only) behavior for machine arithmetic in a future release.
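
As a brief aside (an addition, not from the original reply), the current setting can be inspected with SystemOptions and restored afterwards if desired:

SystemOptions["CatchMachineUnderflow"]               (* check the current value *)
SetSystemOptions["CatchMachineUnderflow" -> True];   (* restore the trapping behavior *)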

POSTED BY: Ilian Gachevski
Posted 8 years ago

If the raw data isn't available, then one can still obtain maximum likelihood estimates from the binned data (assuming the binned data can be transformed into the observed frequency counts).

sizedistcum = {{1.12996, 0.}, {1.42366, 0.}, {1.7937, 0.}, {2.25992, 
    0.}, {2.84732, 0.}, {3.5874, 0.}, {4.51984, 0.}, {5.69464, 
    0.}, {7.1748, 0.}, {9.03968, 0.}, {11.3893, 0.}, {14.3496, 
    0.}, {18.0794, 0.}, {22.7786, 0.}, {28.6992, 0.}, {36.1587, 
    0.}, {45.5572, 0.}, {57.3984, 0.00215924}, {72.3175, 
    0.0283401}, {91.1143, 0.0537112}, {114.797, 0.0753036}, {144.635, 
    0.103104}, {182.229, 0.268826}, {229.594, 0.832659}, {289.27, 
    0.964372}, {364.457, 0.993522}, {459.187, 1.}, {578.54, 
    1.}, {728.914, 1.}, {918.375, 1.}, {1157.08, 1.}, {1457.83, 
    1.}, {1836.75, 1.}, {2314.16, 1.}, {2915.66, 1.}, {3673.5, 
    1.}, {4628.32, 1.}, {5831.32, 1.}, {7347., 1.}, {9256.64, 
    1.}, {11662.6, 1.}};

(* Frequency counts for each bin *)
c = Differences[Round[3705 sizedistcum[[All, 2]]]]
(* Bin borders *)
x = N[PowerRange[1, 13000, 2^(1/3)]]

(* Cumulative distribution function *)
F[x_] := 1 - Exp[-(x/dm)^n]

(* Log of likelihood *)
logL = Sum[c[[i]] Log[F[x[[i + 1]]] - F[x[[i]]]], {i, Length[c]}];

$$\log L = \sum_{i=1}^{k} c_i \log\left(F(x_{i+1}) - F(x_i)\right)$$

(* Find maximum likelihood estimates *)
sol = FindMaximum[logL, {{n, 4}, {dm, 200}}, WorkingPrecision -> 20]
(* {-5891.8004314673819211, {n -> 4.3748213953340505200, dm -> 191.19015070319332793}} *)

(* Show results *)
Show[ContourPlot[logL, {n, 3, 6}, {dm, 150, 250}, PlotRange -> All,
  Contours -> -{6, 7, 8, 9, 10, 11, 12} 1000, 
  PlotLabel -> Style["Log likelihood surface", Black, 24],
  Frame -> True, 
  FrameLabel -> (Style[#, Bold, Italic, 18] &) /@ {"n", "dm"}],
 ListPlot[{{n, dm}} /. sol[[2]], PlotStyle -> Red]]

Log likelihood surface with maximum likelihood estimate

(* Standard errors *)
cov = -Inverse[(D[logL, {{n, dm}, 2}]) /. sol[[2]]];
sen = cov[[1, 1]]^0.5
(* 0.05702 *)
sedm = cov[[2, 2]]^0.5
(* 0.786466 *)

To repeat: this only works if one can reconstruct the counts, because the effective sample size is the total count rather than the number of bins (which is completely artificial).

POSTED BY: Jim Baldwin

Dear Jim, thank you for your contribution. Let's try to solve this error/freezing; I'll give more details about this simple code. It is a cumulative size distribution of fine powders. First, I performed "BinCounts" on the raw data. The bin step size follows a power range, as below, where "dpi" is the particle diameter in micrometers:

dpi = N[PowerRange[1, 14000, 2^(1/3)]]

Then, I obtained the mean dpi, which is used as x-axis data:

bindpi = Length[dpi] - 1;    
dpimean = ConstantArray[0, bindpi];   
Do[dpimean[[i]] = (dpi[[i]] + dpi[[i + 1]])/2, {i, 1, bindpi}]
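
As a small aside (not from the original post), the same midpoints can be obtained without an explicit loop, assuming the same bin edges:

dpi = N[PowerRange[1, 14000, 2^(1/3)]];
dpimean = MovingAverage[dpi, 2];  (* mean of each pair of consecutive edges = bin midpoint *)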

The x-axis data is:

  xdata={1.12996,1.42366,1.7937,2.25992,2.84732,3.5874,4.51984,5.69464,7.1748,9.03968,11.3893,14.3496,18.0794,22.7786,28.6992,36.1587,45.5572,57.3984,72.3175,91.1143,114.797,144.635,182.229,229.594,289.27,364.457,459.187,578.54,728.914,918.375,1157.08,1457.83,1836.75,2314.16,2915.66,3673.5,4628.32,5831.32,7347.,9256.64,11662.6}

The y-axis data is (this is a different dataset, one that fits correctly):

 ydata={0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.169972,0.705382,0.926346,0.988669,1.,1.,1.,1.,1.,1.,1.,1.,1.,1.}

Then I assemble the {x, y} array for fitting:

sizedistcum = Transpose[{xdata, ydata}]

Finally, I used the Rosin-Rammler exponential function and the nonlinear optimization:

RRmodel = 1 - Exp[-(x/dm)^n];

nlm = NonlinearModelFit[sizedistcum, RRmodel,  {{dm, 10}, {n, 1}}, x];

{{dm, n}} = {dm, n} /. nlm[{"BestFitParameters"}]

{{dmdev, ndev}} = nlm[{"ParameterErrors"}]

RRrsquared = nlm[{"RSquared"}]

nlm[{"ParameterTable"}]

The "Memory Error" still occured with some y-data (not with all size distributions).. as example the following y-data showed this message error when try evaluated:

ydata={0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.00215924,0.0283401,0.0537112,0.0753036,0.103104,0.268826,0.832659,0.964372,0.993522,1.,1.,1.,1.,1.,1.,1.,1.,1.,1.,1.,1.,1.,1.,1.}

Thanks, Gustavo

POSTED BY: Gustavo Dacanal
Posted 8 years ago

Given that the dataset name (sizedistcum) has "dist" and "cum" in it and it appears that you have equal intervals with respect to the log of x, I suspect you have a sample cumulative distribution function formed from a binning of a random sample of size 3,705. The function that you're trying to fit is a cumulative distribution function for a Weibull distribution. If so, performing a regression is inappropriate.

The counts for the histogram seem to be

{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 97, 94, 80, 103, 614, 2089, 488, 108, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
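
These can be recovered from the cumulative proportions; a sketch assuming a total sample size of 3705, essentially the same reconstruction used in the binned-likelihood reply above:

(* Counts in the intervals between consecutive x values, assuming 3705 observations in total *)
counts = Differences[Round[3705 sizedistcum[[All, 2]]]]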

If so, I can show you how to estimate the parameters of the Weibull distribution from a histogram. But first I want to determine whether my speculation about your data is correct.

Jim

POSTED BY: Jim Baldwin