# Unexpected parameter value from FindDistribution

Posted 1 month ago
 Hi,

 Take the list

     Table1 = {0.0588235, 0.0491358, 0.0035403, 0.0432735, 0.026303, 0.0202264, 0.0448407, 0.00615208, 0.0519149, 0.00287303, 0.0466974, 0.0152923, 0.036656, 0.0224011, 0.0313708, 0.0256386, 0.0297211, 0.0256386, 0.0313708, 0.0224011, 0.036656, 0.0152923, 0.0466974, 0.00287303, 0.0519149, 0.00615208, 0.0448407, 0.0202264, 0.026303, 0.0432735, 0.0035403, 0.0491358, 0.0588235}

 Note that the list is symmetric about its middle element, 0.0297211. Applying

     FindDistribution[Table1]

 gives a uniform distribution:

     UniformDistribution[{-0.079215, 0.143696}]

 However, every value in Table1 is positive, so a negative lower bound does not make sense for this data. The correct result may be

     UniformDistribution[{0.00287303, 0.0588235}]

 whose bounds are the smallest and largest values in the list. Please let me know how we can fix this error.
Posted 1 month ago
 Why would you think that a magic black box (FindDistribution) whose thought process you don't know is making an error? It doesn't know what you know (or think you know). In short, it's not an error and there is no right answer.
Posted 27 days ago
 The negative parameter in the UniformDistribution that FindDistribution returns does not make sense for the data (Table1). We have one more example with a negative parameter:
Posted 27 days ago
 Think about it a bit more. Yes, it isn’t the result that you expected. Note that your “data” is clearly contrived and not a random sample from any distribution. You’ve purposely made the list perfectly symmetric about its middle value, so all values but one occur twice. It's possible that FindDistribution "notices" that there are ties and does something unexpected.

 Consider the following:

     SeedRandom[12345];
     x = RandomVariate[UniformDistribution[{0, 0.06}], 16];
     MinMax[x]
     (* {0.00634282, 0.0474547} *)
     FindDistribution[x]
     (* UniformDistribution[{0.00634282, 0.0474547}] *)

 This is the result that you expect. But what happens when the data is artificially repeated, as you have done?

     x = Join[x, x];
     FindDistribution[x]
     (* UniformDistribution[{-0.0842559, 0.144423}] *)

 So it seems plausible that duplicating your data points is the cause of the issue. Somehow these multiple ties influence what FindDistribution ends up doing.

 In general, why would you expect FindDistribution to think like you, and then question it when it doesn’t do what you expected? As mentioned by others, the documentation says FindDistribution uses a Bayesian approach and has priors on candidate distributions. So you just can’t expect it to think like you and know what you know about the data. It only has the data you provide and those Bayesian priors (other than some thinning of the herd of candidate distributions by selecting continuous or not).

 One does not always need a parametric distribution. And choosing a parametric distribution when there is no theoretical or historical reason is wishful thinking (especially with a small sample size). A SmoothHistogram, which gives a visual description, might be all one needs.

 One should only use FindDistribution in desperation, if at all.
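 The tie hypothesis above can be tried on the original poster's list as well. The following is only a sketch, not a diagnosis of what FindDistribution does internally: the names `table1` and `unique` are introduced here for illustration, and no output is claimed for the FindDistribution call because it has not been verified.

```wolfram
(* The list from the original question *)
table1 = {0.0588235, 0.0491358, 0.0035403, 0.0432735, 0.026303,
   0.0202264, 0.0448407, 0.00615208, 0.0519149, 0.00287303,
   0.0466974, 0.0152923, 0.036656, 0.0224011, 0.0313708, 0.0256386,
   0.0297211, 0.0256386, 0.0313708, 0.0224011, 0.036656, 0.0152923,
   0.0466974, 0.00287303, 0.0519149, 0.00615208, 0.0448407,
   0.0202264, 0.026303, 0.0432735, 0.0035403, 0.0491358, 0.0588235};

(* How many values are tied? Tally returns {value, count} pairs *)
Select[Tally[table1], Last[#] > 1 &] // Length   (* 16 values occur more than once *)

(* Drop the exact ties, keeping the first occurrence of each value *)
unique = DeleteDuplicates[table1];
Length[unique]                                   (* 17 distinct values remain *)

(* Compare this with FindDistribution[table1]; if ties are the cause,
   the fitted support should move back toward MinMax[table1] *)
FindDistribution[unique]

(* A nonparametric look at the shape, with no distribution family assumed *)
SmoothHistogram[table1]
```

 If the deduplicated list gives bounds close to the data range, that supports the idea that the ties, rather than the values themselves, drive the wide interval.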
Posted 1 month ago
 Very interesting example! Firstly,

     FindDistributionParameters[table1, UniformDistribution[{mn, mx}]]

 produces the expected answer:

     {mn -> 0.00287303, mx -> 0.0588235}

 From the docs: By default, FindDistributionParameters uses maximum likelihood to estimate distribution parameters for a fixed distribution. FindDistribution uses a full Bayesian approach by combining the Bayesian information criterion with priors over distributions to select both the best distribution and the best parameters for it.

 If we don’t know the exact form of the distribution, but know something about it, we can use the options for FindDistribution, e.g.:

     FindDistribution[table1, TargetFunctions -> "Continuous"]
     (* UniformDistribution[{0.00304037, 0.0590737}] *)
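 A short sketch of why the fixed-family fit lands exactly on the data range: for a uniform distribution, the maximum likelihood bounds are the sample minimum and maximum, which is what the FindDistributionParameters result above shows. The explicit `ParameterEstimator` setting below only spells out the documented default, and the last line uses the `FindDistribution[data, n]` form to list several candidates; its output is not shown because it has not been verified here.

```wolfram
(* The list from the original question *)
table1 = {0.0588235, 0.0491358, 0.0035403, 0.0432735, 0.026303,
   0.0202264, 0.0448407, 0.00615208, 0.0519149, 0.00287303,
   0.0466974, 0.0152923, 0.036656, 0.0224011, 0.0313708, 0.0256386,
   0.0297211, 0.0256386, 0.0313708, 0.0224011, 0.036656, 0.0152923,
   0.0466974, 0.00287303, 0.0519149, 0.00615208, 0.0448407,
   0.0202264, 0.026303, 0.0432735, 0.0035403, 0.0491358, 0.0588235};

(* Sample range: the maximum likelihood bounds for a uniform family *)
MinMax[table1]
(* {0.00287303, 0.0588235} *)

(* Same fit as in the post, with the default estimator written out *)
FindDistributionParameters[table1, UniformDistribution[{mn, mx}],
 ParameterEstimator -> "MaximumLikelihood"]

(* Ask for up to three candidate distributions instead of only the best one *)
FindDistribution[table1, 3]
```

 Looking at more than one candidate makes it easier to see whether the selected distribution is a clear winner or just one of several similar options.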