Unexpected parameter value from FindDistribution

Posted 2 months ago

Hi,
If we take a table

table1 = {0.0588235, 0.0491358, 0.0035403, 0.0432735, 0.026303, 0.0202264, 
  0.0448407, 0.00615208, 0.0519149, 0.00287303, 0.0466974, 0.0152923, 
  0.036656, 0.0224011, 0.0313708, 0.0256386, 0.0297211, 0.0256386,  
  0.0313708, 0.0224011, 0.036656, 0.0152923, 0.0466974, 0.00287303,  
  0.0519149, 0.00615208, 0.0448407, 0.0202264, 0.026303, 0.0432735, 
  0.0035403, 0.0491358, 0.0588235}

Note that it is symmetric about 0.0297211. Applying

FindDistribution[table1]

we get a uniform distribution

UniformDistribution[{-0.079215, 0.143696}]

However, every value in the data is positive, so a negative lower bound does not make sense. The expected parameters would be

UniformDistribution[{0.00287303, 0.0588235}]

Please let me know how we can fix this error.

POSTED BY: Rishi Kumar
5 Replies
Posted 2 months ago

Why would you think that a magic black box (FindDistribution) whose thought process you don't know is making an error? It doesn't know what you know (or think you know).

In short, it's not an error and there is no right answer.

POSTED BY: Jim Baldwin
Posted 2 months ago

The negative parameter in the UniformDistribution returned by FindDistribution does not make sense for the data above. We have one more example with a negative parameter:

POSTED BY: Rishi Kumar
Posted 2 months ago

Think about it a bit more. Yes, it isn’t the result that you expected. Note that your “data” is clearly contrived and not a random sample from any distribution. You’ve purposely made the “data” perfectly symmetric about the mean, and all values but one occur twice. It's possible that FindDistribution "notices" the ties and does something unexpected.

Consider the following:

SeedRandom[12345];
x = RandomVariate[UniformDistribution[{0, 0.06}], 16];
MinMax[x]
(* {0.00634282, 0.0474547} *)
FindDistribution[x]
(* UniformDistribution[{0.00634282, 0.0474547}] *)

This is the result that you expect. But what happens when the data is artificially repeated as you have done?

x = Join[x, x];
FindDistribution[x]
(* UniformDistribution[{-0.0842559, 0.144423}] *)

So it seems plausible that your duplicating data points is the cause of the issue. Somehow these multiple ties influence what FindDistribution ends up doing.
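One way to probe the tie hypothesis is to count the repeats and then fit the de-duplicated sample. This is only a sketch: the names doubled and unique are illustrative, and dropping ties is an assumption that suits artificially repeated values like these, not data where repeats are genuine observations.

```wolfram
SeedRandom[12345];
x = RandomVariate[UniformDistribution[{0, 0.06}], 16];
doubled = Join[x, x];

(* Tally the occurrences of each value; every value in doubled appears twice *)
Counts[doubled]

(* Drop the repeats; DeleteDuplicates keeps the first occurrence of each value,
   so this recovers the original 16-point sample *)
unique = DeleteDuplicates[doubled];
unique == x

(* Fit the de-duplicated sample and compare with the fit to doubled *)
FindDistribution[unique]
```

If the fit to unique matches the fit to the original x while the fit to doubled does not, that supports the idea that the ties are what changes the outcome.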

In general, why would you expect FindDistribution to think like you and then question it when it doesn’t do what you expected? As mentioned by others, the documentation says FindDistribution uses a Bayesian approach and has priors on candidate distributions. So you just can’t expect it to think like you and know what you know about the data. It only has the data you provide and those Bayesian priors (other than some thinning of the herd of candidate distributions by selecting continuous or not).

One does not always need a parametric distribution. And choosing a parametric distribution when there is no theoretical or historical reason is wishful thinking (especially with a small sample size). A SmoothHistogram which gives a visual description might be all one needs.
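As a rough illustration of the nonparametric route, the sketch below draws a kernel density estimate over a histogram of the data from the original question. The names half and data are illustrative; the list is rebuilt from its first 16 values plus the middle value, since it reads the same forwards and backwards.

```wolfram
(* First 16 values of the posted list; the full list is a palindrome
   around the middle value 0.0297211 *)
half = {0.0588235, 0.0491358, 0.0035403, 0.0432735, 0.026303, 0.0202264,
   0.0448407, 0.00615208, 0.0519149, 0.00287303, 0.0466974, 0.0152923,
   0.036656, 0.0224011, 0.0313708, 0.0256386};
data = Join[half, {0.0297211}, Reverse[half]];

(* Histogram scaled as a density, overlaid with a smooth kernel estimate *)
Show[
 Histogram[data, Automatic, "PDF"],
 SmoothHistogram[data]
]
```

Keep in mind that a kernel estimate spreads some density beyond the observed minimum and maximum, so the plot is a visual summary and not an estimate of the support.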

One should only use FindDistribution in desperation if at all.

POSTED BY: Jim Baldwin
Posted 2 months ago

Very interesting example! Firstly,

FindDistributionParameters[table1, UniformDistribution[{mn, mx}]]

produces the expected answer:

{mn -> 0.00287303, mx -> 0.0588235}

From the documentation:

By default, FindDistributionParameters uses maximum likelihood to estimate distribution parameters for a fixed distribution. FindDistribution uses a full Bayesian approach by combining the Bayesian information criterion with priors over distributions to select both the best distribution and the best parameters for it.
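For a uniform distribution, the maximum-likelihood bounds are simply the sample minimum and maximum, which is why FindDistributionParameters returns the extremes of the data. The sketch below checks this and compares the log-likelihood of the tight bounds with that of the wide bounds reported for FindDistribution in the original question. The names half, data, llTight and llWide are illustrative.

```wolfram
(* Rebuild the posted list from its first 16 values and the middle value *)
half = {0.0588235, 0.0491358, 0.0035403, 0.0432735, 0.026303, 0.0202264,
   0.0448407, 0.00615208, 0.0519149, 0.00287303, 0.0466974, 0.0152923,
   0.036656, 0.0224011, 0.0313708, 0.0256386};
data = Join[half, {0.0297211}, Reverse[half]];

(* Maximum-likelihood bounds for a uniform distribution: the sample extremes *)
MinMax[data]
(* {0.00287303, 0.0588235} *)

(* The uniform log-likelihood is -n Log[max - min] whenever all points lie
   inside the bounds, so the narrower interval scores higher *)
llTight = LogLikelihood[UniformDistribution[MinMax[data]], data];
llWide = LogLikelihood[UniformDistribution[{-0.079215, 0.143696}], data];
llTight > llWide
(* True *)
```

So the wide interval is not a maximum-likelihood fit. Its selection has to come from the other ingredients of FindDistribution's procedure that the documentation mentions, such as the priors over distributions.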

If we don’t know the exact form of the distribution but do know something about it, we can use the options of FindDistribution, e.g.:

FindDistribution[table1, TargetFunctions -> "Continuous"]

UniformDistribution[{0.00304037, 0.0590737}]

POSTED BY: Denis Ivanov