Compare histograms?

Hi everybody, I have two data sets (for example):

data1=RandomVariate[NormalDistribution[0,1],500];
data2=RandomVariate[NormalDistribution[3,1/2],500];

Is it possible to determine the overlap ratio of the data1 and data2 histograms when they are drawn on the same plot?

Any help would be appreciated. :)

POSTED BY: M.A. Ghorbani
7 Replies

Dear Claude,

Thank you very much for your help:)

POSTED BY: M.A. Ghorbani

From a naive viewpoint, you can simply examine the errors:

err1 = Observeddata - Regression;
err2 = Observeddata - NeuralNet;
m = Map[Mean, {err1, err2}];
s = Map[StandardDeviation, {err1, err2}];
{a, b} = {Min@Flatten@{err1, err2}, Max@Flatten@{err1, err2}};
Plot[{PDF[SmoothKernelDistribution@err1, x], 
  PDF[SmoothKernelDistribution@err2, x]}, {x, a, b}, 
 PlotLegends -> Map[ToString, Transpose[{m, s}]]]

The conclusion is that the regression estimate looks better (lower bias and SD). Nevertheless, the right way to select a model also takes the estimation process into account (AIC and BIC criteria, etc.).
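As a rough numerical companion to the plot, the paired errors can also be summarized by RMSE and MAE. The sketch below is only illustrative: the names obs, reg, nn, rmse and mae are not from this thread, and the lists hold just the first six values of Observeddata, Regression and NeuralNet to keep the example short. Substitute the full lists for a real comparison.

```mathematica
(* Illustrative subset: first six values of each list posted in this thread *)
obs = {5.3, 2.5, 5, 5.3, 5.8, 4};
reg = {5.8, 2.2, 6, 5.5, 5.4, 4.3};
nn = {6, 3, 5.1, 6, 5, 4};

(* Hypothetical helpers: root-mean-square error and mean absolute error *)
rmse[pred_] := Sqrt[Mean[(obs - pred)^2]];
mae[pred_] := Mean[Abs[obs - pred]];

(* One row per model, columns: model, RMSE, MAE *)
TableForm[
 {{"Regression", rmse[reg], mae[reg]},
  {"NeuralNet", rmse[nn], mae[nn]}},
 TableHeadings -> {None, {"Model", "RMSE", "MAE"}}]
```

Smaller values indicate predictions that sit closer to the observations pair by pair. Like the plot, this ignores model complexity.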

POSTED BY: Claude Mante

Hi Jim,

For the following data, how can I determine the closest density curve (Regression or NeuralNet) to Observeddata's density curve?

Thank you


Observeddata = {5.3, 2.5, 5, 5.3, 5.8, 4, 3, 3, 3.3, 5.1, 2.7, 2, 3.5, 5.3, 5.6, 4.4, 4.6, 5.3, 4.4, 6.6, 6, 3.5, 4.8, 5.3, 5.3, 6.6, 6, 3.6, 2.7, 4.4, 5.3, 5.3, 6, 4.4, 7.1, 3.5};
Regression = {5.8, 2.2, 6, 5.5, 5.4, 4.3, 4, 3, 4, 5.5, 3, 2, 4, 5.5, 6, 4, 5, 5.3, 4.8, 6, 6.5, 4.5, 5, 5.4, 5.1, 6.2, 6.5, 3.7, 3.1, 4.7, 5, 4.8, 5.7, 4.4, 7, 3.2};
NeuralNet = {6, 3, 5.1, 6, 5, 4, 4.7, 3.3, 4.1, 5, 3.2, 2.3, 4.7, 5, 6.5, 4.1, 5.5, 5.9, 4.7, 7, 6.8, 4.7, 4.9, 5.3, 5.2, 6.7, 6.1, 3.2, 3.9, 4.2, 5.4, 4.2, 5.1, 4.1, 6.7, 3};
(* plot range covering all three data sets *)
{xmin, xmax} = MinMax[Flatten[{Observeddata, Regression, NeuralNet}]];
skd1 = SmoothKernelDistribution[Observeddata];
skd2 = SmoothKernelDistribution[Regression];
skd3 = SmoothKernelDistribution[NeuralNet];
(* no trailing semicolon, so the plot is actually displayed *)
Plot[{PDF[skd1, x], PDF[skd2, x], PDF[skd3, x]}, {x, xmin, xmax}]

POSTED BY: M.A. Ghorbani
Posted 7 years ago

You're asking a different question and that should be in a new post. Also, I think your new question is more about statistics (i.e., what process should be used) than how to implement that process (i.e., how to code the process in Mathematica). You might want to try posting your question at CrossValidated StackExchange. On that forum I'm sure you'll be asked about how the data was generated for all 3 datasets along with "Why compare the distributions when comparing the individual pairs of values might be more appropriate?"

POSTED BY: Jim Baldwin
Posted 7 years ago

If one has a large number of sample points, as you do, you're almost always better off using a nonparametric density estimate for the two distributions. Such estimates allow for a much more interpretable figure than two overlaid histograms:

data1 = RandomVariate[NormalDistribution[0, 1], 500];
data2 = RandomVariate[NormalDistribution[3, 1/2], 500];
{xmin, xmax} = MinMax[Flatten[{data1, data2}]];
skd1 = SmoothKernelDistribution[data1];
skd2 = SmoothKernelDistribution[data2];
Plot[{PDF[skd1, x], PDF[skd2, x]}, {x, xmin, xmax}]

Two nonparametric densities

As far as an "overlap ratio", you'll need to define what you mean by that.
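One common definition, offered here only as an assumption about what is wanted, is the overlapping coefficient: the area under the pointwise minimum of the two densities. It is 0 for disjoint distributions and 1 for identical ones. A minimal sketch with kernel density estimates follows; the names pad and ovl and the padding of 1 are illustrative choices, not part of the original code.

```mathematica
data1 = RandomVariate[NormalDistribution[0, 1], 500];
data2 = RandomVariate[NormalDistribution[3, 1/2], 500];
{xmin, xmax} = MinMax[Flatten[{data1, data2}]];
skd1 = SmoothKernelDistribution[data1];
skd2 = SmoothKernelDistribution[data2];

(* Assumed padding so the smoothed tails beyond the data range are included *)
pad = 1;

(* Overlapping coefficient: integral of the smaller of the two densities *)
ovl = NIntegrate[
  Min[PDF[skd1, x], PDF[skd2, x]], {x, xmin - pad, xmax + pad}]
```

Because the samples are random, the value changes from run to run, and NIntegrate may warn about slow convergence near the point where the two curves cross.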

POSTED BY: Jim Baldwin

Histogram[{data1,data2}]

?
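If the overlap is wanted for the histograms themselves rather than for smoothed densities, one possible reading (an assumption, since the thread never fixes a definition) is to bin both samples on a common grid and sum the smaller of the two bin probabilities. In the sketch below the bin width dx and the names bins, edges, p1, p2 and overlap are illustrative.

```mathematica
data1 = RandomVariate[NormalDistribution[0, 1], 500];
data2 = RandomVariate[NormalDistribution[3, 1/2], 500];

(* Assumed bin width; the result depends on this choice *)
dx = 0.25;
{xmin, xmax} = MinMax[Flatten[{data1, data2}]];

(* Common bin specification, aligned to multiples of dx and covering all data *)
bins = {Floor[xmin, dx], Ceiling[xmax, dx] + dx, dx};

(* Fraction of each sample falling in each bin *)
{edges, p1} = HistogramList[data1, bins, "Probability"];
{edges, p2} = HistogramList[data2, bins, "Probability"];

(* Shared probability mass: sum over bins of the smaller fraction *)
overlap = Total[MapThread[Min, {p1, p2}]]

(* Both histograms drawn with the same bins for visual comparison *)
Histogram[{data1, data2}, bins, "Probability"]
```

The value lies between 0 and 1, and unlike the density-based version it is sensitive to the chosen bin width.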

POSTED BY: Sander Huisman