Compare histograms?

Hi everybody, I have two data sets, for example:

data1=RandomVariate[NormalDistribution[0,1],500];
data2=RandomVariate[NormalDistribution[3,1/2],500];

Is it possible to determine the overlap ratio of the data1 and data2 histograms when they are drawn on the same plot?

Any help would be appreciated. :)

POSTED BY: M.A. Ghorbani
7 Replies

Dear Claude,

Thank you very much for your help:)

POSTED BY: M.A. Ghorbani

From a naive viewpoint, you can simply examine the errors:

err1 = Observeddata - Regression;
err2 = Observeddata - NeuralNet;
m = Map[Mean, {err1, err2}];
s = Map[StandardDeviation, {err1, err2}];
{a, b} = {Min@Flatten@{err1, err2}, Max@Flatten@{err1, err2}};
Plot[{PDF[SmoothKernelDistribution@err1, x], 
  PDF[SmoothKernelDistribution@err2, x]}, {x, a, b}, 
 PlotLegends -> Map[ToString, Transpose[{m, s}]]]

The conclusion is that the regression estimate looks better (lower bias and SD). Nevertheless, the right way to do model selection also takes the estimation process into account (AIC and BIC criteria, etc.).
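To put a single number on which density curve is "closest" to the observed one, one possible sketch is the integrated absolute difference between kernel density estimates. This is an assumption about what "closest" should mean, not something proposed in the thread; `densityDistance` is a hypothetical helper name, and `Observeddata`, `Regression` and `NeuralNet` are the lists given in the question.

```wolfram
(* Hypothetical helper: L1 distance between the kernel density
   estimates of two samples. Smaller values mean closer curves. *)
densityDistance[d1_List, d2_List] := Module[{k1, k2, lo, hi, pad, x},
  k1 = SmoothKernelDistribution[d1];
  k2 = SmoothKernelDistribution[d2];
  {lo, hi} = MinMax[Join[d1, d2]];
  pad = hi - lo;  (* margin so the kernel tails are included *)
  NIntegrate[Abs[PDF[k1, x] - PDF[k2, x]], {x, lo - pad, hi + pad}]
]

(* Compare each model's density with the observed density *)
{densityDistance[Observeddata, Regression],
 densityDistance[Observeddata, NeuralNet]}
```

The result lies between 0 (identical densities) and 2 (no overlap at all). With only 36 points per sample the kernel estimates are themselves noisy, so a small difference between the two distances should not be over-interpreted.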

POSTED BY: Claude Mante

Hi Jim,

For the following data, how can I determine the closest density curve (Regression or NeuralNet) to Observeddata's density curve?

Thank you


Observeddata = {5.3, 2.5, 5, 5.3, 5.8, 4, 3, 3, 3.3, 5.1, 2.7, 2, 3.5, 5.3, 5.6, 4.4, 4.6, 5.3, 4.4, 6.6, 6, 3.5, 4.8, 5.3, 5.3, 6.6, 6, 3.6, 2.7, 4.4, 5.3, 5.3, 6, 4.4, 7.1, 3.5};

Regression = {5.8, 2.2, 6, 5.5, 5.4, 4.3, 4, 3, 4, 5.5, 3, 2, 4, 5.5, 6, 4, 5, 5.3, 4.8, 6, 6.5, 4.5, 5, 5.4, 5.1, 6.2, 6.5, 3.7, 3.1, 4.7, 5, 4.8, 5.7, 4.4, 7, 3.2}

NeuralNet = {6, 3, 5.1, 6, 5, 4, 4.7, 3.3, 4.1, 5, 3.2, 2.3, 4.7, 5, 6.5, 4.1, 5.5, 5.9, 4.7, 7, 6.8, 4.7, 4.9, 5.3, 5.2, 6.7, 6.1, 3.2, 3.9, 4.2, 5.4, 4.2, 5.1, 4.1, 6.7, 3};

{xmin, xmax} = MinMax[Flatten[{Observeddata, Regression, NeuralNet}]];

skd1 = SmoothKernelDistribution[Observeddata];

skd2 = SmoothKernelDistribution[Regression];

skd3 = SmoothKernelDistribution[NeuralNet];

Plot[{PDF[skd1, x], PDF[skd2, x], PDF[skd3, x]}, {x, xmin, xmax}]

POSTED BY: M.A. Ghorbani
Posted 10 years ago
POSTED BY: Jim Baldwin
Posted 10 years ago

If one has a large number of sample points, as you do, you're almost always better off using a nonparametric density estimate for each of the two distributions. Such estimates give a much more interpretable figure than two overlaid histograms:

data1 = RandomVariate[NormalDistribution[0, 1], 500];
data2 = RandomVariate[NormalDistribution[3, 1/2], 500];
{xmin, xmax} = MinMax[Flatten[{data1, data2}]];
skd1 = SmoothKernelDistribution[data1];
skd2 = SmoothKernelDistribution[data2];
Plot[{PDF[skd1, x], PDF[skd2, x]}, {x, xmin, xmax}]

[Figure: Two nonparametric densities]

As for an "overlap ratio", you'll need to define what you mean by that.
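One common definition, given here only as an assumption about what might be meant, is the overlapping coefficient: the area under the pointwise minimum of the two densities. A minimal sketch using the same kernel density estimates (the names `pad` and `overlap` are introduced for illustration):

```wolfram
(* Sketch: overlapping coefficient of two kernel density estimates.
   Assumes "overlap ratio" means the area under Min[f1, f2]. *)
data1 = RandomVariate[NormalDistribution[0, 1], 500];
data2 = RandomVariate[NormalDistribution[3, 1/2], 500];
skd1 = SmoothKernelDistribution[data1];
skd2 = SmoothKernelDistribution[data2];
{xmin, xmax} = MinMax[Flatten[{data1, data2}]];
pad = xmax - xmin;  (* margin so the kernel tails are included *)

(* 0 means no overlap, 1 means identical densities *)
overlap = NIntegrate[Min[PDF[skd1, x], PDF[skd2, x]],
  {x, xmin - pad, xmax + pad}]
```

Because the samples are random, the value changes slightly from run to run, and it also depends on the bandwidth chosen by SmoothKernelDistribution.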

POSTED BY: Jim Baldwin

Histogram[{data1,data2}]

?

POSTED BY: Sander Huisman