Histogram and HistogramList do not honor the requested number of bins

Posted 11 years ago

Hi list. I'm new here. Medium skills with Mathematica.

Histogram and HistogramList both have allowable forms such as this:

HistogramList[data, bspec]

where bspec can be the number of bins. The actual number of bins used seems to have a very casual relationship to the number of bins requested. For example, the following bit of code

numBins = 1;
uniformSamples = RandomReal[{-Pi/2, Pi/2}, 1000];
list = HistogramList[uniformSamples, numBins];
{numBins, Length[list[[1]]] - 1}

returns the following numbers for {number of bins requested, number of bins used} for a few choices of numBins. Note that there is one fewer bin than there are bin delimiters (edges), hence Length[list[[1]]] - 1.

{1, 1}
{2, 2}
{3, 2}
{4, 4}
{5, 4}
{6, 8}
{7, 8}
{8, 8}
{9, 8}
{10, 16}
{11, 16}
{16, 16}
{17, 16}
{21, 16}
{22, 32}

Something seems to be overriding the requested number and returning bin counts that are powers of two instead.
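A minimal sketch for collecting the {requested, used} pairs in a single pass, assuming the same kind of uniform samples as in the post. The names `binCount` and `requested` and the seed are illustrative and not part of the original post, and the counts returned may differ between Mathematica versions.

```mathematica
SeedRandom[1];  (* arbitrary seed, assumed here only for repeatability *)
uniformSamples = RandomReal[{-Pi/2, Pi/2}, 1000];

(* number of bins = number of bin edges - 1 *)
binCount[data_, n_] := Length[First[HistogramList[data, n]]] - 1;

(* the same requested values as in the list above *)
requested = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 16, 17, 21, 22};
Table[{n, binCount[uniformSamples, n]}, {n, requested}]
```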

How can I get the number of bins that I request without ginning up my own list of bin edges? And who knows if that would even work. I can't find this behavior in the documentation for these functions, which states unambiguously, "The following bin specifications bspec can be given: n use n bins"

Jerry

POSTED BY: Jerry
9 Replies

Alternatively, one uses HistogramList and then just uses ListPlot with InterpolationOrder -> 1 or 2 and it looks fine.
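A rough sketch of that approach, assuming normally distributed sample data and an explicit {xmin, xmax, dx} bin specification (both chosen here for illustration only). The bin heights are plotted at the bin midpoints, which are computed from adjacent edges.

```mathematica
SeedRandom[1];  (* arbitrary seed for repeatability *)
data = RandomVariate[NormalDistribution[], 5000];  (* assumed sample data *)

(* bin edges and probability-density heights *)
{edges, heights} = HistogramList[data, {-4, 4, 0.25}, "PDF"];

(* midpoint of each bin: average of neighbouring edges *)
centers = MovingAverage[edges, 2];

(* joined points, linearly interpolated *)
ListPlot[Transpose[{centers, heights}], Joined -> True, InterpolationOrder -> 1]
```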

POSTED BY: Sander Huisman
Posted 4 years ago

The arbitrary choice is a good case for not doing histograms in the 21st century. Try SmoothHistogram instead. (Not to mention that no roughly continuous distribution looks as blocky as a histogram even with large amounts of data and small bin widths.)
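For comparison, a small sketch that overlays a kernel density estimate on a binned histogram of the same data. The sample data, bin specification, and plot style are assumptions made for illustration.

```mathematica
SeedRandom[1];  (* arbitrary seed for repeatability *)
data = RandomVariate[NormalDistribution[], 5000];  (* assumed sample data *)

Show[
 Histogram[data, {-4, 4, 0.25}, "PDF"],  (* binned estimate *)
 SmoothHistogram[data, Automatic, "PDF", PlotStyle -> Thick]  (* kernel estimate, automatic bandwidth *)
]
```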

POSTED BY: Jim Baldwin

Thanks for your feedback. Yes, I can imagine that in real-life problems the borders are not hit exactly at x. My book problems involve discrete data (absolute frequencies), and I can see a notable difference in the normal distribution approximations.

I am going to post my treatment of the problem in a new thread because it should be very interesting and instructive for first-time learners of the topic (statistics and probability).

POSTED BY: Raspi Rascal
POSTED BY: Sander Huisman
POSTED BY: Raspi Rascal
Posted 11 years ago
POSTED BY: Jerry

Are you saying that in the ProbabilityDensity case the results would not integrate to 1? That would be a big problem! Please show us an example!
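One way to check this numerically is to sum height times width over all bins. This is only a sketch: the sample data and the bin specification are assumed, and the bins are chosen to cover the whole data range. Under those assumptions the total is expected to come out very close to 1.

```mathematica
SeedRandom[1];  (* arbitrary seed for repeatability *)
data = RandomReal[{-Pi/2, Pi/2}, 1000];  (* assumed sample data *)

(* "PDF" requests probability-density heights *)
{edges, heights} = HistogramList[data, {-2, 2, 0.25}, "PDF"];

(* Riemann sum over the bins: height * width *)
Total[heights*Differences[edges]]
```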

POSTED BY: Sander Huisman
Posted 11 years ago
POSTED BY: Jerry

Hi Jerry,

What is not written there is that it chooses 'nice' subdivisions around the range of your numbers. If you look at the boundaries of the bins, they will be 'nice' numbers. If your domain runs from -Pi/2 to Pi/2, there are no 'nice' subdivisions that give exactly n bins. It probably uses something like FindDivisions to find 'nice' bin boundaries.
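To see what 'nice' divisions look like, one can compare FindDivisions with the edges HistogramList actually returns. Whether Histogram calls FindDivisions internally is only a guess, so the two results need not agree. The sample data below is assumed.

```mathematica
SeedRandom[1];  (* arbitrary seed for repeatability *)
data = RandomReal[{-Pi/2, Pi/2}, 1000];  (* assumed sample data *)

(* about 7 'nice' divisions of the interval *)
FindDivisions[N[{-Pi/2, Pi/2}], 7]

(* bin edges actually chosen when 7 bins are requested *)
First[HistogramList[data, 7]]
```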

What I always use is the {xmin,xmax,dx} specification. That way, I know exactly where the bins start and how many I will get. Even if n worked as you expect, the starting point would still be 'random', i.e. giving a specification of 'n' does not uniquely define the divisions, whereas {xmin,xmax,dx} does.
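A minimal sketch of the {xmin,xmax,dx} approach for a requested number of bins n, assuming the data range is known in advance. The variable names are illustrative. With exact endpoints the number of bins should equal n, though rounding at the upper edge may matter when the endpoints are machine numbers.

```mathematica
SeedRandom[1];  (* arbitrary seed for repeatability *)
data = RandomReal[{-Pi/2, Pi/2}, 1000];  (* assumed sample data *)

n = 7;                            (* requested number of bins *)
{xmin, xmax} = {-Pi/2, Pi/2};     (* known data range, kept exact *)
dx = (xmax - xmin)/n;             (* bin width that yields n bins *)

{edges, counts} = HistogramList[data, {xmin, xmax, dx}];
{n, Length[edges] - 1}            (* requested vs. used *)

(* alternative: pass explicit edges, here n equal subdivisions *)
edges2 = First[HistogramList[data, {Subdivide[xmin, xmax, n]}]];
Length[edges2] - 1
```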

POSTED BY: Sander Huisman