5 Replies

GROUPS:

Data Science Physics

Interpretation of strong peak at mean in Gaussian/Poisson histogram

David Wood, Academic institution

Posted 11 years ago

Howdy all! My first posting since comp.soft-sys.math.mathematica days!

I recently acquired a data-logging Geiger counter and have been fitting the time-series data (total counts per 1-minute interval). Since the counts per interval are well above 1, I expect the Poisson distribution to look quite Gaussian. I bin the count rates, then fit the data. The distributions are as expected EXCEPT that in finer-scale histograms of the counts-per-minute data I keep finding a narrow peak, much larger than expected, right at the peak of the histogram. (In the attached graphic: red = Poisson, green = Gaussian.)

Being a naive theoretical non-nuclear physicist, I'm puzzled. Can someone explain my data to me? :) Perhaps I am regressing to the normal (a pun, not an interpretation)? Thanks! DMW
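The expectation that a Poisson distribution with a large mean looks nearly Gaussian can be checked numerically. The sketch below is an illustration only: it assumes a mean of 50 counts per minute (the figure mentioned in a reply, not a value taken from the data), and the function names are hypothetical. It compares the Poisson probabilities with a Gaussian density of the same mean and variance.

```python
import math

def poisson_pmf(k, lam):
    # Evaluate in log space so large k and lam do not overflow.
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def gaussian_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

mean_cpm = 50.0               # assumed mean counts per minute
sigma = math.sqrt(mean_cpm)   # a Poisson variance equals its mean

# Largest pointwise gap between the two curves over a wide range of counts.
max_abs_diff = max(
    abs(poisson_pmf(k, mean_cpm) - gaussian_pdf(k, mean_cpm, sigma))
    for k in range(0, 121)
)

for k in (36, 43, 50, 57, 64):
    print(k, round(poisson_pmf(k, mean_cpm), 5),
          round(gaussian_pdf(k, mean_cpm, sigma), 5))
print("largest pointwise difference:", max_abs_diff)
```

The residual difference comes mostly from the slight skew of the Poisson distribution, which the symmetric Gaussian cannot reproduce.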

Jim Baldwin, Retired

Posted 11 years ago

What really counts is that you're satisfied with the rationale you've provided. But you've peaked my interest (no pun intended).

The spread in the histogram is certainly about what you'd expect from a Poisson with mean around 50: 95% of the counts would fall between 36 and 64. The 60-minute moving average figure looks like what one would expect from 3500 1-minute counts from a Poisson distribution. But samples from a Poisson distribution would not have such a peak in a histogram. The observed histogram is consistent with random samples from a Poisson distribution that are "contaminated" with a set of values very close to the mean.

If you're willing to share one of the datasets (in the original time order) and/or your Mathematica code to get the histograms, I'd certainly like to see if I could see something in the data. (Daniel Lichtblau is absolutely correct: one can't say much without the raw data.) Also, it seems like your rationale could be validated with some additional lines of code.
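The "95% between 36 and 64" figure can be checked by summing the Poisson probabilities directly. This is a minimal sketch assuming a mean of exactly 50; the helper names are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    # Evaluate in log space so large k and lam do not overflow.
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def poisson_interval_prob(lo, hi, lam):
    # Probability that a Poisson(lam) count lies in [lo, hi], inclusive.
    return sum(poisson_pmf(k, lam) for k in range(lo, hi + 1))

coverage = poisson_interval_prob(36, 64, 50.0)
print(f"P(36 <= X <= 64) = {coverage:.3f}")
```

The same function can be reused with the sample mean of a real dataset in place of 50 to see how many counts should fall in any chosen range.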

Daniel Lichtblau, Wolfram Research

Posted 11 years ago

It is possible that you just need narrower bins to see a tall thin peak. I confess the sides look a bit fat for that to happen though.

David Wood, Academic institution

Posted 11 years ago

I figured it out (mostly).

Suppose we have a Poisson (or other) process whose mean drifts slowly in time (slow: on a time scale much longer than the time between samples) by an amount small in comparison to the nominal mean. As a PERIODIC model:

LocalMeanOft = NominMean + dM cos(om t)

If I histogrammed the data accumulated for a time much longer than 2 Pi/om, I would find a mean of NominMean. If I histogrammed the data accumulated over a time SHORT in comparison to 2 Pi/om, I would find a mean which depended on where in the cycle the data were collected. Thus

(i) because dM << NominMean, the effect on the breadth of the distribution is small;
(ii) the 'pile-up' of data points at the nominal mean simply reflects the fact that the local mean (in the periodic MODEL above) is as likely to be larger than the nominal mean as to be smaller.

This means that if the TOTAL sampling time is short in comparison to 2 Pi/om, I will see a well-behaved Poisson histogram. The longer the sampling time, the more pronounced the 'anomalous' peak in the histogram will be.

Issue: because so many counts pile up at the mean, the pile-up will somewhat distort the fits. So the real questions are:

(1) How to get the best signal/noise out of the fits. This probably entails identifying the longest time for which the peak pile-up is (almost) NOT present. This is probably encoded in the autocorrelation function. Perhaps moving averages are a better guide, to remove SHORT-duration (high-frequency) noise. I probably need to
(i) break up the data for the TOTAL sampling time into intervals in which the rate is reasonably constant;
(ii) fit these as independent sets, to maximize the S/N ratio (minimize the effect of the pile-up of counts at the mean).

(2) Perhaps for each interval subtract the fit from the data, then examine the t-dependence of the difference.

These issues must be completely understood by people who do time-series analysis. Can anyone point me to some references on this? Again, thanks Daniel. DMW
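One way to test this rationale is to simulate the periodic model and compare it with a constant-mean process. The sketch below is an illustration under stated assumptions, not the analysis used on the real data: the nominal mean of 50, the drift amplitude dM = 5, the period of 1000 minutes, and the run length of 3500 minutes are all assumed values, and every function name is hypothetical. It reports the variance-to-mean ratio (which is 1 for an ideal Poisson process) and the fraction of counts within a few counts of the sample mean, so the two cases can be compared directly.

```python
import math
import random

def sample_poisson(lam, rng):
    # Knuth's multiplication method; adequate for means of order 50.
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def simulate_counts(n_minutes, nominal_mean, drift_amplitude, period_minutes, seed):
    # One count per minute, with local mean NominMean + dM cos(om t).
    rng = random.Random(seed)
    omega = 2.0 * math.pi / period_minutes
    return [
        sample_poisson(nominal_mean + drift_amplitude * math.cos(omega * t), rng)
        for t in range(n_minutes)
    ]

def dispersion_index(counts):
    # Sample variance divided by sample mean.
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return variance / mean

def central_fraction(counts, half_width):
    # Fraction of counts within half_width of the sample mean.
    mean = sum(counts) / len(counts)
    return sum(1 for c in counts if abs(c - mean) <= half_width) / len(counts)

# All parameter values below are assumptions chosen for illustration.
steady = simulate_counts(3500, 50.0, 0.0, 1000.0, seed=1)
drifting = simulate_counts(3500, 50.0, 5.0, 1000.0, seed=1)

for label, counts in (("steady", steady), ("drifting", drifting)):
    print(label,
          "dispersion index:", round(dispersion_index(counts), 3),
          "fraction within 2 of mean:", round(central_fraction(counts, 2), 3))
```

Running the comparison for several drift amplitudes and run lengths shows whether a slow drift alone changes the centre of the histogram in the way described, or whether another mechanism is needed.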

David Wood, Academic institution

Posted 11 years ago

Thanks for your response, Daniel. Attached is what I hope is a salient graphical summary of the data (the log10 power spectrum of the DFT of the moving-average data may be spurious). I believe the issue is one of physics or of some feature of statistics not known to me. The prominent peak at the mean count rate survives many binning choices. A radiation-statistics expert or weak-source astrophysicist might know?
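For reference, a moving average of the kind discussed here can be sketched as follows. The window length and the sample values are hypothetical, and the function name is an assumption. Note that neighbouring outputs share all but one of their inputs, so the smoothed series is strongly correlated from point to point, which is one reason a spectrum computed from it can show structure that comes from the filter rather than from the source.

```python
def moving_average(values, window):
    # Average over a sliding window; returns len(values) - window + 1 points.
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    averages = [running / window]
    for i in range(window, len(values)):
        # Slide the window one step: add the new value, drop the oldest.
        running += values[i] - values[i - window]
        averages.append(running / window)
    return averages

example_counts = [48, 52, 47, 55, 50, 49]   # hypothetical counts per minute
smoothed = moving_average(example_counts, 3)
print(smoothed)
```

For the thread's case the window would be 60 one-minute samples rather than 3.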

Daniel Lichtblau, Wolfram Research

Posted 11 years ago

Hard to say much without the raw data. Possibly an accident of where the bin boundaries are drawn?
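Because the counts are integers, bin boundaries matter more than they would for continuous data: a bin width that is not a whole number makes some bins cover more integer values than their neighbours. The sketch below illustrates this effect only; it does not claim to be what happened in the posted histograms. The range 40 to 60, the starting edge 39.5, and the width 1.5 are assumed values, and the function names are hypothetical.

```python
import math
from collections import Counter

def bin_index(value, start, width):
    # Index of the half-open bin [start + i*width, start + (i+1)*width).
    return math.floor((value - start) / width)

def integers_per_bin(lo, hi, start, width):
    # How many integer values in [lo, hi] land in each occupied bin.
    tally = Counter(bin_index(k, start, width) for k in range(lo, hi + 1))
    return [tally[i] for i in sorted(tally)]

# Width 1.5 (assumed): bins alternately cover one and two integer values.
uneven = integers_per_bin(40, 60, 39.5, 1.5)
# Width 1 centred on the integers: every bin covers exactly one value.
even = integers_per_bin(40, 60, 39.5, 1.0)

print("width 1.5:", uneven)
print("width 1.0:", even)
```

A bin that covers two integer values near the mean collects roughly twice the counts of a neighbour that covers one, which can look like a sharp peak. Bins of integer width aligned on half-integers avoid the effect.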
