
Simple, fast compiled peak detection based on moving average

Posted 12 years ago
Recently Christopher coded a neat wavelet-based method for peak detection. Peak detection is often needed in various scientific fields: all sorts of spectral analysis, time series, and general feature recognition in data come to mind right away. There are many methods for peak detection. Besides his wavelet-based code, Christopher also mentions the built-in MaxDetect function, rooted in image processing. MaxDetect, though, is a rather elaborate tool for multi-dimensional data (2D and 3D images) with image-processing-minded parameter tuning, and was not meant to target time series and other 1D data. This got me thinking.

Can we come up with a minimal compile-able peak detection code that would be accurate, robust and fast in most situations for 1D data?

I am not an expert in the subject, but intuitively peak detection consists of two stages.
  • Finding all maxima. This can be done via Differences, selecting neighboring difference pairs that change from positive to negative. Such a pair indicates a local maximum.
  • Filtering out peaks. Selecting those maxima that are “high” splashes of amplitude with respect to their “immediate neighborhood”. The quoted words are relative and depend on the particular data, which is why they are set as tuning parameters in the algorithm; a minimal uncompiled sketch of both stages follows this list.
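Here is that sketch in plain (uncompiled) Wolfram Language; the helper names findMaxima and filterPeaks are hypothetical, for illustration only, and are not the final function:

(* stage 1: positions where the first differences flip from + to - *)
findMaxima[data_] := 1 + Flatten@Position[Partition[Sign@Differences[data], 2, 1], {1, -1}]

(* stage 2: keep maxima that rise at least cut above the mean of their window *)
filterPeaks[data_, maxima_, width_, cut_] :=
 Select[maxima, # > width && # <= Length[data] - width &&
    data[[#]] - Mean[data[[# - width ;; # + width]]] > cut &]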
To illustrate the second point, let's take a look at the Albert Einstein Wikipedia page hits history below. Obviously a large peak in the past can be lower than the current average if there is a strong trend in the data. This is why we need “windowing” when looking for peaks: to compare a peak to its immediate neighborhood.



Without further ado, here is a function written specifically in terms of functions that can be compiled. For example, I do not use MovingAverage, but instead do the trick with Partition.
PeakDetect = Compile[{{data, _Real, 1}, {width, _Integer}, {cut, _Real}},
   (Table[0, {width}]~Join~
      Map[UnitStep[# - cut] &,
       data[[1 + width ;; -1 - width]] -
        Map[Mean, Partition[data, 1 + 2 width, 1]]]~Join~
      Table[0, {width}]) *
    ({0}~Join~
      Map[Piecewise[{{1, Sign[#] == {1, -1}}, {0, Sign[#] != {1, -1}}}] &,
       Partition[Differences[data], 2, 1]]~Join~{0}),
   CompilationTarget -> "C"];
The legend for the function arguments is the following:
  • data – a 1D numerical list of data
  • width – the half-width of the moving average window, not including the central point
  • cut – the threshold at which to cut off peaks, in natural units of the data amplitude
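Before the real data, here is a quick sanity check on synthetic data (an assumed example, not from the original post); the function returns a binary mask of the same length as the input, so Pick recovers the peak positions:

test = Table[Sin[x] + 0.1 RandomReal[], {x, 0., 30., 0.1}];
mask = PeakDetect[test, 5, 0.3];
Pick[Range[Length[test]], mask, 1]  (* positions of detected peaks, near the maxima of the sine *)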
Now let’s see some usage cases. Let’s import the same Albert Einstein data as a proof of concept.
raw = WolframAlpha[ "albert einstein", {{"PopularityPod:WikipediaStatsData", 1}, "TimeSeriesData"}];
data = raw[[All, 2]][[All, 1]];
We use a total window width of 5 points here and cut off peaks at one standard deviation of the whole data. The peak labeled “May 2008” is nicely picked up even though it is comparable to the more recent average. This peak is most probably due to the May 13, 2008 publication of one of the most famous books about Einstein, a New York Times bestseller that also won a Quill Award. Of course you can play with the controls to pick or drop peaks. On the top plot one sees the data, the moving average, and bands formed by the moving average displaced up and down by a fraction of the standard deviation. Any maximum above the top band becomes a peak.
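The top plot can also be reproduced statically from the same pieces the app uses below; wid = 2 and thr = 1 are assumed here to match the 5-point window:

With[{wid = 2, thr = 1},
 ListLinePlot[{data,
   ArrayPad[MovingAverage[data, 1 + 2 wid], wid, "Fixed"],
   ArrayPad[MovingAverage[data, 1 + 2 wid], wid, "Fixed"] + thr StandardDeviation[data],
   ArrayPad[MovingAverage[data, 1 + 2 wid], wid, "Fixed"] - thr StandardDeviation[data]},
  PlotRange -> All, ImageSize -> 800, AspectRatio -> 1/6]]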



The code for the app is at the very end. Let's try a different data set: recent sunspot activity.
raw = WolframAlpha["sun spot", {{"SunspotsPartialTimeSeries:SpaceWeatherData", 1}, "TimeSeriesData"}];
data = raw[[All, 2]];
We right away find the May 2013 mark, a most powerful recent event described in the Wikipedia page here. Please let me know if you have suggestions on how to speed this up or improve it generally. I would be very curious to know your opinions and critique.

The .GIF below is large; wait till it is loaded.



The code for the interactive app:
 Manipulate[
  tt = {#,
      Rotate[DateString[#, {"MonthNameShort", " ", "Year"}], Pi/2]} & /@
     Pick[raw, PeakDetect[data, wid, thr StandardDeviation[data]], 1][[
     All, 1]];
 
  Column[{
    
    ListLinePlot[{data,
     ArrayPad[MovingAverage[data, 1 + 2 wid], wid, "Fixed"],
     ArrayPad[MovingAverage[data, 1 + 2 wid], wid, "Fixed"] +
      thr StandardDeviation[data],
     ArrayPad[MovingAverage[data, 1 + 2 wid], wid, "Fixed"] -
      thr StandardDeviation[data]}, AspectRatio -> 1/6,
    ImageSize -> 800, Filling -> {2 -> {1}, 3 -> {4}},
    FrameTicks -> {None, Automatic},
    FillingStyle -> {Directive[Red, Opacity[.7]],
      Directive[Blue, Opacity[.7]], Directive[Gray, Opacity[.1]]},
    PlotStyle -> Opacity[.7], PlotRange -> All, Frame -> True,
    GridLines -> Automatic, PlotRangePadding -> 0],
   
   Show[
    DateListPlot[raw, Joined -> True, AspectRatio -> 1/6,
     ImageSize -> 800, Filling -> Bottom, Ticks -> {tt, Automatic},
     Frame -> False, Mesh -> All, PlotRange -> All],
    DateListPlot[
     If[# == {}, raw[[1 ;; 2]], #, #] &[
      Pick[raw, PeakDetect[data, wid, thr StandardDeviation[data]],
       1]], AspectRatio -> 1/6, ImageSize -> 800,
     PlotStyle -> Directive[Red, PointSize[.007]], PlotRange -> All]
    , PlotRangePadding -> {0, Automatic}]
   
   }],
Row[{
   Control[{{thr, 1, "threshold"}, 0, 2, Appearance -> "Labeled"}],
   Spacer[100],
   Control[{{wid, 3, "half-width"}, 1, 10, 1, Setter}]
   }]
]

============== UPDATE =================

Thank you all very much for contributing. I collected everyone's efforts and Danny's two functions in a single, completely compile-able expression, which seems to give the shortest time, though only vaguely faster than Danny's ingenious maneuver. I very much liked the format suggested by Christopher, the one that Michael also kept in his packages. But I wanted to do some benchmarking and thus followed the format returned by the function MaxDetect, simply for the sake of speed comparison. This format is just a binary list of the same length as the original data, with 1s at the positions of found peaks.

Here is the function:
 PeakDetect =
   Compile[{{data, _Real, 1}, {width, _Integer}, {cut, _Real}},
    (Table[0, {width}]~Join~
       UnitStep[
        Take[data, {1 + width, -1 - width}] -
           (Module[{tot = Total[#1[[1 ;; #2 - 1]]], last = 0.},
               Table[tot = tot + #1[[j + #2]] - last;
                last = #1[[j + 1]];
                tot, {j, 0, Length[#1] - #2}]]/#2) &[data, 1 + 2 width] - cut]
    ~Join~Table[0, {width}]) ({0}~Join~
      Table[If[Sign[{data[[ii + 1]] - data[[ii]],
           data[[ii + 2]] - data[[ii + 1]]}] == {1, -1}, 1, 0],
           {ii, 1, Length[data] - 2}]~Join~{0}), CompilationTarget -> "C"];
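For a quick sanity check, a toy list (an assumed example) shows the output format:

PeakDetect[{1., 3., 2., 5., 4.}, 1, 0.]
(* {0, 1, 0, 1, 0}: peaks at positions 2 and 4 *)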
And here are the speed benchmarks on 10 million data points, which show a 40-times speed-up:
dat = RandomReal[1, 10^7];

pks = MaxDetect[dat]; // AbsoluteTiming
Total[pks]
(* ======== output ========
{62.807928, Null}
3333361
   ======== output ======== *)

pks = PeakDetect[dat, 1, 0]; // AbsoluteTiming
Total[pks]
(* ======== output ========
{1.560074, Null}
3333360
   ======== output ======== *)
POSTED BY: Vitaliy Kaurov
20 Replies
Here are some optimizations to Vitaliy's Compile code. Replace
Map[UnitStep[# - cut] &,
data[[1 + width ;; -1 - width]] -
  Map[Mean, Partition[data, 1 + 2 width, 1]]]
with
UnitStep[data[[1 + width ;; -1 - width]] -
  Map[Mean, Partition[data, 1 + 2 width, 1]] - cut]
and
Map[Piecewise[{{1, Sign[#] == {1, -1}}, {0, Sign[#] != {1, -1}}}] &,
Partition[Differences[data], 2, 1]]
with
Map[Piecewise[{{1, # == {1, -1}}, {0, # != {1, -1}}}] &,
Sign[Partition[Differences[data], 2, 1]]]
A less effective optimization is to replace
Table[0, {width}]
with
ConstantArray[0, width]

So an optimized peak detection looks like


 Compile[{{data, _Real,
    1}, {width, _Integer}, {cut, _Real}}, (ConstantArray[0, width]~Join~
     UnitStep[
      data[[1 + width ;; -1 - width]] -
       Map[Mean, Partition[data, 1 + 2 width, 1]] - cut]~Join~
     ConstantArray[0, width]) *({0}~Join~
     Map[Piecewise[{{1, # == {1, -1}}, {0, # != {1, -1}}}] &,
      Sign[Partition[Differences[data], 2, 1]]]~Join~{0}),
  CompilationTarget -> "C"]

 For data of size 10000000 my version completes in 5.9653412 seconds while the original completes in 8.3264762 seconds.

The basic idea is to move listable functions outside the map.
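As a toy illustration of that idea (an assumed example): UnitStep is listable, so one vectorized call replaces an element-by-element Map:

UnitStep[{-1., 2., 3.} - 1.5]            (* {0, 1, 1}, a single listable call *)
Map[UnitStep[# - 1.5] &, {-1., 2., 3.}]  (* same result, element by element *)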

hope this helps.

-adk-
POSTED BY: Abdul Dakkak
Further optimization to `PeakDetect`.  `Take[list, {a, b}]` is always faster than `Part[list, a ;; b]`. Instead of using `Piecewise` inside `Compile`, it is better to use `If`.
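A rough micro-benchmark of the Take-versus-Part claim might look like this (an assumed setup; exact timings vary by machine):

list = RandomReal[1, 10^7];
First@AbsoluteTiming@Do[Take[list, {2, -2}], {10}]
First@AbsoluteTiming@Do[list[[2 ;; -2]], {10}]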

 PeakDetect3 =
  Compile[{{data, _Real,
     1}, {width, _Integer}, {cut, _Real}}, (Table[0, {width}]~Join~
      UnitStep[
       Take[data, {1 + width, -1 - width}] -
        Map[Mean, Partition[data, 1 + 2 width, 1]] - cut]~Join~
      Table[0, {width}])*({0}~Join~
      Map[If[# == {1, -1}, 1, 0] &,
       Sign[Partition[Differences[data], 2, 1]]]~Join~{0}),
  CompilationTarget -> "C"]

With `PeakDetect3`, I get further 25% performance increase over Abdul's enhanced code.
POSTED BY: Oleksandr Pavlyk
and this is quite a bit faster than Sasha's
 PeakDetect4 =
  Compile[{{data, _Real,
     1}, {width, _Integer}, {cut, _Real}}, (Table[0, {width}]~Join~
      UnitStep[
       Take[data, {1 + width, -1 - width}] -
        Map[Mean, Partition[data, 1 + 2 width, 1]] - cut]~Join~
      Table[0, {width}])*({0}~Join~
      Table[If[(data[[ii + 1]] - data[[ii]]) >
          0 && (data[[ii + 2]] - data[[ii + 1]]) < 0, 1, 0], {ii, 1,
       Length[data] - 2}]~Join~{0}), CompilationTarget -> "C"]
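A rough comparison of the two on random data (an assumed setup, not from the post):

dat = RandomReal[1, 10^7];
First@AbsoluteTiming@PeakDetect3[dat, 2, 1.]
First@AbsoluteTiming@PeakDetect4[dat, 2, 1.]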


Also, I take back what I said about Table: ConstantArray is not compiled. You can see that by doing
<< CCodeGenerator`
CompilePrint[PeakDetect4]
POSTED BY: Abdul Dakkak
A bottleneck is the Partition/Mean step if width is larger than 1. One can instead code a moving average and run that through Compile.

movavg = Compile[{{data, _Real, 1}, {len, _Integer}},
   Module[{tot = Total[data[[1 ;; len - 1]]], last = 0.},
     Table[tot = tot + data[[j + len]] - last;
      last = data[[j + 1]];
      tot, {j, 0, Length[data] - len}]]/len];
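A quick sanity check against the built-in (an assumed example):

test = RandomReal[1, 100];
Max@Abs[movavg[test, 5] - MovingAverage[test, 5]]
(* ~ 10^-16, i.e. it agrees with MovingAverage up to rounding *)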

Now define
 PeakDetect5 =
   Compile[{{data, _Real, 1}, {width, _Integer}, {cut, _Real}},
    (ConstantArray[0, width]~Join~
       UnitStep[
        Take[data, {1 + width, -1 - width}] -
         movavg[data, 1 + 2 width] - cut]~Join~
       ConstantArray[0, width])*({0}~Join~
       Table[If[
         Sign[{data[[ii + 1]] - data[[ii]],
            data[[ii + 2]] - data[[ii + 1]]}] == {1, -1}, 1, 0],
        {ii, 1, Length[data] - 2}]~Join~{0}),
    CompilationTarget -> "C"];
When width is 5 this is a factor of 2 or so faster than PeakDetect4.
POSTED BY: Daniel Lichtblau
Posted 12 years ago
Cool stuff! I added this functionality to the Wikicode library. Now you can just call Load["Digital filter"] and then use FindPeaks.
POSTED BY: Michael Hale
Very nice!

This will make it return {x, y} coordinates:
 FindPeaks =
   Compile[{{data, _Real,
      1}, {width, _Integer}, {cut, _Real}}, ({#, data[[#]]} & /@
      Position[(Table[0, {width}]~Join~
           Map[UnitStep[# - cut] &,
            data[[1 + width ;; -1 - width]] -
             Map[Mean, Partition[data, 1 + 2 width, 1]]]~Join~
           Table[0, {width}]) ({0}~Join~
           Map[Piecewise[{{1, Sign[#] == {1, -1}}, {0,
               Sign[#] != {1, -1}}}] &,
           Partition[Differences[data], 2, 1]]~Join~{0}), 1][[All,
       1]]), CompilationTarget -> "C"];
Posted 12 years ago
Thanks, Chris! That format does seem more consistent with the Mathematica standard library. I updated the package on Wikicode. Pardon the error if you tried to load it earlier. I updated the template to include rough syntax highlighting and Load was still looking for <nowiki> tags. Once I stabilize on a good workflow though I think these library extensions could be really useful. By the way, I saw your quadcopter demo on YouTube a few months ago. It was a very engaging demo. I'm looking forward to your next presentation.
POSTED BY: Michael Hale
Vitaliy, I do not know if ConstantArray is compilable; I just cribbed that from the previous invocation of PeakDetect. But it is not really relevant: it only gets used to prepend and append a few zeros to the result, hence could be an external evaluation without having notable impact on run time. I see no discernible difference when I replace it with Table[0, {width}].
POSTED BY: Daniel Lichtblau
LICEcap: it is a free, easy-to-use tool. A great way to make your community posts nice ;-)
POSTED BY: Vitaliy Kaurov
Maybe I am a little late to join the discussion, but here are another couple of optimisation tricks.
The function can be essentially divided into three pieces:
  1. finding points above a certain threshold;
  2. finding the local maxima via sign changes of the differences;
  3. multiplying the above two.
The last part is not particularly computationally intensive. Let's analyse the first two.
To start with, for the sake of comparison I packaged the two parts in separate compiled functions, greater and max:
dat = RandomReal[1, 10^7];
greater[dat, 2, 1]; // AbsoluteTiming // First
max[dat]; // AbsoluteTiming // First
%% + %
FindPeaks[dat, 1, 0]; // AbsoluteTiming // First
(* ======== output ========
0.666499
1.395815
2.06231
2.074282
   ======== output ======== *)
We immediately notice that the slowest step is #2. Let's try to optimise a bit
max2 = Compile[{{data, _Real, 1}},
   Join[{0}, 1 - UnitStep[Differences@Sign@Differences[data]], {0}],
   CompilationTarget -> "C"];

max2[dat]; // AbsoluteTiming // First
(* 0.593889 *)
This is a 2X improvement.
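A toy example (assumed, not from the post) of why this works: at a local maximum the slope sign flips from +1 to -1, so the difference of signs is negative there:

data = {1., 3., 2., 4., 1.};
Differences@Sign@Differences[data]  (* {-2, 2, -2} *)
Join[{0}, 1 - UnitStep[%], {0}]     (* {0, 1, 0, 1, 0}: maxima at positions 2 and 4 *)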

Now what can we do with #1? I could not find a faster compiled code, but still we should not underestimate Mathematica functions that are already compiled:
ListConvolve[Table[1, {1 + 2 width}]/(1 + 2 width), data]
performs as well as the compiled variant.
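A quick check (an assumed example) that the convolution reproduces the moving average:

width = 2; data = RandomReal[1, 10^5];
Max@Abs[ListConvolve[Table[1, {1 + 2 width}]/(1 + 2 width), data] -
   MovingAverage[data, 1 + 2 width]]
(* ~ 10^-16 *)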

Since the final time is roughly 1/3 time(#1) + 2/3 time(#2), a 2X speed-up in #2 translates into a 1.5X overall speed-up:
First@AbsoluteTiming@FindPeaks[dat, 2, 1]/
 First@AbsoluteTiming@FindPeaks2[dat, 2, 1]
(* 1.60147 *)

A final note about returning the positions of the peaks. Position seems to be faster than Select

First@AbsoluteTiming@Position[peaks, 1]
(* 2.182735 *)

First@AbsoluteTiming@Select[Range[Length[peaks]] peaks, Positive]
(* 2.874397 *)

but all the advantage is lost if we ask for a flattened version

First@AbsoluteTiming@Flatten[Position[peaks, 1]]
(* 6.432645 *)
Hello,
I think something in the Alpha query may have changed since Vitaliy's post.

I can get Vitaliy's nifty Manipulate to work, but I have to remove the Quantity(s) from the raw data:
raw = WolframAlpha[
  "albert einstein", {{"PopularityPod:WikipediaStatsData", 1},
   "TimeSeriesData"}]
data = raw[[All, 2]][[All, 1]];
raw = raw /. Quantity[a_, __] :> a;
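An alternative sketch, assuming the quantities sit in the second column of raw: QuantityMagnitude is Listable, so it threads over the whole column at once.

raw[[All, 2]] = QuantityMagnitude[raw[[All, 2]]];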

Perhaps in the original example the Alpha query was not using Quantity?
POSTED BY: W. Craig Carter
Daniel, that puzzles me: isn't ConstantArray un-compile-able?
POSTED BY: Vitaliy Kaurov
Posted 12 years ago
Awesome! I updated Wikicode to use Daniel's latest version. For 10 million items I see a 66% improvement over Abdul's previous improvement. Formatting according to Chris's suggestions loses 5%, but that is probably the most common next operation people will perform. I also added a reference in the package to this discussion.
POSTED BY: Michael Hale
Michael,

Please note that I later corrected a transcription error:
Sign[{data[[ii + 1]] - data[ii],

should be
Sign[{data[[ii + 1]] - data[[ii]],

I also reported this as a cut-and-paste bug. I hope it was the only error introduced when I copied from Mathematica to the forum.

Another remark is that one could replace movavg with the Mathematica function MovingAverage. As best I can tell, Compile is fine with that, so I'm guessing it is a function that, for the unweighted case at least, supports calls from compiled functions.
POSTED BY: Daniel Lichtblau
Posted 12 years ago
Yes, I changed data[ii] to data[[ii]]. I saw Compile was falling back to standard evaluation for MovingAverage, so I put yours as MovingAverage`CompiledMovingAverage on Wikicode.
http://en.wikipedia.org/wiki/User:Wakebrdkid/Digital_filter
http://en.wikipedia.org/wiki/User:Wakebrdkid/Moving_average
POSTED BY: Michael Hale
Posted 12 years ago
Vitaliy, what tool did you use to create the GIF file?
POSTED BY: Diego Zviovich
Giulio, what are the definitions of greater and max? And could you please post the final version of the whole function - I got a bit lost in the logic.
POSTED BY: Sam Carrettie
Sorry for the delay. This is the complete function:
 Compile[{{data, _Real, 1}, {width, _Integer}, {cut, _Real}},
  Join[Join[Table[0, {width}],
     UnitStep[(Take[data, {1 + width, -1 - width}] - (1/#2)*
            Module[{tot = Total[#1[[1 ;; #2 - 1]]], last = 0.},
             Table[tot = tot + #1[[j + #2]] - last;
              last = #1[[j + 1]];
              tot, {j, 0, Length[#1] - #2}]] &)[data, 1 + 2 width] -
       cut]], Table[0, {width}]] Join[{0},
    1 - UnitStep[Differences[Sign[Differences[data]]]], {0}],
Parallelization -> True]

greater and max are just the two pieces multiplied within Vitaliy's function; they were detached in order to measure their individual efficiency.

You have earned a Featured Contributor Badge. Your exceptional post has been selected for our editorial column Staff Picks http://wolfr.am/StaffPicks and your profile is now distinguished by a Featured Contributor Badge and is displayed on the Featured Contributor Board. Thank you!

POSTED BY: EDITORIAL BOARD