Message Boards

Simple, fast compiled peak detection based on moving average

Posted 11 years ago
Recently Christopher coded a neat wavelet-based method for peak detection. Peak detection is often needed in various scientific fields: all sorts of spectral analysis, time series, and general feature recognition in data come to mind right away. There are many methods for peak detection. Besides his wavelet-based code, Christopher also mentions the built-in MaxDetect function, rooted in image processing. MaxDetect, though, is a rather elaborate tool for multi-dimensional data (2D, 3D images) with image-processing-minded parameter tuning, and was not meant to target time series and other 1D data. This got me thinking.

Can we come up with a minimal compile-able peak detection code that would be accurate, robust and fast in most situations for 1D data?

I am not an expert in the subject, but intuitively peak detection consists of two stages. 
  • Finding all maxima. This can be done via Differences, selecting the pairs of neighbor differences that change from positive to negative. Such a pair indicates a local maximum.
  • Filtering out peaks. Selecting those maxima that are “high” splashes of amplitude with respect to their “immediate neighborhood”. The quoted words are relative and depend on the particular data, which is why they are set as tuning parameters in the algorithm. A minimal uncompiled sketch of both stages follows this list.
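Put together, a minimal uncompiled sketch of the two stages could look like this (findPeaksSketch is a hypothetical helper name, and it leans on the built-in MovingAverage rather than the compilable trick used below):

(* hypothetical, uncompiled sketch of the two-stage idea *)
findPeaksSketch[data_List, w_Integer, cut_?NumericQ] :=
 Module[{maxQ, highQ},
  (* stage 1: local maxima, where neighbor differences go from + to - *)
  maxQ = {0}~Join~
     Map[Boole[Sign[#] == {1, -1}] &,
      Partition[Differences[data], 2, 1]]~Join~{0};
  (* stage 2: maxima at least cut above the centered moving average *)
  highQ = ArrayPad[
    UnitStep[data[[1 + w ;; -1 - w]] - MovingAverage[data, 1 + 2 w] - cut], w];
  maxQ highQ]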
To illustrate the second point, let's take a look at the Albert Einstein Wikipedia page-hits history below. Obviously, a large peak in the past can be lower than the current average if there is a strong trend in the data. This is why we need “windowing” when looking for peaks: to compare a peak to its immediate neighborhood.



Without further ado, here is a function written specifically in terms of functions that can be compiled. For example, I do not use MovingAverage, but use a trick with Partition instead.
PeakDetect = Compile[{{data, _Real, 1}, {width, _Integer}, {cut, _Real}},
   (* factor 1: flags maxima rising at least cut above the moving average;
      Partition + Mean emulates MovingAverage in compilable form *)
   (Table[0, {width}]~Join~
      Map[UnitStep[# - cut] &,
       data[[1 + width ;; -1 - width]] -
        Map[Mean, Partition[data, 1 + 2 width, 1]]]~Join~
      Table[0, {width}])*
   (* factor 2: flags local maxima, where the pair of neighbor
      differences changes sign from + to - *)
   ({0}~Join~
      Map[Piecewise[{{1, Sign[#] == {1, -1}}, {0, Sign[#] != {1, -1}}}] &,
       Partition[Differences[data], 2, 1]]~Join~{0}),
   CompilationTarget -> "C"];
The legend for the function arguments is the following:
  • data – 1D numerical list of data
  • width – half-width of the moving average window, not including the central point
  • cut – threshold at which to cut off the peak, in natural units of the data amplitude
Now let’s see some use cases. Let’s import the same Albert Einstein data as a proof of concept.
raw = WolframAlpha[ "albert einstein", {{"PopularityPod:WikipediaStatsData", 1}, "TimeSeriesData"}];
data = raw[[All, 2]][[All, 1]];
We use a total window width of 5 points here and cut off peaks at one standard deviation of the whole data. The peak labeled “May 2008” is nicely picked up even though it is comparable to the then-current average. This peak is most probably due to the publication on May 13, 2008 of one of the most famous books about Einstein, a New York Times bestseller that also won a Quill Award. Of course, you can play with the controls to pick or drop peaks. On the top plot one sees the data, the moving average, and bands formed by the moving average displaced up and down by a fraction of the standard deviation. Any maximum above the top band becomes a peak.
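A minimal non-interactive call might look like this (a sketch assuming raw and data as imported above; width 2 gives the 5-point window):

peaks = PeakDetect[data, 2, 1. StandardDeviation[data]];
Pick[raw, peaks, 1]  (* the {date, value} pairs at the detected peaks *)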



The code for the app is at the very end. Let’s try a different data set – recent sunspot activity.
raw = WolframAlpha["sun spot", {{"SunspotsPartialTimeSeries:SpaceWeatherData", 1}, "TimeSeriesData"}];
data = raw[[All, 2]];
Right away we find the May 2013 mark – one of the most powerful recent events, described in the Wikipedia page here. Please let me know if you have suggestions for how to speed this up or improve it generally. I would be very curious to know your opinion and critique.

The .GIF below is large - wait till it is loaded.



The code for the interactive app:
 Manipulate[
  tt = {#,
      Rotate[DateString[#, {"MonthNameShort", " ", "Year"}], Pi/2]} & /@
     Pick[raw, PeakDetect[data, wid, thr StandardDeviation[data]], 1][[
     All, 1]];
 
  Column[{
    
    ListLinePlot[{data,
     ArrayPad[MovingAverage[data, 1 + 2 wid], wid, "Fixed"],
     ArrayPad[MovingAverage[data, 1 + 2 wid], wid, "Fixed"] +
      thr StandardDeviation[data],
     ArrayPad[MovingAverage[data, 1 + 2 wid], wid, "Fixed"] -
      thr StandardDeviation[data]}, AspectRatio -> 1/6,
    ImageSize -> 800, Filling -> {2 -> {1}, 3 -> {4}},
    FrameTicks -> {None, Automatic},
    FillingStyle -> {Directive[Red, Opacity[.7]],
      Directive[Blue, Opacity[.7]], Directive[Gray, Opacity[.1]]},
    PlotStyle -> Opacity[.7], PlotRange -> All, Frame -> True,
    GridLines -> Automatic, PlotRangePadding -> 0],
   
   Show[
    DateListPlot[raw, Joined -> True, AspectRatio -> 1/6,
     ImageSize -> 800, Filling -> Bottom, Ticks -> {tt, Automatic},
     Frame -> False, Mesh -> All, PlotRange -> All],
    DateListPlot[
     If[# == {}, raw[[1 ;; 2]], #, #] &[
      Pick[raw, PeakDetect[data, wid, thr StandardDeviation[data]],
       1]], AspectRatio -> 1/6, ImageSize -> 800,
     PlotStyle -> Directive[Red, PointSize[.007]], PlotRange -> All]
    , PlotRangePadding -> {0, Automatic}]
   
   }],
Row[{
   Control[{{thr, 1, "threshold"}, 0, 2, Appearance -> "Labeled"}],
   Spacer[100],
   Control[{{wid, 3, "half-width"}, 1, 10, 1, Setter}]
   }]
]

============== UPDATE =================

Thank you all very much for contributing. I collected everyone's efforts and Danny's two functions in a single, completely compile-able expression, which seems to give the shortest time – though only marginally faster than Danny's ingenious maneuver. I very much liked the format suggested by Christopher, the one that Michael also kept in his packages. But I wanted to do some benchmarking and thus followed the format returned by the built-in function MaxDetect, simply for the sake of speed comparison. This format is just a binary list of the length of the original data, with 1s at the positions of the found peaks.

Here is the function:
 PeakDetect =
   Compile[{{data, _Real, 1}, {width, _Integer}, {cut, _Real}},
    (Table[0, {width}]~Join~
       UnitStep[
        Take[data, {1 + width, -1 - width}] -
           (Module[{tot = Total[#1[[1 ;; #2 - 1]]], last = 0.},
               Table[tot = tot + #1[[j + #2]] - last;
                last = #1[[j + 1]];
                tot, {j, 0, Length[#1] - #2}]]/#2) &[data, 1 + 2 width] - cut]
    ~Join~Table[0, {width}]) ({0}~Join~
      Table[If[Sign[{data[[ii + 1]] - data[[ii]],
           data[[ii + 2]] - data[[ii + 1]]}] == {1, -1}, 1, 0],
           {ii, 1, Length[data] - 2}]~Join~{0}), CompilationTarget -> "C"];
And here are the speed benchmarks on 10 million data points, which show a 40x speed-up:
dat = RandomReal[1, 10^7];

pks = MaxDetect[dat]; // AbsoluteTiming
Total[pks]
(* ======== output ========
{62.807928, Null}
3333361
   ======== output ======== *)

pks = PeakDetect[dat, 1, 0]; // AbsoluteTiming
Total[pks]
(* ======== output ========
{1.560074, Null}
3333360
   ======== output ======== *)
POSTED BY: Vitaliy Kaurov
20 Replies

You have earned a Featured Contributor Badge! Your exceptional post has been selected for our editorial column Staff Picks http://wolfr.am/StaffPicks and your profile is now distinguished by a Featured Contributor Badge, displayed on the Featured Contributor Board. Thank you!

POSTED BY: Moderation Team
Hello,
I think something in the Alpha query may have changed since Vitaliy's post.

I can get Vitaliy's nifty Manipulate to work, but I have to remove the Quantity(s) from the raw data:
raw = WolframAlpha[
  "albert einstein", {{"PopularityPod:WikipediaStatsData", 1},
   "TimeSeriesData"}]
data = raw[[All, 2]][[All, 1]];
raw = raw /. Quantity[a_, __] :> a;

Perhaps in the original example the Alpha query was not using Quantity?
POSTED BY: W. Craig Carter
Sorry for the delay. This is the complete function:
 Compile[{{data, _Real, 1}, {width, _Integer}, {cut, _Real}},
  Join[Join[Table[0, {width}],
     UnitStep[(Take[data, {1 + width, -1 - width}] - (1/#2)*
            Module[{tot = Total[#1[[1 ;; #2 - 1]]], last = 0.},
             Table[tot = tot + #1[[j + #2]] - last;
              last = #1[[j + 1]];
              tot, {j, 0, Length[#1] - #2}]] &)[data, 1 + 2 width] -
       cut]], Table[0, {width}]] Join[{0},
    1 - UnitStep[Differences[Sign[Differences[data]]]], {0}],
Parallelization -> True]

greater and max are just the two pieces multiplied within Vitaliy's function; they have been detached in order to measure their individual efficiency.
Giulio, what are the definitions of greater and max? And could you please post the final version of the whole function - I got a bit lost in the logic.
POSTED BY: Sam Carrettie
Maybe I am a little late to join the discussion, but here are another couple of optimisation tricks.
The function can be essentially divided into three pieces:
  1. finding points above a certain threshold;
  2. finding the local maxima, via the sign change of the first difference;
  3. multiplying the above two.
The last part is not particularly computationally intensive. Let's analyse the first two.
To start with, for the sake of comparison, I packaged the two parts in separate compiled functions:
dat = RandomReal[1, 10^7];
greater[dat, 2, 1]; // AbsoluteTiming // First
max[dat]; // AbsoluteTiming // First
%% + %
FindPeaks[dat, 1, 0]; // AbsoluteTiming // First

(* ======== output ========
0.666499
1.395815
2.06231
2.074282
   ======== output ======== *)
We immediately notice that the slowest step is #2. Let's try to optimise it a bit:
max2 = Compile[{{data, _Real, 1}},
   Join[{0}, 1 - UnitStep[Differences@Sign@Differences[data]], {0}],
   CompilationTarget -> "C"];

max2[dat]; // AbsoluteTiming // First
0.593889
This is a 2X improvement.
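The trick here: at a strict local maximum the sign of the first difference drops from 1 to -1, its difference is -2, UnitStep maps that to 0, and 1 minus the result flags the peak. A tiny worked example (test is just illustrative data):

test = {0, 1, 3, 2, 4, 1};
Sign[Differences[test]]                       (* {1, 1, -1, 1, -1} *)
Join[{0}, 1 - UnitStep[Differences[%]], {0}]  (* {0, 0, 1, 0, 1, 0} *)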

Now what can we do with #1? I could not find faster compiled code, but we should not underestimate Mathematica functions that are already compiled:
ListConvolve[Table[1, {1 + 2 width}]/(1 + 2 width), data]
performs as well as the compiled variant.
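A quick sanity check of the claim (a sketch; the two smoothers should agree up to machine round-off):

test = RandomReal[1, 10^4]; width = 2;
Max@Abs[ListConvolve[Table[1., {1 + 2 width}]/(1 + 2 width), test] -
   MovingAverage[test, 1 + 2 width]]  (* of order 10^-16 *)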

Since the final time is roughly 1/3 time(#1) + 2/3 time(#2), a 2X speed-up in #2 translates into a 1.5X overall speed-up:
First@AbsoluteTiming@FindPeaks[dat, 2, 1]/
First@AbsoluteTiming@FindPeaks2[dat, 2, 1]

1.60147

A final note about returning the positions of the peaks. Position seems to be faster than Select:
First@AbsoluteTiming@Position[peaks, 1]
2.182735

First@AbsoluteTiming@Select[Range[Length[peaks]] peaks, Positive]
2.874397
but all the advantage is lost if we ask for a flattened version:
First@AbsoluteTiming@Flatten[Position[peaks, 1]]
6.432645
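Another option in the same spirit might be Pick, which also runs in kernel code and skips the elementwise product (a sketch, untimed here):

Pick[Range[Length[peaks]], peaks, 1]  (* positions of the 1s directly *)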
Thank you all very much for contributing. I collected everyone's efforts and Danny's two functions in a single, completely compile-able expression, which seems to give the shortest time – though only marginally faster than Danny's ingenious maneuver. I very much liked the format suggested by Christopher, the one that Michael also kept in his packages. But I wanted to do some benchmarking and thus followed the format returned by the built-in function MaxDetect, simply for the sake of speed comparison. This format is just a binary list of the length of the original data, with 1s at the positions of the found peaks.

Here is the function:
 FindPeaks =
   Compile[{{data, _Real, 1}, {width, _Integer}, {cut, _Real}},
    (Table[0, {width}]~Join~
       UnitStep[
        Take[data, {1 + width, -1 - width}] -
           (Module[{tot = Total[#1[[1 ;; #2 - 1]]], last = 0.},
               Table[tot = tot + #1[[j + #2]] - last;
                last = #1[[j + 1]];
                tot, {j, 0, Length[#1] - #2}]]/#2) &[data, 1 + 2 width] - cut]
    ~Join~Table[0, {width}]) ({0}~Join~
      Table[If[Sign[{data[[ii + 1]] - data[[ii]],
           data[[ii + 2]] - data[[ii + 1]]}] == {1, -1}, 1, 0],
           {ii, 1, Length[data] - 2}]~Join~{0}), CompilationTarget -> "C"];
And here are the speed benchmarks on 10 million data points, which show a 40x speed-up:
 dat = RandomReal[1, 10^7];
 
 pks = MaxDetect[dat]; // AbsoluteTiming
 Total[pks]
 (* ======== output ========
 {62.807928, Null}
 3333361
    ======== output ======== *)
 
pks = FindPeaks[dat, 1, 0]; // AbsoluteTiming
Total[pks]
(* ======== output ========
{1.560074, Null}
3333360
   ======== output ======== *)
POSTED BY: Vitaliy Kaurov
Posted 11 years ago
Vitaliy, what tool did you use to create the GIF file?
POSTED BY: Diego Zviovich
LICEcap – it is a free, easy-to-use tool. A great way to make your community posts nice ;-)
POSTED BY: Vitaliy Kaurov
Posted 11 years ago
Yes, I changed data[ii] to data[[ii]]. I saw Compile was falling back to standard evaluation for MovingAverage, so I put yours as MovingAverage`CompiledMovingAverage on Wikicode.
http://en.wikipedia.org/wiki/User:Wakebrdkid/Digital_filter
http://en.wikipedia.org/wiki/User:Wakebrdkid/Moving_average
POSTED BY: Michael Hale
Michael,

Please note that I later corrected a transcription error:
Sign[{data[[ii + 1]] - data[ii],

should be
Sign[{data[[ii + 1]] - data[[ii]],

I also reported this as a cut-and-paste bug. I hope it was the only error introduced when I copied from Mathematica to the forum.

Another remark is that one could replace movavg with the Mathematica function MovingAverage. As best I can tell, Compile is fine with that, so I'm guessing it is a function that, for the unweighted case at least, supports calls from compiled functions.
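One way to check this sort of thing (a sketch) is to look for MainEvaluate calls, which mark fall-backs to the main evaluator, in the compiled bytecode:

Needs["CompiledFunctionTools`"]
cf = Compile[{{data, _Real, 1}, {len, _Integer}}, MovingAverage[data, len]];
StringFreeQ[CompilePrint[cf], "MainEvaluate"]  (* True if fully compiled *)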
POSTED BY: Daniel Lichtblau
Posted 11 years ago
Awesome! I updated Wikicode to use Daniel's latest version. For 10 million items I see a 66% improvement over Abdul's previous improvement. Formatting according to Chris's suggestions loses 5%, but that is probably the most common next operation people will perform. I also added a reference in the package to this discussion.
POSTED BY: Michael Hale
Vitaliy, I do not know if ConstantArray is compilable; I just cribbed that from the previous invocation of PeakDetect. But it is not really relevant: it only gets used to prepend and append a few zeros to the result, hence could be an external evaluation without notable impact on run time. I see no discernible difference when I replace it with Table[0, {width}].
POSTED BY: Daniel Lichtblau
A bottleneck is the Partition, Total step, if width is larger than 1. One can instead code a moving average and run that through Compile.

movavg = Compile[{{data, _Real, 1}, {len, _Integer}},
   Module[{tot = Total[data[[1 ;; len - 1]]], last = 0.},
     Table[tot = tot + data[[j + len]] - last;
       last = data[[j + 1]];
       tot, {j, 0, Length[data] - len}]
     ]/len]
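A quick consistency check against the built-in (a sketch; the incremental running sum can differ from MovingAverage only by round-off):

test = RandomReal[1, 1000];
Max@Abs[movavg[test, 5] - MovingAverage[test, 5]]  (* of order 10^-16 *)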

Now define
PeakDetect5 =
  Compile[{{data, _Real, 1}, {width, _Integer}, {cut, _Real}},
   (ConstantArray[0, width]~Join~
      UnitStep[
       Take[data, {1 + width, -1 - width}] -
        movavg[data, 1 + 2 width] - cut]~Join~
      ConstantArray[0, width])*({0}~Join~
      Table[If[
        Sign[{data[[ii + 1]] - data[[ii]],
          data[[ii + 2]] - data[[ii + 1]]}] == {1, -1}, 1, 0], {ii,
        1, Length[data] - 2}]~Join~{0}), CompilationTarget -> "C"];
When width is 5 this is a factor of 2 or so faster than PeakDetect4.
POSTED BY: Daniel Lichtblau
Daniel, that puzzles me: isn't ConstantArray un-compilable?
POSTED BY: Vitaliy Kaurov
And this is quite a bit faster than Sasha's:
PeakDetect4 =
  Compile[{{data, _Real, 1}, {width, _Integer}, {cut, _Real}},
   (Table[0, {width}]~Join~
      UnitStep[
       Take[data, {1 + width, -1 - width}] -
        Map[Mean, Partition[data, 1 + 2 width, 1]] - cut]~Join~
      Table[0, {width}])*({0}~Join~
      Table[If[(data[[ii + 1]] - data[[ii]]) > 0 &&
         (data[[ii + 2]] - data[[ii + 1]]) < 0, 1, 0], {ii, 1,
        Length[data] - 2}]~Join~{0}), CompilationTarget -> "C"]


Also, I take back what I said about Table: ConstantArray is not compiled. You can see that by doing
<< CCodeGenerator`
CompilePrint[PeakDetect4]
POSTED BY: Abdul Dakkak
Further optimization to `PeakDetect`. `Take[list, {a, b}]` is always faster than `Part[list, a ;; b]`. Instead of using `Piecewise` inside `Compile`, it is better to use `If`.

PeakDetect3 =
  Compile[{{data, _Real, 1}, {width, _Integer}, {cut, _Real}},
   (Table[0, {width}]~Join~
      UnitStep[
       Take[data, {1 + width, -1 - width}] -
        Map[Mean, Partition[data, 1 + 2 width, 1]] - cut]~Join~
      Table[0, {width}])*({0}~Join~
      Map[If[# == {1, -1}, 1, 0] &,
       Sign[Partition[Differences[data], 2, 1]]]~Join~{0}),
   CompilationTarget -> "C"]

With `PeakDetect3`, I get a further 25% performance increase over Abdul's enhanced code.
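The Take versus Part difference can be isolated with a pair of minimal compiled functions (a sketch; timings will vary by machine):

fTake = Compile[{{x, _Real, 1}}, Take[x, {2, -2}]];
fPart = Compile[{{x, _Real, 1}}, x[[2 ;; -2]]];
v = RandomReal[1, 10^7];
First@AbsoluteTiming@Do[fTake[v], {10}]
First@AbsoluteTiming@Do[fPart[v], {10}]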
POSTED BY: Oleksandr Pavlyk
Here are some optimizations to Vitaliy's Compile code. Replace
Map[UnitStep[# - cut] &,
data[[1 + width ;; -1 - width]] -
  Map[Mean, Partition[data, 1 + 2 width, 1]]]
with
UnitStep[data[[1 + width ;; -1 - width]] -
  Map[Mean, Partition[data, 1 + 2 width, 1]] - cut]
and
Map[Piecewise[{{1, Sign[#] == {1, -1}}, {0, Sign[#] != {1, -1}}}] &,
Partition[Differences[data], 2, 1]]
with
Map[Piecewise[{{1, # == {1, -1}}, {0, # != {1, -1}}}] &,
Sign[Partition[Differences[data], 2, 1]]]
A less effective optimization is to replace
Table[0, {width}]
with
ConstantArray[0, width]

So an optimized peak detection looks like


Compile[{{data, _Real, 1}, {width, _Integer}, {cut, _Real}},
  (ConstantArray[0, width]~Join~
     UnitStep[
      data[[1 + width ;; -1 - width]] -
       Map[Mean, Partition[data, 1 + 2 width, 1]] - cut]~Join~
     ConstantArray[0, width])*({0}~Join~
     Map[Piecewise[{{1, # == {1, -1}}, {0, # != {1, -1}}}] &,
      Sign[Partition[Differences[data], 2, 1]]]~Join~{0}),
  CompilationTarget -> "C"]

For data of size 10000000, my version completes in 5.9653412 seconds while the original completes in 8.3264762 seconds.

The basic idea is to move listable functions outside the map.
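In miniature (a sketch): UnitStep is Listable, so a single vectorized call replaces the Map:

v = RandomReal[1, 10^6]; cut = 0.5;
Map[UnitStep[# - cut] &, v]; // AbsoluteTiming
UnitStep[v - cut]; // AbsoluteTiming  (* the listable form is typically much faster *)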

Hope this helps.

-adk-
POSTED BY: Abdul Dakkak
Posted 11 years ago
Thanks, Chris! That format does seem more consistent with the Mathematica standard library. I updated the package on Wikicode. Pardon the error if you tried to load it earlier. I updated the template to include rough syntax highlighting and Load was still looking for <nowiki> tags. Once I stabilize on a good workflow, though, I think these library extensions could be really useful. By the way, I saw your quadcopter demo on YouTube a few months ago. It was a very engaging demo. I'm looking forward to your next presentation.
POSTED BY: Michael Hale
Very nice!

This will make it return {x, y} coordinates:
FindPeaks =
  Compile[{{data, _Real, 1}, {width, _Integer}, {cut, _Real}},
   ({#, data[[#]]} & /@
     Position[(Table[0, {width}]~Join~
          Map[UnitStep[# - cut] &,
           data[[1 + width ;; -1 - width]] -
            Map[Mean, Partition[data, 1 + 2 width, 1]]]~Join~
          Table[0, {width}]) ({0}~Join~
          Map[Piecewise[{{1, Sign[#] == {1, -1}}, {0,
              Sign[#] != {1, -1}}}] &,
           Partition[Differences[data], 2, 1]]~Join~{0}), 1][[All, 1]]),
   CompilationTarget -> "C"];
Posted 11 years ago
Cool stuff! I added this functionality to the Wikicode library. Now you can just call Load["Digital filter"] and then use FindPeaks.
POSTED BY: Michael Hale