# What if BinCounts is not fast enough?

Posted 7 years ago
5419 Views
|
2 Replies
|
3 Total Likes
|
 What if BinCounts is not fast enough?  It looks like BinCounts could be optimized a bit more, at least in the specific situation when the bin boundaries are fixed and the bin widths are equal.In[1]:= d = RandomReal[1, 5*^6];In[2]:= BinCounts[d, {0, 1, 0.1}] // AbsoluteTimingOut[2]= {4.816101, {500339, 500068, 500609, 500186, 499183, 498943, 499839, 500473, 500514, 499846}}HistogramList performs differently (faster or slower ... )Here's a trick to do this an order of magnitude faster:In[3]:= SortBy[Tally@Floor[d, 0.1], First][[All, 2]] // AbsoluteTimingOut[3]= {0.586790, {500339, 500068, 500609, 500186, 499183, 498943, 499839, 500473, 500514, 499846}}The performance difference depends on the bin size (as Tally slows down when it needs to count more types of items).I hope that this will be useful for someone.  Comments are welcome, especially since the Tally version still isn't fast enough for my needs.
2 Replies
Sort By:
Posted 7 years ago
 Here's another order of magnitude, achieved by simply replacing Floor with Quotient, which will allow Tally to operate on an integer list. In[1]:= d = RandomReal[1, 5*^6];  In[2]:= BinCounts[d, {0, 1, 0.1}] // AbsoluteTiming Out[2]= {4.690474, {500670, 499971, 498900, 499459, 500789, 501261, 499914, 499137, 499619, 500280}}  In[3]:= SortBy[Tally@Floor[d, 0.1], First][[All, 2]] // AbsoluteTiming Out[3]= {0.717640, {500670, 499971, 498900, 499459, 500789, 501261, 499914, 499137, 499619, 500280}}  In[4]:= SortBy[Tally@Quotient[d, 0.1], First][[All, 2]] // AbsoluteTimingOut[4]= {0.053532, {500670, 499971, 498900, 499459, 500789, 501261, 499914, 499137, 499619, 500280}}Because of what Tally (probably) does internally, this is not the most algorithmically efficient version, but it's the fastest I could come up with in Mathematica. For few bins it's faster than what I managed with Compile.Anyway, just the fact that the compiled version is so much faster than BinCounts or HistogramList shows that BinCounts could (should?) be optimized more than it is currently, at least for packed arrays and equal width bins.  As a user I wouldn't want to rewrite these types of functions for speed.