2
|
7262 Views
|
2 Replies
|
3 Total Likes
View groups...
Share
GROUPS:

# What if BinCounts is not fast enough?

Posted 10 years ago
 What if BinCounts is not fast enough?  It looks like BinCounts could be optimized a bit more, at least in the specific situation when the bin boundaries are fixed and the bin widths are equal.In[1]:= d = RandomReal[1, 5*^6];In[2]:= BinCounts[d, {0, 1, 0.1}] // AbsoluteTimingOut[2]= {4.816101, {500339, 500068, 500609, 500186, 499183, 498943, 499839, 500473, 500514, 499846}}HistogramList performs differently (faster or slower ... )Here's a trick to do this an order of magnitude faster:In[3]:= SortBy[Tally@Floor[d, 0.1], First][[All, 2]] // AbsoluteTimingOut[3]= {0.586790, {500339, 500068, 500609, 500186, 499183, 498943, 499839, 500473, 500514, 499846}}The performance difference depends on the bin size (as Tally slows down when it needs to count more types of items).I hope that this will be useful for someone.  Comments are welcome, especially since the Tally version still isn't fast enough for my needs.
2 Replies
Sort By:
Posted 10 years ago
 Interesting that Floor is a lot faster if it is rounding down to an integerIn[8]:= Sort[Tally@Floor[d, 0.1]] // AbsoluteTimingOut[8]= {0.954055, {{0., 500731}, {0.1, 499693}, {0.2, 499671}, {0.3,    499778}, {0.4, 500568}, {0.5, 500277}, {0.6, 498844}, {0.7,    498883}, {0.8, 501158}, {0.9, 500397}}}In[17]:= Sort[Tally@Floor[10 d]] // AbsoluteTimingOut[17]= {0.147008, {{0, 500731}, {1, 499693}, {2, 499671}, {3,    499778}, {4, 500568}, {5, 500277}, {6, 498844}, {7, 498883}, {8,    501158}, {9, 500397}}}
Posted 10 years ago
 Here's another order of magnitude, achieved by simply replacing Floor with Quotient, which will allow Tally to operate on an integer list. In[1]:= d = RandomReal[1, 5*^6];  In[2]:= BinCounts[d, {0, 1, 0.1}] // AbsoluteTiming Out[2]= {4.690474, {500670, 499971, 498900, 499459, 500789, 501261, 499914, 499137, 499619, 500280}}  In[3]:= SortBy[Tally@Floor[d, 0.1], First][[All, 2]] // AbsoluteTiming Out[3]= {0.717640, {500670, 499971, 498900, 499459, 500789, 501261, 499914, 499137, 499619, 500280}}  In[4]:= SortBy[Tally@Quotient[d, 0.1], First][[All, 2]] // AbsoluteTimingOut[4]= {0.053532, {500670, 499971, 498900, 499459, 500789, 501261, 499914, 499137, 499619, 500280}}Because of what Tally (probably) does internally, this is not the most algorithmically efficient version, but it's the fastest I could come up with in Mathematica. For few bins it's faster than what I managed with Compile.Anyway, just the fact that the compiled version is so much faster than BinCounts or HistogramList shows that BinCounts could (should?) be optimized more than it is currently, at least for packed arrays and equal width bins.  As a user I wouldn't want to rewrite these types of functions for speed.