Message Boards Message Boards

GROUPS:

What if BinCounts is not fast enough?

Posted 7 years ago
5419 Views
|
2 Replies
|
3 Total Likes
|
What if BinCounts is not fast enough?  It looks like BinCounts could be optimized a bit more, at least in the specific situation when the bin boundaries are fixed and the bin widths are equal.
In[1]:= d = RandomReal[1, 5*^6];

In[2]:= BinCounts[d, {0, 1, 0.1}] // AbsoluteTiming
Out[2]= {4.816101, {500339, 500068, 500609, 500186, 499183, 498943, 499839, 500473, 500514, 499846}}

HistogramList performs differently (faster or slower ... )

Here's a trick to do this an order of magnitude faster:
In[3]:= SortBy[Tally@Floor[d, 0.1], First][[All, 2]] // AbsoluteTiming

Out[3]= {0.586790, {500339, 500068, 500609, 500186, 499183, 498943, 499839, 500473, 500514, 499846}}

The performance difference depends on the bin size (as Tally slows down when it needs to count more types of items).

I hope that this will be useful for someone.  Comments are welcome, especially since the Tally version still isn't fast enough for my needs.
2 Replies
Here's another order of magnitude, achieved by simply replacing Floor with Quotient, which will allow Tally to operate on an integer list.
 In[1]:= d = RandomReal[1, 5*^6];
 
 In[2]:= BinCounts[d, {0, 1, 0.1}] // AbsoluteTiming
 Out[2]= {4.690474, {500670, 499971, 498900, 499459, 500789, 501261, 499914, 499137, 499619, 500280}}
 
 In[3]:= SortBy[Tally@Floor[d, 0.1], First][[All, 2]] // AbsoluteTiming
 Out[3]= {0.717640, {500670, 499971, 498900, 499459, 500789, 501261, 499914, 499137, 499619, 500280}}
 
 In[4]:= SortBy[Tally@Quotient[d, 0.1], First][[All, 2]] // AbsoluteTiming
Out[4]= {0.053532, {500670, 499971, 498900, 499459, 500789, 501261, 499914, 499137, 499619, 500280}}
Because of what Tally (probably) does internally, this is not the most algorithmically efficient version, but it's the fastest I could come up with in Mathematica. For few bins it's faster than what I managed with Compile.

Anyway, just the fact that the compiled version is so much faster than BinCounts or HistogramList shows that BinCounts could (should?) be optimized more than it is currently, at least for packed arrays and equal width bins.  As a user I wouldn't want to rewrite these types of functions for speed.
Interesting that Floor is a lot faster if it is rounding down to an integer
In[8]:= Sort[Tally@Floor[d, 0.1]] // AbsoluteTiming

Out[8]= {0.954055, {{0., 500731}, {0.1, 499693}, {0.2, 499671}, {0.3,
   499778}, {0.4, 500568}, {0.5, 500277}, {0.6, 498844}, {0.7,
   498883}, {0.8, 501158}, {0.9, 500397}}}
In[17]:= Sort[Tally@Floor[10 d]] // AbsoluteTiming

Out[17]= {0.147008, {{0, 500731}, {1, 499693}, {2, 499671}, {3,
   499778}, {4, 500568}, {5, 500277}, {6, 498844}, {7, 498883}, {8,
   501158}, {9, 500397}}}
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard

Group Abstract Group Abstract