Message Boards Message Boards

0
|
4839 Views
|
7 Replies
|
7 Total Likes
View groups...
Share
Share this post:

Most efficient way to count occurences in a list?

Posted 5 years ago

What would be the most efficient way to count occurences in a list L of elements that return True for a given boolean function f? Something like

Length[Select[L,f]] 

works but I'm not sure if it is efficient. The function Count does the trick for patterns but I can't see a way to use it for a function.

POSTED BY: Robert Johansson
7 Replies

For short, concise and general, try CountsBy.

If your data is numeric, and the condition is also numeric, try my BoolEval package. http://szhorvat.net/mathematica/BoolEval

In[199]:= arr = RandomInteger[1000, 10000000];

In[201]:= CountsBy[arr, # > 100 &] // Timing
Out[201]= {5.6509, <|True -> 8991315, False -> 1008685|>}

In[202]:= << BoolEval`

In[203]:= BoolCount[arr > 100] // Timing
Out[203]= {0.356869, 8991315}

To count the ones whose square is less than 100, one would use BoolCount[arr^2 < 100]

BoolEval provides a readable and maintainable syntax to the same type of computation that Sander showed.

POSTED BY: Szabolcs Horvát

This can be done ~5 times faster by using UnitStep:

L=RandomInteger[{0,9},1000000];

RepeatedTiming[Count[L,x_?(#>5&)]]

bool[x_]:=x>5
RepeatedTiming[Count[bool/@L,True]]

RepeatedTiming[Count[L,x_/;x>5]]

RepeatedTiming[Count[Thread[L>5],True]]

RepeatedTiming[Count[UnitStep[5-L],0]]

RepeatedTiming[Fold[#1+Boole[#2>5]&,0,L]]

RepeatedTiming[Total[Boole[#>5]&/@L]]

RepeatedTiming[Total[1-UnitStep[5-L]]]

RepeatedTiming[Length[L] - Total[UnitStep[5 - L]]]

gives me:

{0.474,400314}
{0.4834,400314}
{0.314,400314}
{0.26,400314}
{0.038,400314}
{0.0301,400314}
{0.022,400314}
{0.0067,400314}
{0.0045, 400314}
POSTED BY: Sander Huisman
Posted 5 years ago

This variant is even faster:

Total[Boole[# > 5] & /@ L]
POSTED BY: Hans Milton

Yes, that was really a significant difference. What is it that makes Fold so much faster than Count? I guess that both sweeps though the list once but for some reson Fold is much faster with adding and evaluating.

POSTED BY: Robert Johansson
Posted 5 years ago

Fold seems to be quite a bit faster than Select or Count

In[1]:= n = 10^6;
L = Table[RandomInteger[{0, 9}], {i, n}];
Timing[Length[Select[L, # > 5 &]]]
Timing[Count[L, x_ /; x > 5]]
Timing[Fold[#1 + Boole[#2 > 5] &, 0, L]]

Out[3]= {0.374402, 399567}

Out[4]= {0.343202, 399567}

Out[5]= {0.0312002, 399567}
POSTED BY: Hans Milton

Thanks! That's a nice way to transform it to a pattern. I did a quick test with the Timing function and was a bit surprised to see that the one with Select was a bit quicker:

In[1]:= 

bool[x_] := x > 5;
n = 10^6;
L = Table[RandomInteger[{0, 9}], {i, n}];
Timing[Length[Select[L, bool]]]
Timing[Count[bool /@ L, x_ /; x == True]]

Out[1]= {1.03125, 400526}

Out[2]= {1.57813, 400526}

But that is probably because the length function will have a shorter list to work with?

POSTED BY: Robert Johansson

May be this helps

L = Table[RandomInteger[{0, 9}], {i, 50}]
Count[L, x_ /; x > 5]

or ( with the L from above )

bool[x_] := x > 5
Count[bool /@ L, x_ /; x == True]
POSTED BY: Hans Dolhaine
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard

Group Abstract Group Abstract