Message Boards Message Boards

GROUPS:

Most efficient way to count occurences in a list?

Posted 1 month ago
337 Views
|
7 Replies
|
6 Total Likes
|

What would be the most efficient way to count occurences in a list L of elements that return True for a given boolean function f? Something like

Length[Select[L,f]] 

works but I'm not sure if it is efficient. The function Count does the trick for patterns but I can't see a way to use it for a function.

7 Replies

May be this helps

L = Table[RandomInteger[{0, 9}], {i, 50}]
Count[L, x_ /; x > 5]

or ( with the L from above )

bool[x_] := x > 5
Count[bool /@ L, x_ /; x == True]

Thanks! That's a nice way to transform it to a pattern. I did a quick test with the Timing function and was a bit surprised to see that the one with Select was a bit quicker:

In[1]:= 

bool[x_] := x > 5;
n = 10^6;
L = Table[RandomInteger[{0, 9}], {i, n}];
Timing[Length[Select[L, bool]]]
Timing[Count[bool /@ L, x_ /; x == True]]

Out[1]= {1.03125, 400526}

Out[2]= {1.57813, 400526}

But that is probably because the length function will have a shorter list to work with?

Posted 1 month ago

Fold seems to be quite a bit faster than Select or Count

In[1]:= n = 10^6;
L = Table[RandomInteger[{0, 9}], {i, n}];
Timing[Length[Select[L, # > 5 &]]]
Timing[Count[L, x_ /; x > 5]]
Timing[Fold[#1 + Boole[#2 > 5] &, 0, L]]

Out[3]= {0.374402, 399567}

Out[4]= {0.343202, 399567}

Out[5]= {0.0312002, 399567}

Yes, that was really a significant difference. What is it that makes Fold so much faster than Count? I guess that both sweeps though the list once but for some reson Fold is much faster with adding and evaluating.

Posted 1 month ago

This variant is even faster:

Total[Boole[# > 5] & /@ L]

This can be done ~5 times faster by using UnitStep:

L=RandomInteger[{0,9},1000000];

RepeatedTiming[Count[L,x_?(#>5&)]]

bool[x_]:=x>5
RepeatedTiming[Count[bool/@L,True]]

RepeatedTiming[Count[L,x_/;x>5]]

RepeatedTiming[Count[Thread[L>5],True]]

RepeatedTiming[Count[UnitStep[5-L],0]]

RepeatedTiming[Fold[#1+Boole[#2>5]&,0,L]]

RepeatedTiming[Total[Boole[#>5]&/@L]]

RepeatedTiming[Total[1-UnitStep[5-L]]]

RepeatedTiming[Length[L] - Total[UnitStep[5 - L]]]

gives me:

{0.474,400314}
{0.4834,400314}
{0.314,400314}
{0.26,400314}
{0.038,400314}
{0.0301,400314}
{0.022,400314}
{0.0067,400314}
{0.0045, 400314}

For short, concise and general, try CountsBy.

If your data is numeric, and the condition is also numeric, try my BoolEval package. http://szhorvat.net/mathematica/BoolEval

In[199]:= arr = RandomInteger[1000, 10000000];

In[201]:= CountsBy[arr, # > 100 &] // Timing
Out[201]= {5.6509, <|True -> 8991315, False -> 1008685|>}

In[202]:= << BoolEval`

In[203]:= BoolCount[arr > 100] // Timing
Out[203]= {0.356869, 8991315}

To count the ones whose square is less than 100, one would use BoolCount[arr^2 < 100]

BoolEval provides a readable and maintainable syntax to the same type of computation that Sander showed.

Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard

Group Abstract Group Abstract