Group Abstract

Message Boards

WOLFRAM COMMUNITY

6.4K Views

7 Replies

7 Total Likes

View groups...

Follow this post

Share this post:

GROUPS:

Wolfram Language

Most efficient way to count occurences in a list?

Robert Johansson

Posted 6 years ago

What would be the most efficient way to count occurences in a list L of elements that return True for a given boolean function f? Something like Length[Select[L,f]] works but I'm not sure if it is efficient. The function Count does the trick for patterns but I can't see a way to use it for a function.

POSTED BY: Robert Johansson

7 Replies

Sort By:

Szabolcs Horvát

Posted 6 years ago

For short, concise and general, try `CountsBy`. If your data is numeric, and the condition is also numeric, try my `BoolEval` package. http://szhorvat.net/mathematica/BoolEval In[199]:= arr = RandomInteger[1000, 10000000]; In[201]:= CountsBy[arr, # > 100 &] // Timing Out[201]= {5.6509, <\|True -> 8991315, False -> 1008685\|>} In[202]:= << BoolEval` In[203]:= BoolCount[arr > 100] // Timing Out[203]= {0.356869, 8991315} To count the ones whose square is less than 100, one would use `BoolCount[arr^2 < 100]` BoolEval provides a readable and maintainable syntax to the same type of computation that Sander showed.

POSTED BY: Szabolcs Horvát

Sander Huisman

Sander Huisman, University of Twente

Posted 6 years ago

This can be done ~5 times faster by using `UnitStep`: L=RandomInteger[{0,9},1000000]; RepeatedTiming[Count[L,x_?(#>5&)]] bool[x_]:=x>5 RepeatedTiming[Count[bool/@L,True]] RepeatedTiming[Count[L,x_/;x>5]] RepeatedTiming[Count[Thread[L>5],True]] RepeatedTiming[Count[UnitStep[5-L],0]] RepeatedTiming[Fold[#1+Boole[#2>5]&,0,L]] RepeatedTiming[Total[Boole[#>5]&/@L]] RepeatedTiming[Total[1-UnitStep[5-L]]] RepeatedTiming[Length[L] - Total[UnitStep[5 - L]]] gives me: {0.474,400314} {0.4834,400314} {0.314,400314} {0.26,400314} {0.038,400314} {0.0301,400314} {0.022,400314} {0.0067,400314} {0.0045, 400314}

This can be done ~5 times faster by using UnitStep:

L=RandomInteger[{0,9},1000000];

RepeatedTiming[Count[L,x_?(#>5&)]]

bool[x_]:=x>5
RepeatedTiming[Count[bool/@L,True]]

RepeatedTiming[Count[L,x_/;x>5]]

RepeatedTiming[Count[Thread[L>5],True]]

RepeatedTiming[Count[UnitStep[5-L],0]]

RepeatedTiming[Fold[#1+Boole[#2>5]&,0,L]]

RepeatedTiming[Total[Boole[#>5]&/@L]]

RepeatedTiming[Total[1-UnitStep[5-L]]]

RepeatedTiming[Length[L] - Total[UnitStep[5 - L]]]

gives me:

{0.474,400314}
{0.4834,400314}
{0.314,400314}
{0.26,400314}
{0.038,400314}
{0.0301,400314}
{0.022,400314}
{0.0067,400314}
{0.0045, 400314}

POSTED BY: Sander Huisman

Hans Milton

Posted 6 years ago

This variant is even faster: Total[Boole[# > 5] & /@ L]

POSTED BY: Hans Milton

Hans Milton

Posted 6 years ago

`Fold` seems to be quite a bit faster than `Select` or `Count` In[1]:= n = 10^6; L = Table[RandomInteger[{0, 9}], {i, n}]; Timing[Length[Select[L, # > 5 &]]] Timing[Count[L, x_ /; x > 5]] Timing[Fold[#1 + Boole[#2 > 5] &, 0, L]] Out[3]= {0.374402, 399567} Out[4]= {0.343202, 399567} Out[5]= {0.0312002, 399567}

Fold seems to be quite a bit faster than Select or Count

In[1]:= n = 10^6;
L = Table[RandomInteger[{0, 9}], {i, n}];
Timing[Length[Select[L, # > 5 &]]]
Timing[Count[L, x_ /; x > 5]]
Timing[Fold[#1 + Boole[#2 > 5] &, 0, L]]

Out[3]= {0.374402, 399567}

Out[4]= {0.343202, 399567}

Out[5]= {0.0312002, 399567}

POSTED BY: Hans Milton

Robert Johansson

Posted 6 years ago

Yes, that was really a significant difference. What is it that makes Fold so much faster than Count? I guess that both sweeps though the list once but for some reson Fold is much faster with adding and evaluating.

POSTED BY: Robert Johansson

Robert Johansson

Posted 6 years ago

Thanks! That's a nice way to transform it to a pattern. I did a quick test with the Timing function and was a bit surprised to see that the one with Select was a bit quicker: In[1]:= bool[x_] := x > 5; n = 10^6; L = Table[RandomInteger[{0, 9}], {i, n}]; Timing[Length[Select[L, bool]]] Timing[Count[bool /@ L, x_ /; x == True]] Out[1]= {1.03125, 400526} Out[2]= {1.57813, 400526} But that is probably because the length function will have a shorter list to work with?

POSTED BY: Robert Johansson

Hans Dolhaine

Hans Dolhaine, retired

Posted 6 years ago

May be this helps L = Table[RandomInteger[{0, 9}], {i, 50}] Count[L, x_ /; x > 5] or ( with the L from above ) bool[x_] := x > 5 Count[bool /@ L, x_ /; x == True]

POSTED BY: Hans Dolhaine

Reply to this discussion

Reply Preview

Attachments

Remove Add a file to this post

Follow this discussion

or Discard

Feedback