Most efficient way to count occurences in a list?

Posted 1 month ago
337 Views
|
7 Replies
|
6 Total Likes
|
 What would be the most efficient way to count occurences in a list L of elements that return True for a given boolean function f? Something like Length[Select[L,f]] works but I'm not sure if it is efficient. The function Count does the trick for patterns but I can't see a way to use it for a function. Answer
7 Replies
Sort By:
Posted 1 month ago
 May be this helps L = Table[RandomInteger[{0, 9}], {i, 50}] Count[L, x_ /; x > 5] or ( with the L from above ) bool[x_] := x > 5 Count[bool /@ L, x_ /; x == True] Answer
Posted 1 month ago
 Thanks! That's a nice way to transform it to a pattern. I did a quick test with the Timing function and was a bit surprised to see that the one with Select was a bit quicker: In:= bool[x_] := x > 5; n = 10^6; L = Table[RandomInteger[{0, 9}], {i, n}]; Timing[Length[Select[L, bool]]] Timing[Count[bool /@ L, x_ /; x == True]] Out= {1.03125, 400526} Out= {1.57813, 400526} But that is probably because the length function will have a shorter list to work with? Answer
Posted 1 month ago
 Fold seems to be quite a bit faster than Select or Count In:= n = 10^6; L = Table[RandomInteger[{0, 9}], {i, n}]; Timing[Length[Select[L, # > 5 &]]] Timing[Count[L, x_ /; x > 5]] Timing[Fold[#1 + Boole[#2 > 5] &, 0, L]] Out= {0.374402, 399567} Out= {0.343202, 399567} Out= {0.0312002, 399567} Answer
Posted 1 month ago
 Yes, that was really a significant difference. What is it that makes Fold so much faster than Count? I guess that both sweeps though the list once but for some reson Fold is much faster with adding and evaluating. Answer
Posted 1 month ago
 This variant is even faster: Total[Boole[# > 5] & /@ L] Answer
Posted 1 month ago
 This can be done ~5 times faster by using UnitStep: L=RandomInteger[{0,9},1000000]; RepeatedTiming[Count[L,x_?(#>5&)]] bool[x_]:=x>5 RepeatedTiming[Count[bool/@L,True]] RepeatedTiming[Count[L,x_/;x>5]] RepeatedTiming[Count[Thread[L>5],True]] RepeatedTiming[Count[UnitStep[5-L],0]] RepeatedTiming[Fold[#1+Boole[#2>5]&,0,L]] RepeatedTiming[Total[Boole[#>5]&/@L]] RepeatedTiming[Total[1-UnitStep[5-L]]] RepeatedTiming[Length[L] - Total[UnitStep[5 - L]]] gives me: {0.474,400314} {0.4834,400314} {0.314,400314} {0.26,400314} {0.038,400314} {0.0301,400314} {0.022,400314} {0.0067,400314} {0.0045, 400314} Answer
Posted 1 month ago
 For short, concise and general, try CountsBy.If your data is numeric, and the condition is also numeric, try my BoolEval package. http://szhorvat.net/mathematica/BoolEval In:= arr = RandomInteger[1000, 10000000]; In:= CountsBy[arr, # > 100 &] // Timing Out= {5.6509, <|True -> 8991315, False -> 1008685|>} In:= << BoolEval In:= BoolCount[arr > 100] // Timing Out= {0.356869, 8991315} To count the ones whose square is less than 100, one would use BoolCount[arr^2 < 100]`BoolEval provides a readable and maintainable syntax to the same type of computation that Sander showed. Answer