Message Boards Message Boards

Parallelize Tally[]?

Posted 7 years ago

Is there a way to parallelize a Tally of a large list? For example:

list = ParallelTable[{RandomChoice[Range[100]], RandomChoice[Range[100]]}, 100000000];
Tally[list]

The Tally takes ~92 seconds.

enter image description here

does not work. It seems like there must be a way, since different parts of the list can be tallied in parallel, then combined/re-tallied. We can manually split up the list and do partial Tallies with ParallelDo, then manually combine the result, but I prefer to find a way that automatically parallelizes and combines the partial Tallies.

POSTED BY: Bryan Lettner
2 Replies

Though it may sound very complicated of combining the different Tally-results and to make your own ParallelTally, it is actually surprisingly easy:

$HistoryLength=3;
ClearAll[ParallelTally]
ParallelTally[list_List]:=Block[{len=Ceiling[Length[list]/$ProcessorCount],tmp},
    tmp=Partition[list,UpTo[len]];
    tmp=ParallelMap[Tally,tmp,Method->"CoarsestGrained"];
    KeyValueMap[List,Merge[Association/@Apply[Rule,tmp,{2}],Total]]
]

However I have a slight suspicion there is already parallelism built in to Tally, as the timings are worse!

list = RandomInteger[{1, 100}, {2000003, 2}];
AbsoluteTiming[a = ParallelTally[list];]
AbsoluteTiming[b = Tally[list];]
Sort[a] === Sort[b]

Don't forget that merging the results is a non-trivial task, which requires sorting internally (or another technique which scales (at best) n log(n) I guess...), which scales non-linearly (super linear). To speed up your code I would advice you to have a single index, rather than a double-index, as a single index can be tallied much faster:

list = RandomInteger[{1, 10}, {10^6, 2}];
list2 = RandomInteger[{1, 100}, 2 10^6];
ByteCount[list]
ByteCount[list2]
AbsoluteTiming[Length[Tally[list]]]
AbsoluteTiming[Length[Tally[list2]]]

Both lists have same memory usage, and will result in the same amount of bins. But a single-index tally is much faster (150x) in my case!

POSTED BY: Sander Huisman
Posted 7 years ago

Thanks Sander. Interesting method. You're correct, indeed slower. I will play around with it.

POSTED BY: Bryan Lettner
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard

Group Abstract Group Abstract