Though it may sound very complicated of combining the different Tally-results and to make your own ParallelTally, it is actually surprisingly easy:
$HistoryLength=3;
ClearAll[ParallelTally]
ParallelTally[list_List]:=Block[{len=Ceiling[Length[list]/$ProcessorCount],tmp},
tmp=Partition[list,UpTo[len]];
tmp=ParallelMap[Tally,tmp,Method->"CoarsestGrained"];
KeyValueMap[List,Merge[Association/@Apply[Rule,tmp,{2}],Total]]
]
However I have a slight suspicion there is already parallelism built in to Tally, as the timings are worse!
list = RandomInteger[{1, 100}, {2000003, 2}];
AbsoluteTiming[a = ParallelTally[list];]
AbsoluteTiming[b = Tally[list];]
Sort[a] === Sort[b]
Don't forget that merging the results is a non-trivial task, which requires sorting internally (or another technique which scales (at best) n log(n) I guess...), which scales non-linearly (super linear).
To speed up your code I would advice you to have a single index, rather than a double-index, as a single index can be tallied much faster:
list = RandomInteger[{1, 10}, {10^6, 2}];
list2 = RandomInteger[{1, 100}, 2 10^6];
ByteCount[list]
ByteCount[list2]
AbsoluteTiming[Length[Tally[list]]]
AbsoluteTiming[Length[Tally[list2]]]
Both lists have same memory usage, and will result in the same amount of bins. But a single-index tally is much faster (150x) in my case!