Message Boards Message Boards

Wondering why I am not getting Parallel speed up on Cases[] and similar..

Posted 11 years ago
I have some large lists that I need to prune. I expected that this should parallize quite well, but it doesn't. Is there someone out there who can explain why?
Example:
In[21]:= $KernelCount
Out[21]= 8

rn = RandomInteger[{-1000, 1000}, 5^10];
Timing[Parallelize[Select[rn, Positive]]][[1]]
Timing[Cases[rn, a_ /; (a > 0 )]][[1]]
Timing[Parallelize[Cases[rn, a_ /; (a > 0 )]]][[1]]
Timing[ParallelCombine[(Cases[#, a_ /; a > 0] &), rn, List];][[1]]
Out[28]= 4.728799
Out[29]= 5.137632
Out[30]= 5.653225
Out[31]= 5.012263

I would have guessed that ParallelDeleteCases, ParallelPosition, ParallelSelect, etc would have been natural externsions of ParallelMap...
POSTED BY: W. Craig Carter
2 Replies
The problem with Parallelize and friends is that they suggest parallelization is easy. Just surround your code by Parallelize and that's it. It not! If it were as simple as that, then this would be built in per default. I will not go into details here but let me say a few words to your problem:

The underlying task of your problem is very simple, it's just to look whether a number is greater zero. This is problematic, because then the overhead of the parallel calculation is likely to be greater then the benefit you are gaining. Let's make an example. I create a list like you but I use a length which is divisible by 8 so we can manually distribute 8 smaller lists to the parallel subkernels

LaunchKernels[];
rn = RandomInteger[{-1000, 1000}, 8^7];

First let's see what we get without parallelization

AbsoluteTiming[Select[rn, Positive]] // First (* 0.375135 *)

Now that we know this, we can "estimate" what we expect if we divide the work by 8 and run it in parallel. For this, I partition the list into 8 sublists and use ParallelMap. Afterwards, I flatten the resulting sublists into one big result

Flatten[ParallelMap[Select[#, Positive] &, Partition[rn, 8^6]]] // AbsoluteTiming//First (* 0.432450 *)

Looks like we lost some time and indeed there are several points which we haven't considered:
  • we have to split the input into smaller tasks for the parallel subkernels
  • we have to transfer the input for each subkernel to its memory
  • we have to transfer the result back to the main kernel
  • we have to build up the final result from all the smaller parts
And now you see instantly the basics of what you have to consider. Those points made the parallel approach obviously slower than a simple non-parallel call. Let's consider a second theoretical example: you want to find the roots of a complicated function numerically. The function has parameters and you want to find the roots for different, say 8 parameter sets. Let's assume the root-finding is very slow, say .5 seconds and now reconsider the situation:
  • The splitting of the input is very cheap, because you are not splitting up a large list, you only have 8 parameter sets. 
  • Transfering a parameter set (lets say 5 parameters) to a subkernel costs you nothing compared to transfering a list with 8^6 entries
  • The result is the root of the function, one number which is again very fast if it is transfered back to the main kernel
  • Building up the result list of 8 roots is again not even worth mentioning.
What's left is, that the .5 seconds of calculation are done in parallel and your overall calculation time drops from 0.5*8= 4 seconds to basically 0.5 seconds.

For your current problem, I would consider a different kind of parallelization which you might not be aware of because this is already built in. The thing I mean is that many functions are Listable which means you give a list as input and they act on each element. Positive is one of those functions. The good thing is, that when you supply a list of numerical values to Positive it's calculated in parallel. The only thing you have to do is to reorder your calculation. You compute first which numbers in your list are positive and Pick then the matching out.
 AbsoluteTiming[Pick[rn, Positive[rn]]] // First (* 0.117210 *)
POSTED BY: Patrick Scheibe
Further notation, this build-in parallel or multi-threading technology does not take the license limit in general. Say you have 8 threads but only 4 kernels allowed via LaunchKernels[], you can still use the 8 threads when you use this type of function. 
POSTED BY: Shenghui Yang
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard

Group Abstract Group Abstract