# Wondering why I am not getting Parallel speed up on Cases[] and similar..

Posted 11 years ago
7435 Views
|
2 Replies
|
7 Total Likes
|
 I have some large lists that I need to prune. I expected that this should parallize quite well, but it doesn't. Is there someone out there who can explain why?Example:In[21]:= \$KernelCountOut[21]= 8rn = RandomInteger[{-1000, 1000}, 5^10];Timing[Parallelize[Select[rn, Positive]]][[1]]Timing[Cases[rn, a_ /; (a > 0 )]][[1]]Timing[Parallelize[Cases[rn, a_ /; (a > 0 )]]][[1]]Timing[ParallelCombine[(Cases[#, a_ /; a > 0] &), rn, List];][[1]]Out[28]= 4.728799Out[29]= 5.137632Out[30]= 5.653225Out[31]= 5.012263I would have guessed that ParallelDeleteCases, ParallelPosition, ParallelSelect, etc would have been natural externsions of ParallelMap...
2 Replies
Sort By:
Posted 11 years ago
 The problem with Parallelize and friends is that they suggest parallelization is easy. Just surround your code by Parallelize and that's it. It not! If it were as simple as that, then this would be built in per default. I will not go into details here but let me say a few words to your problem:The underlying task of your problem is very simple, it's just to look whether a number is greater zero. This is problematic, because then the overhead of the parallel calculation is likely to be greater then the benefit you are gaining. Let's make an example. I create a list like you but I use a length which is divisible by 8 so we can manually distribute 8 smaller lists to the parallel subkernelsLaunchKernels[];rn = RandomInteger[{-1000, 1000}, 8^7];First let's see what we get without parallelizationAbsoluteTiming[Select[rn, Positive]] // First (* 0.375135 *)Now that we know this, we can "estimate" what we expect if we divide the work by 8 and run it in parallel. For this, I partition the list into 8 sublists and use ParallelMap. Afterwards, I flatten the resulting sublists into one big resultFlatten[ParallelMap[Select[#, Positive] &, Partition[rn, 8^6]]] // AbsoluteTiming//First (* 0.432450 *)Looks like we lost some time and indeed there are several points which we haven't considered:we have to split the input into smaller tasks for the parallel subkernelswe have to transfer the input for each subkernel to its memorywe have to transfer the result back to the main kernelwe have to build up the final result from all the smaller partsAnd now you see instantly the basics of what you have to consider. Those points made the parallel approach obviously slower than a simple non-parallel call. Let's consider a second theoretical example: you want to find the roots of a complicated function numerically. The function has parameters and you want to find the roots for different, say 8 parameter sets. Let's assume the root-finding is very slow, say .5 seconds and now reconsider the situation:The splitting of the input is very cheap, because you are not splitting up a large list, you only have 8 parameter sets. Transfering a parameter set (lets say 5 parameters) to a subkernel costs you nothing compared to transfering a list with 8^6 entriesThe result is the root of the function, one number which is again very fast if it is transfered back to the main kernelBuilding up the result list of 8 roots is again not even worth mentioning.What's left is, that the .5 seconds of calculation are done in parallel and your overall calculation time drops from 0.5*8= 4 seconds to basically 0.5 seconds.For your current problem, I would consider a different kind of parallelization which you might not be aware of because this is already built in. The thing I mean is that many functions are Listable which means you give a list as input and they act on each element. Positive is one of those functions. The good thing is, that when you supply a list of numerical values to Positive it's calculated in parallel. The only thing you have to do is to reorder your calculation. You compute first which numbers in your list are positive and Pick then the matching out. AbsoluteTiming[Pick[rn, Positive[rn]]] // First (* 0.117210 *)
Posted 10 years ago
 Further notation, this build-in parallel or multi-threading technology does not take the license limit in general. Say you have 8 threads but only 4 kernels allowed via LaunchKernels[], you can still use the 8 threads when you use this type of function.
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.