
Parallel Computing and Object Oriented Programming - Part 3

Posted 5 years ago

See,

The Wolfram Language includes Parallel Computing Tools, and parallel computing is the first step toward high-performance computing. Combining the parallel computing tools with object-oriented programming (hereafter OOP) opens a new perspective on Wolfram Language programming. OOP is well suited to parallel computing because instances are essentially independent computational units, and the same code can be used in both parallel and single-kernel environments.

The author has previously introduced an OOP system for the Wolfram Language and, in Parts 1 and 2, gave examples of OOP-based parallel computing.

Part 3 deals with calculations that use a large number of instances, on the order of 10^6, deployed across multiple local cores. The example performs a parallel nearest-point search over 10^6 points scattered randomly in a 3D region (the unit cube). It assumes 4 local CPU cores.

Parallel code efficiency depends on both single-core and multi-core CPU performance, so you should run this sample code on your own computer to evaluate the OOP method. This sample shows that the efficiency of OOP parallel computing scales with the number of CPU cores. Wolfram OOP code should therefore perform very well on the newest CPUs, provided that ParallelEvaluate[] can make use of a greater number of cores, which is limited in the present version of the Wolfram Language.
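When evaluating the OOP method on your own machine, it may help to time a plain single-kernel baseline first. The following is a minimal sketch, not part of the original post, that runs the same nearest-point query as a single Nearest call on one packed array of 10^6 random points; the names baselinePts, baselineTime and baseline10 are illustrative.

```mathematica
(* single-kernel baseline sketch: 10^6 random points in the unit cube,
   stored as one packed array instead of 10^6 instance symbols *)
SeedRandom[2024];
baselinePts = RandomReal[1, {10^6, 3}];

(* time one query for the 10 points nearest to the centre of the cube *)
{baselineTime, baseline10} =
  AbsoluteTiming[Nearest[baselinePts, {0.5, 0.5, 0.5}, 10]];

baselineTime
```

Comparing baselineTime with the timings of steps 6 to 8 gives a rough idea of the overhead and benefit of the instance-based parallel version on a given machine.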

Step 1. Set up the local kernels

LaunchKernels[];
kernelList = ParallelEvaluate[$KernelID];
nk = Length[kernelList];
kernelNameList = Table[Unique[k], {nk}];
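Step 1 gives every subkernel its own object name through Unique. As a small illustration (demoNames is a hypothetical name), each call to Unique[k] returns a new, previously unused symbol of the form k$nnn, so no two kernels can end up sharing a name:

```mathematica
(* each call to Unique[k] creates a fresh symbol k$nnn *)
demoNames = Table[Unique[k], {4}];

(* check: all four are symbols, and no two are the same *)
{AllTrue[demoNames, Head[#] === Symbol &], DuplicateFreeQ[demoNames]}
(* {True, True} *)
```

The same mechanism is used later by setprtable, which calls Unique[c] to name the individual instances.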

Step 2. Define the parallel calculation class. Note the nested class in the following code: the parallel calculation class (para) is defined inside the local kernel class (kernel). Each instance constructed from this class simply holds its own position, which is assigned randomly at construction time.

kernel[nam_] := Module[{myKernelName = nam, prNameTable},

       (* define parallel calculation class *)

       para[pnam_] := Module[{pos = RandomReal[1, 3]},
         getpos[pnam] := pos];

       (* kernel method to make parallel calculation object list *)

       setprtable[n_] := prNameTable = Table[Unique[c], {n}];

       (* kernel method to construct parallel calculation instances using the object list *)
       construct[] := Map[para[#] &, prNameTable];

       (* nearest calculation method *)
       near[x_, n_] := Nearest[Map[getpos[#] &, prNameTable], x, n]
       ];
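The instance mechanism inside para can be tried on its own in a single kernel. The following is a minimal sketch of that pattern; demoPara and demoGetpos are hypothetical names chosen so as not to clash with para and getpos from the class above. Module gives every call a fresh private pos$nnn, and the definition demoGetpos[name] := pos keeps that private variable alive as the instance's state.

```mathematica
(* hypothetical single-kernel replica of the para[] instance pattern *)
demoPara[pnam_] := Module[{pos = RandomReal[1, 3]},
   (* pos is renamed to a fresh pos$nnn on every call; this definition
      keeps it alive as the private state of instance pnam *)
   demoGetpos[pnam] := pos];

demoInstances = Table[Unique[c], {3}];   (* three instance names c$nnn *)
Scan[demoPara, demoInstances];           (* construct the three instances *)

(* each instance returns its own stored point, and the same point every time *)
demoPositions = demoGetpos /@ demoInstances
```

Because each instance owns an independent private variable, a whole table of instances can be built on any kernel without touching the state of instances on other kernels, which is what makes the class suitable for steps 6 to 8.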

Step 3. Define the number of parallel calculation instances on each core. In this case, the total number of instances is 4 × 250000 = 10^6.

nInstance = 250000;
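The value 250000 assumes exactly 4 cores. On a machine with a different number of kernels, one hypothetical way to keep the total close to 10^6 is to derive the per-core count from the kernel count; nTotal and perCore are illustrative names, not part of the original code.

```mathematica
(* hypothetical generalisation of step 3 *)
nTotal = 10^6;                                     (* desired total number of instances *)
perCore[cores_Integer?Positive] := Quotient[nTotal, cores];

perCore[4]    (* 250000, as in step 3 *)
perCore[6]    (* 166666, i.e. 999996 instances in total *)
```

In a real session this would be used as nInstance = perCore[nk], with nk taken from step 1. Quotient rounds down, so the total can fall slightly short of 10^6 when nk does not divide it.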

Step 4. Define the properties of each local kernel (its object name and its kernel ID)

kernelObject = 
 Table[Association["name" -> kernelNameList[[i]], 
   "kernel" -> kernelList[[i]]], {i, nk}]
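Step 5 reads these fields with the slot shorthand #name and #kernel. In a pure function applied to an Association, #name is shorthand for #["name"]. A small illustration, where demoObject is a hypothetical stand-in for one element of kernelObject:

```mathematica
(* #name and #kernel pick the values stored under the keys "name" and "kernel" *)
demoObject = <|"name" -> k1, "kernel" -> 3|>;
{#name, #kernel} &[demoObject]
(* {k1, 3} *)
```

Note that #kernel refers to the key "kernel" only; it has nothing to do with the kernel[] class defined in step 2.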

Step 5. Construct the local kernel instances using the predefined association list

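(* make the class definition and the instance count available on the subkernels *)
DistributeDefinitions[kernel, nInstance];
AbsoluteTiming[
 Map[ParallelEvaluate[kernel[#name], #kernel] &, kernelObject]
 ]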

Step 6. Call a method of the local kernel instances to prepare, in parallel, the name list of the parallel computing instances

AbsoluteTiming[
 ParallelEvaluate[setprtable[nInstance]];]

Step 7. Call a method of the local kernel instances to construct and deploy the parallel computing instances

That is, the instances are constructed on all cores at the same time, each core building its own share.

AbsoluteTiming[
 ParallelEvaluate[construct[]];]

Step 8. Execute the parallel computation

AbsoluteTiming[
 ans = ParallelEvaluate[near[{0.5, 0.5, 0.5}, 10]]]
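Step 8 returns the 10 nearest points found on each core separately. If the overall 10 nearest points are wanted, one more Nearest call on the master kernel is enough: each of the global 10 nearest points is also among the 10 nearest of its own core's subset, so the union of the per-core answers contains the global answer (barring exact distance ties). The following sketch, not part of the original post, simulates the 4 cores with 4 random subsets on a single kernel; the sim-prefixed names are hypothetical and are chosen so as not to overwrite ans.

```mathematica
(* simulate the per-core results of step 8 with 4 random subsets *)
SeedRandom[7];
simSubsets = Table[RandomReal[1, {1000, 3}], {4}];
simAns = Map[Nearest[#, {0.5, 0.5, 0.5}, 10] &, simSubsets];

(* merge: one more Nearest call over the 4 x 10 candidate points *)
simGlobal10 = Nearest[Flatten[simAns, 1], {0.5, 0.5, 0.5}, 10]
```

In the real session the corresponding call would be Nearest[Flatten[ans, 1], {0.5, 0.5, 0.5}, 10], which touches only 10 × nk points rather than 10^6.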

Results of computation

The following 3D plot shows, for each core, the 10 points nearest to the center point {0.5, 0.5, 0.5}.

ListPointPlot3D[ans, 
 PlotStyle -> {Red, Blue, Green, Black}, 
 PlotRange -> {{0.4, 0.6}, {0.4, 0.6}, {0.4, 0.6}}, 
 BoxRatios -> Automatic]


4 Replies

You have earned the Featured Contributor Badge. Your exceptional post has been selected for our editorial column Staff Picks http://wolfr.am/StaffPicks, and your profile is now distinguished by a Featured Contributor Badge and is displayed on the Featured Contributor Board. Thank you!

POSTED BY: EDITORIAL BOARD
Posted 2 years ago

Hello, the combination of OOP and parallel programming seems very appealing. It is well known that parallel programming suits problems where the dependencies (in terms of variables, functions and data) between the computing units running on the different kernels can be minimized. That means separating input/output data and functions as much as possible, so as to avoid "shared variables" and "distributed definitions", which introduce large overheads and reduce the efficiency of parallel processing.

Now, having studied your examples of parallel processing with OOP encapsulation of data and functions, I wonder whether this technique could be exploited in some way to manage parallel processing efficiently when there are "global" problem variables and functions that would otherwise need to be shared and distributed among the different kernels. I hope my question is clear. Thank you very much for your efforts and for any reply. Andrea

POSTED BY: andrea andrea

Hello Andrea, thank you for your comment.

As you point out, there is a trade-off between parallelization and the handling of global variables, so the first challenge in parallelization is how to reduce them. However, there are cases where global variables cannot be eliminated. For example, in the Life Game I contributed to the Community a few years ago, the global variable holding the entire board cannot be eliminated.

Its title is "Applying Instance Indexed OOP to Multi-core Life Game", by Hirokazu Kobayashi, posted 2 years ago.

That said, parallelization is not meaningless, because it still gives a reasonable computational speed. In this case, the approach is to copy a thin band of cells adjacent to the edge of each parallelized region, so that each region can be updated using only its own data. In addition, I introduced a virtual addressing method in which every region is managed with a uniform address, which made the programming considerably more abstract.
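The "thin band" idea can be illustrated with a small, hypothetical sketch; it is not the code of the Life Game post. A Life board on a torus is cut into horizontal strips, each padded with a copy of the adjacent row above and below (a halo). Each strip can then be updated independently, and after the halo rows are dropped the strips reassemble into exactly the full-board update. The names lifeStep, splitWithHalo and demoBoard, and the 16 × 16 board, are illustrative assumptions.

```mathematica
(* one Game of Life step with periodic (torus) boundaries *)
lifeStep[board_] := Module[{nb},
  (* number of live neighbours of every cell, wrapping around the edges *)
  nb = ListConvolve[{{1, 1, 1}, {1, 0, 1}, {1, 1, 1}}, board, {2, 2}];
  (* alive next step iff nb == 3, or nb == 2 and the cell is alive *)
  1 - Unitize[(nb - 3) (nb + board - 3)]];

(* cut the board into k horizontal strips, each padded with a copy of the
   adjacent row above and below (assumes k divides the number of rows) *)
splitWithHalo[board_, k_] := Module[{n = Length[board]},
  Table[board[[Mod[Range[(j - 1) n/k, j n/k + 1] - 1, n] + 1]], {j, k}]];

SeedRandom[3];
demoBoard = RandomInteger[1, {16, 16}];

(* update every strip on its own, then drop the two halo rows *)
strips = Map[lifeStep[#][[2 ;; -2]] &, splitWithHalo[demoBoard, 4]];

(* the reassembled strips match the update of the whole board *)
(Join @@ strips) === lifeStep[demoBoard]
```

Because each strip carries its own copy of the neighbouring rows, Map can be replaced by ParallelMap without sharing the board between kernels; only the halo rows need to be refreshed from the neighbours before each step.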

Enjoy Mathematica.

Posted 2 years ago

Hi Hirokazu,
thanks for your reply. I'll study the examples you mention, particularly those explaining the method of copying the regions adjacent to the edge of each parallelized region, and the virtual addressing.

I may dare to bother you again should I need further explanation.
Thank you very much again
Andrea

POSTED BY: andrea andrea
