Kindly Help:
I access mathematica (through MobaTerm), which is installed in my user area on Torque managed HPC, to connect multiple nodes assigned to my Job submitted in Queue.
PROBLEM
Following is the program to connect 2 nodes, and each node consists of 40 cores.
Everything looks fine, expect the time taken by ParallelTable
at the end of program, which is much higher than time taken by single node (without launching remote kernels)
Note that allotted two nodes atulya095
and atulya049
appear 40 times each in the output.
PROGRAM
nodes=ReadList[Environment["PBS_NODEFILE"], "String"];
Print["alloted node are ", nodes]; (*get association of resources, name of local host and remove local host from available resources*)
hosts = Counts[nodes];
local = First[StringSplit[Environment["HOSTNAME"],"."]];
Print["local node is ", local];
hosts[local]--;
Needs["SubKernels`RemoteKernels`"];
Map[If[hosts[#] > 0, LaunchKernels[RemoteMachine[#, "ssh -x -f -l `3` `1` wolfram -wstp -linkmode Connect `4` -linkname '`2`' -subkernel -noinit", hosts[#]]]]&, Keys[hosts]];
Print["kernel count is ",$KernelCount];
Print[" machine name is ", ParallelEvaluate[$MachineName]];
Print[" kernel id is ",ParallelEvaluate[$KernelID]];
Print["processor count is ",$ProcessorCount];
Print[AbsoluteTiming[ParallelTable[Exp[Sin[x]]^Sin[x],{x,0.1,200000,0.0001}];][[1]]];
CloseKernels[];
Quit[];
OUTPUT
alloted node are {atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049}
local node is atulya095
kernel count is 79
machine name is {atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya095, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049, atulya049}
kernel id is {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79}
processor count is 40
189.309181
I will be thankful to your help here.