I have access to a SLURM-based cluster on which I allocate 2 nodes with 12 processors each. I am looking for a way to launch all 24 kernels so that parallel computations make full use of these resources. This is what I have tried:
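```mathematica
(* hostnames allocated to this job, one per line from scontrol *)
nodes = ReadList["!scontrol show hostname $SLURM_JOB_NODELIST", String];
(* one entry per node; assumes SLURM_CPUS_PER_TASK is the core count on each node *)
cores = ConstantArray[ToExpression@Environment["SLURM_CPUS_PER_TASK"], Length@nodes];
dir = Directory[];
Block[{$ContextPath}, Needs["SubKernels`RemoteKernels`"]];
(* launch template; `2` (link name) and `4` are slots filled in by RemoteMachine *)
kernelLaunch = "wolfram -wstp -linkmode connect `4` -linkname `2` -subkernel -noinit >&/dev/null &";
(* for each node, ask LaunchKernels for #2 kernels started over ssh on host #1 *)
HPCKernel[nodes_, cores_] := MapThread[
  LaunchKernels[RemoteMachine[#1, "ssh -x -f " <> #1 <> " \"" <> kernelLaunch <> "\"", #2]] &,
  {nodes, cores}];
res = HPCKernel[nodes, cores];
corecount = $KernelCount;
Save["results.wl", {nodes, cores, dir, res, corecount}]
```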
In results.wl I get the following output:
```mathematica
nodes = {"wm8", "wm9"}
cores = {12, 12}
dir = "/work/ah1"
res = {$Failed, $Failed}
corecount = 0
```
As you can see, res is {$Failed, $Failed}, so HPCKernel did not launch any kernels, and corecount stays at 0. Allocating the resources works, but I don't know how to launch a kernel on every allocated core. Where am I going wrong? Any help would be highly appreciated.
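One thing that may be worth ruling out first is whether each node accepts non-interactive ssh and has `wolfram` on the PATH of a non-interactive shell, since the launch command above depends on both. The sketch below assumes passwordless ssh between the allocated nodes is supposed to work; `checkNode` is a hypothetical helper, and the `BatchMode`/`ConnectTimeout` ssh options are added only so the check fails instead of hanging on a prompt.

```mathematica
(* Hypothetical helper, not part of the original code: reports whether host
   is reachable via non-interactive ssh and whether `wolfram` is found there *)
checkNode[host_String] := Module[{sshOpts},
  (* BatchMode=yes makes ssh fail rather than prompt for a password;
     ConnectTimeout=5 bounds the wait for an unreachable host *)
  sshOpts = {"-x", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5"};
  <|
    "Host" -> host,
    (* exit code 0 means ssh connected and ran `true` on the remote side *)
    "SSH" -> RunProcess[Join[{"ssh"}, sshOpts, {host, "true"}], "ExitCode"] === 0,
    (* `command -v` exits non-zero when wolfram is not on the remote PATH *)
    "Wolfram" -> RunProcess[Join[{"ssh"}, sshOpts, {host, "command -v wolfram"}], "ExitCode"] === 0
  |>
];

(* run the check for every allocated node *)
checkNode /@ nodes
```

A `False` in the "SSH" or "Wolfram" entry for either node would be enough to make the corresponding `LaunchKernels` call fail, although all `True` entries would not rule out other causes such as the launch template itself.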
I have asked a similar question on StackExchange: https://mathematica.stackexchange.com/questions/213046/right-syntax-to-launch-multiple-kernels-on-slurm-cluster