
Right syntax to launch multiple kernels on a SLURM based cluster

I have access to a SLURM-based cluster on which I allocate 2 nodes, each with 12 processors. I am looking for a way to launch all 24 kernels so that parallel computations use every allocated core. This is the script I run:

nodes = ReadList["!scontrol show hostname $SLURM_JOB_NODELIST",String];
cores = ConstantArray[ToExpression@Environment["SLURM_CPUS_PER_TASK"], Length@nodes];
dir = Directory[];

Block[{$ContextPath}, Needs["SubKernels`RemoteKernels`"]];

kernelLaunch = "wolfram -wstp -linkmode connect `4` -linkname `2` -subkernel -noinit >&/dev/null &";

HPCKernel[nodes_, cores_] := MapThread[
  LaunchKernels[RemoteMachine[#1, "ssh -x -f " <> #1 <> " \"" <> kernelLaunch <> "\"", #2]] &,
  {nodes, cores}];

res = HPCKernel[nodes,cores];
corecount = $KernelCount;

Save["results.wl", {nodes,cores,dir,res,corecount}]

In results.wl I get the following output:

nodes = {"wm8", "wm9"}

cores = {12, 12}

dir = "/work/ah1"

res = {$Failed, $Failed}

corecount = 0

As res shows, HPCKernel failed to launch any kernels, and corecount remains 0. The resource allocation itself succeeds, but I do not know how to launch kernels on all of the allocated cores. Where am I going wrong? Any help would be highly appreciated.
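One check that might help narrow this down is whether the master kernel can reach each allocated node over ssh without a password prompt, since the launch command above depends on that. The following is only a minimal sketch under that assumption; sshCheck and checks are hypothetical names introduced here, and the remote command run is just hostname. It has not been run on the cluster.

```wolfram
(* Hypothetical diagnostic: assumes non-interactive (key-based) ssh is expected to work between nodes *)
nodes = ReadList["!scontrol show hostname $SLURM_JOB_NODELIST", String];

(* BatchMode=yes makes ssh fail instead of waiting for a password prompt *)
sshCheck[node_String] :=
  RunProcess[{"ssh", "-x", "-o", "BatchMode=yes", node, "hostname"}];

(* One RunProcess result (an association) per node *)
checks = AssociationMap[sshCheck, nodes];

(* An exit code of 0 means the node was reached; otherwise inspect the error text *)
exitCodes = #["ExitCode"] & /@ checks
errors = #["StandardError"] & /@ checks
```

If any exit code is nonzero, the $Failed results would more likely point to ssh access than to the kernel launch template itself.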

I have asked a similar question on StackExchange: https://mathematica.stackexchange.com/questions/213046/right-syntax-to-launch-multiple-kernels-on-slurm-cluster

POSTED BY: Ali Hashmi