How can I define a Slurm node to be used only on explicit demand?

To reserve the node for particular jobs (for example, jobs that require a GPU), you can create a separate partition for those jobs and make the node a member of that partition only. Here's an example configuration for your case:

# Define compute node
NodeName=rtx Gres=gpu:1 CPUs=16 Sockets=1 CoresPerSocket=8 ThreadsPerCore=2 CPUSpecList=0,1 State=UNKNOWN Weight=80
# Define GPU partition
PartitionName=gpu Nodes=rtx State=UP

After adding this configuration to your slurm.conf file and restarting the slurmctld and slurmd services, you can submit jobs to the gpu partition using the --partition option, like this:

srun --partition=gpu echo hi

This reserves the rtx node for jobs submitted to the gpu partition. Jobs submitted to any other partition will never be scheduled on that node.
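A batch job can target the partition in the same way as the srun call. The following is only a sketch: the file name gpu_job.sh and the job name are made up for illustration, and the --gres=gpu:1 request assumes that GRES is set up on the cluster (GresTypes in slurm.conf and a matching gres.conf on the node), which the configuration shown here does not cover.

```shell
#!/bin/sh
# Hypothetical file name, chosen for this example only.
job_script=gpu_job.sh

# Write a minimal batch script that explicitly asks for the gpu partition.
cat > "$job_script" <<'EOF'
#!/bin/sh
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --job-name=gpu-test
# The job just prints the name of the node it ran on.
hostname
EOF

# Submit only where the Slurm client tools are installed.
if command -v sbatch >/dev/null 2>&1; then
    sbatch "$job_script"
else
    echo "sbatch not found; wrote $job_script without submitting"
fi
```

If the --gres line is left out, the job should still be restricted to the rtx node, because the partition alone determines which nodes are eligible.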

You will also need to add a default partition for the other jobs:

PartitionName=general Nodes=def Default=YES MaxTime=INFINITE State=UP

All jobs submitted without a --partition option then go to this default partition. The gpu partition is used only when it is requested explicitly, even if the default partition has no available nodes, as the following example shows:

> sinfo
gpu          up   infinite      1   idle rtx
general*     up   infinite      1   unk* def
> srun --partition=gpu echo hi
> srun  echo hi
srun: Required node not available (down, drained or reserved)
srun: job 11 queued and waiting for resources
Asked: 2023-04-05 12:01:24 +0000

Last updated: Apr 08 '23