Slurm Commands

Basic Slurm commands to view configurations:
scontrol show partitions
sacctmgr show associations
To change a partition's settings on the fly (here, the maximum wall time):
scontrol update PartitionName=compute MaxTime=36:00:00
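One quick way to spot-check a value in the running configuration (for example, the priority settings covered further down):
scontrol show config | grep -i ^Priority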
If you update slurm.conf, tell the running daemons to re-read it:
scontrol reconfigure
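To confirm the controller is up and responding after the reload:
scontrol ping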
If a node is down, drain it and record the reason (the reason string is free text):
sudo scontrol update nodename=compute-833 state=drain reason=PCI
sudo systemctl restart slurmctld
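Once the underlying problem is fixed, return the node to service:
sudo scontrol update nodename=compute-833 state=resume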
Specifying that a user should belong to a partition
Create a partition by adding a line like this to /etc/slurm/parts:
PartitionName=MONTHLY1 AllowAccounts=foo Nodes=compute-0-0
Create an account and add the user to it:
sacctmgr add account foo Cluster=jupyter
sacctmgr add user test2 Account=foo Partition=MONTHLY1
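To verify the new association took effect:
sacctmgr show associations where user=test2 format=Cluster,Account,User,Partition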
Reporting user usage
Per-job records over a date range (-X collapses job steps into one line per allocation):
sacct -S2020-01-01 -E2020-10-07 -X -oJobID,Elapsed,ReqMem,ReqCPUS,User,NodeList --parsable
Jobs that were running or suspended during a given 30-second window:
sacct -a -X --format=JobID,User,State,AllocTRES --state=RUNNING,SUSPENDED --starttime=2023-05-01T00:00:00 --endtime=2023-05-01T00:00:30 --parsable2 --noheader
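For aggregated per-account and per-user totals rather than per-job rows, sreport covers the same window:
sreport -t Hours cluster AccountUtilizationByUser start=2020-01-01 end=2020-10-07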
Setting fairshare to a nonstandard value
sacctmgr modify user where name=someuser account=researchgroup set fairshare=N
Here N is the desired number of shares (the default for an association is 1).
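To check the resulting share values and effective usage for that user:
sshare -u someuser -l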
Users with limited access can be confined to a separate partition whose limits live on a QOS (sacctmgr manages accounts and QOSes, not partitions; the partition itself is defined in slurm.conf):
sudo sacctmgr add qos gres MaxWall=48:00:00 GrpTRESMins=gres/gpu=100
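A minimal sketch of the matching slurm.conf partition entry, assuming hypothetical GPU nodes gpu-[0-3]:
PartitionName=gres Nodes=gpu-[0-3] QOS=gres MaxTime=48:00:00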
In slurm.conf, the Priority/Fairshare section:
PriorityType=priority/multifactor
# 2 week half-life
PriorityDecayHalfLife=14-0
# The larger the job, the greater its job size priority.
PriorityFavorSmall=no
FairShareDampeningFactor=2
# The job's age factor reaches 1.0 after waiting in the
# queue for 2 weeks.
PriorityMaxAge=14-0
# This next group determines the weighting of each of the
# components of the Multi-factor Job Priority Plugin.
# The default value for each of the following is 1.
PriorityWeightAge=1000
PriorityWeightFairshare=10000
PriorityWeightJobSize=1000
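To see how these weights combine in practice, sprio breaks a pending job's priority into its individual factors:
sprio -l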
A default partition definition from the same file:
PartitionName=blah Nodes=dev[0-8,18-25] MaxTime=36:00:00 DefaultTime=02:00:00 Default=YES
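Since Default=YES, jobs land in this partition unless they request another one, and DefaultTime applies when no time limit is given. For example (job.sh is a placeholder script):
sbatch --time=04:00:00 job.sh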