Slurm Commands

SlurmsUp!!

Basic Slurm commands to view and change the configuration:

scontrol show partitions

sacctmgr show associations

scontrol update PartitionName=compute MaxTime=36:00:00

After editing slurm.conf, tell the daemons to reread it:

scontrol reconfigure

If a node is down, drain it with a reason, then restart slurmctld:

sudo scontrol update nodename=compute-833 state=drain reason=PCI

sudo systemctl restart slurmctld
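
The drain step can be wrapped in a small helper so that the exact command is reviewed before it touches the cluster. This is only a sketch: the function names run, drain_node and resume_node and the DRY_RUN variable are made up for illustration and are not part of Slurm. State=DRAIN and State=RESUME are standard scontrol node states.

```shell
# run: print the command when DRY_RUN=1, otherwise execute it.
# (Hypothetical helper, not a Slurm tool.)
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# drain_node NODE REASON: stop new jobs from starting on NODE.
drain_node() {
  run sudo scontrol update NodeName="$1" State=DRAIN Reason="$2"
}

# resume_node NODE: return a repaired node to service.
resume_node() {
  run sudo scontrol update NodeName="$1" State=RESUME
}

# Example (prints the command instead of running it):
#   DRY_RUN=1 drain_node compute-833 PCI
```

With DRY_RUN unset the same calls run the commands for real, so it is worth checking the printed form first.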

Restricting a partition to specific users

Create a partition by adding a line like this to /etc/slurm/parts:

PartitionName=MONTHLY1 AllowAccounts=foo Nodes=compute-0-0

Create an account and add the user to it:

sacctmgr add account foo Cluster=jupyter

sacctmgr add user test2 Account=foo Partitions=MONTHLY1

Reporting user usage

sacct -S2020-01-01 -E2020-10-07 -X -oJobID,Elapsed,ReqMem,ReqCPUS,User,Node --parsable

sacct -a -X --format=JobID,User,State,AllocTRES --state=RUNNING,SUSPENDED --starttime=2023-05-01T00:00:00 --endtime=2023-05-01T00:00:30 --parsable2 --noheader
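
The sacct output above is easier to read once it is summed per user. The following is a minimal sketch, assuming the fields are requested in the order User,Elapsed,ReqCPUS with --parsable2 --noheader. The function name core_hours is made up for illustration, and because it multiplies elapsed time by requested CPUs the result is only an approximation of real usage.

```shell
# core_hours: read "User|Elapsed|ReqCPUS" lines on stdin and print
# "user core-hours" for each user, sorted by user name.
# (Hypothetical helper; field order is an assumption.)
core_hours() {
  awk -F'|' '
    NF >= 3 {
      # Elapsed is [DD-]HH:MM:SS; split off the optional day count first.
      days = 0; hms = $2
      if (index(hms, "-") > 0) {
        split(hms, d, "-"); days = d[1]; hms = d[2]
      }
      split(hms, t, ":")
      secs = days * 86400 + t[1] * 3600 + t[2] * 60 + t[3]
      total[$1] += secs * $3 / 3600
    }
    END {
      for (u in total) printf "%s %.2f\n", u, total[u]
    }
  ' | sort
}

# Example against a live cluster:
#   sacct -a -X -S2020-01-01 -E2020-10-07 -oUser,Elapsed,ReqCPUS \
#         --parsable2 --noheader | core_hours
```

Piping a few hand-written lines through core_hours is a quick way to check the parsing before running it on real accounting data.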

Setting a non-default fairshare value

sacctmgr modify user where name=someuser account=researchgroup set fairshare=N

Users with limited access are placed under a separate partition:

sudo sacctmgr add partition gres PartitionName=gres MaxTime=48:00:00 GrpTRESMins=gres/gpu=100

In slurm.conf Priority/Fairshare section

PriorityType=priority/multifactor

# 2 week half-life
PriorityDecayHalfLife=14-0

# The larger the job, the greater its job size priority.
PriorityFavorSmall=no
FairShareDampeningFactor=2

# The job's age factor reaches 1.0 after waiting in the
# queue for 2 weeks.
PriorityMaxAge=14-0

# This next group determines the weighting of each of the
# components of the Multi-factor Job Priority Plugin.
# The default value for each of the following is 1.
PriorityWeightAge=1000
PriorityWeightFairshare=10000
PriorityWeightJobSize=1000
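
To get a feel for how these weights combine, here is a rough sketch of the weighted sum the multifactor plugin computes, restricted to the three weights set above. It leaves out the partition, QOS, association, TRES, site and nice terms. The function name job_priority and the factor values passed to it are made up for illustration; real factors come from the scheduler.

```shell
# job_priority AGE_DAYS FAIRSHARE_FACTOR JOBSIZE_FACTOR
# Weights and PriorityMaxAge mirror the slurm.conf values above; the
# fairshare and job size factors are illustration inputs in 0.0 to 1.0.
job_priority() {
  awk -v age_days="$1" -v fs="$2" -v size="$3" 'BEGIN {
    max_age_days = 14      # PriorityMaxAge=14-0
    w_age  = 1000          # PriorityWeightAge
    w_fs   = 10000         # PriorityWeightFairshare
    w_size = 1000          # PriorityWeightJobSize

    age = age_days / max_age_days
    if (age > 1) age = 1   # the age factor saturates at 1.0

    printf "%.0f\n", w_age * age + w_fs * fs + w_size * size
  }'
}

# A job that has waited 7 days, with fairshare factor 0.25 and
# job size factor 0.5: 500 + 2500 + 500
job_priority 7 0.25 0.5    # prints 3500
```

With these weights the fairshare term dominates: a change of 0.1 in the fairshare factor moves the priority as much as a full 14 days of queue wait.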

PartitionName=blah Nodes=dev[0-8,18-25] MaxTime=36:00:00 DefaultTime=02:00:00 Default=YES