Usage Limits

Setting Up GPU Usage Limits in Slurm with SlurmDBD

This guide explains how to enforce per-user and per-account GPU usage quotas in Slurm using SlurmDBD, with accumulated usage reset automatically each month.

Prerequisites

Ensure the following before proceeding:

  • Slurm Accounting Enabled: Configure SlurmDBD in slurm.conf (AccountingStorageType=accounting_storage/slurmdbd).
  • GPU GRES Configured: Add AccountingStorageTRES=gres/gpu in slurm.conf.

Setting Usage Limits

GrpTRESMins limits are expressed in TRES-minutes, so a GPU budget is converted to GPU-minutes (budget ÷ rate per GPU-hour × 60). For example, we set:

  • Per-user limit: $5,000/month at $1.80 per GPU-hour → 166,667 GPU-minutes
  • Per-account limit: $20,000/month at $1.80 per GPU-hour → 666,667 GPU-minutes
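
The conversion can be scripted so that a change in budget or rate does not require redoing the arithmetic by hand. The following is a minimal sketch; the function name gpu_minutes is hypothetical and not part of Slurm.

```shell
#!/usr/bin/env bash
# Convert a monthly dollar budget into a GPU-minute limit.
# Arguments: budget in dollars, rate in dollars per GPU-hour.
# gpu_minutes is a hypothetical helper name, not a Slurm command.
gpu_minutes() {
  awk -v budget="$1" -v rate="$2" \
    'BEGIN { printf "%d\n", budget / rate * 60 + 0.5 }'  # round to nearest minute
}

gpu_minutes 5000 1.8    # per-user limit
gpu_minutes 20000 1.8   # per-account limit
```

The two calls reproduce the figures above (166667 and 666667), which can then be passed to GrpTRESMins.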

Commands:

sudo sacctmgr modify user name=alice account=research set GrpTRESMins=gres/gpu=166667
sudo sacctmgr modify account name=research set GrpTRESMins=gres/gpu=666667

Verifying Limits

  • Check limits:
    sacctmgr show assoc where account=research format=Account,User,GrpTRESMins
    
  • Check current usage (replace YYYY-MM with the current month; -t minutes reports in the same unit as GrpTRESMins):
    sreport -T gres/gpu -t minutes cluster AccountUtilizationByUser Accounts=research start=YYYY-MM-01 end=now
    
  • Verify enforcement:
    scontrol show config | grep AccountingStorageEnforce
    

Enforcing Limits

  1. Add to slurm.conf:
    AccountingStorageEnforce=limits,safe
    
  2. Restart Slurm:
    systemctl restart slurmctld
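
After the restart, the active setting can be checked from a script rather than by eye. The sketch below assumes scontrol prints the setting as a line of the form "AccountingStorageEnforce = flag,flag,..."; the function name enforce_has_limits is hypothetical.

```shell
#!/usr/bin/env bash
# Succeed if an AccountingStorageEnforce line read from standard input
# lists "limits" or "safe" (safe implies limits).
# enforce_has_limits is a hypothetical helper name, not a Slurm command.
enforce_has_limits() {
  awk -F'=' '
    /AccountingStorageEnforce/ {
      n = split($2, flags, ",")
      for (i = 1; i <= n; i++) {
        gsub(/[ \t]/, "", flags[i])          # strip padding around each flag
        flag = tolower(flags[i])
        if (flag == "limits" || flag == "safe") found = 1
      }
    }
    END { exit (found ? 0 : 1) }
  '
}

# Example against a live controller (requires Slurm):
#   scontrol show config | enforce_has_limits && echo "limits are enforced"
```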
    

Testing Limits

  • Set a low test limit:
    sacctmgr modify user name=alice account=research set GrpTRESMins=gres/gpu=10
    
  • Submit jobs as alice:
    sbatch -A research --gres=gpu:1 --time=5 test_job.sh
    sbatch -A research --gres=gpu:1 --time=10 long_job.sh  # Should stay pending: 5 + 10 GPU-minutes exceeds the 10 GPU-minute limit
    
  • Revert test limits:
    sacctmgr modify user name=alice account=research set GrpTRESMins=gres/gpu=166667
    

Automating Monthly Resets

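Enable automatic resets through the multifactor priority plugin by adding the following to slurm.conf and restarting slurmctld. Setting PriorityDecayHalfLife=0 disables usage decay, so recorded usage accumulates until the reset period clears it: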

PriorityType=priority/multifactor
PriorityDecayHalfLife=0
PriorityUsageResetPeriod=MONTHLY

Verify the reset at the start of a new month; the gres/gpu entry in GrpTRESRaw should be back to zero:

sshare -l -a -o Account,User,GrpTRESRaw,GrpTRESMins,TRESRunMins
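
The TRES columns bundle several resources into one field, which makes the GPU figure hard to pick out. The sketch below extracts it, assuming the field is a comma-separated list of name=value pairs; the function name tres_gpu and the sample string are illustrative, not real sshare output.

```shell
#!/usr/bin/env bash
# Print the gres/gpu value from a comma-separated TRES string.
# tres_gpu is a hypothetical helper name, not a Slurm command.
tres_gpu() {
  printf '%s\n' "$1" | tr ',' '\n' | awk -F'=' '$1 == "gres/gpu" { print $2 }'
}

# Illustrative input only; a real value would come from the GrpTRESRaw column.
tres_gpu 'cpu=1200,mem=4800,gres/gpu=0'

# Example against a live cluster (requires Slurm):
#   sshare -n -P -A research -a -o User,GrpTRESRaw |
#     while IFS='|' read -r user raw; do
#       echo "${user:-<account>}: $(tres_gpu "$raw") GPU-minutes used"
#     done
```

Only the untyped gres/gpu entry is matched; typed entries such as gres/gpu:a100 would need their own pattern.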

Additional Notes

  • Limits Hierarchy: User and account limits are checked separately, and a job must fit within both; whichever has less remaining is the effective limit.
  • Adjusting Limits: Modify GrpTRESMins dynamically via sacctmgr.
  • Monitoring: Use sreport or database queries to track GPU usage.
  • Removing Limits:
    sacctmgr modify user name=alice account=research set GrpTRESMins=gres/gpu=-1
    sacctmgr modify account name=research set GrpTRESMins=gres/gpu=-1
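
The hierarchy note can be illustrated with a small calculation. This is a simplified model of the rule, not Slurm's actual scheduling logic; the function name fits_limits and the usage figures are hypothetical.

```shell
#!/usr/bin/env bash
# Simplified model: a request fits only if it fits within BOTH the user's
# and the account's remaining GPU-minutes.
# Arguments (integers, GPU-minutes):
#   request user_used user_limit acct_used acct_limit
# fits_limits is a hypothetical helper name, not a Slurm command.
fits_limits() {
  local request=$1 user_used=$2 user_limit=$3 acct_used=$4 acct_limit=$5
  local user_left=$(( user_limit - user_used ))
  local acct_left=$(( acct_limit - acct_used ))
  # The smaller remainder is the effective limit.
  local left=$(( user_left < acct_left ? user_left : acct_left ))
  [ "$request" -le "$left" ]
}

# User has 667 minutes left, account has plenty: a 600-minute request fits.
if fits_limits 600 166000 166667 300000 666667; then echo "fits"; else echo "blocked"; fi

# User has plenty left, account has only 167: the same request is blocked.
if fits_limits 600 100000 166667 666500 666667; then echo "fits"; else echo "blocked"; fi
```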
    

With these settings, Slurm enforces the GPU-minute limits, resets accumulated usage at the start of each month, and keeps resource usage within the budgeted allocations.