
Backups

Slurm

Prerequisites

  • Make sure the OCI CLI (oci) is installed and configured. Both the backup script and the restore steps rely on it.
  • Make sure an Object Storage bucket named 'backups' exists in OCI.
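Both prerequisites can be checked before a backup or restore. The sketch below assumes the bucket is named 'backups' and that the OCI CLI reads its default configuration. The function names require_cmds and check_backup_bucket are hypothetical helpers, not part of oci-hpc.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helpers (not part of oci-hpc); source this file.
# Assumption: the bucket is named "backups" and the OCI CLI uses its
# default configuration.

# Report every listed command that is not on PATH.
# Returns 0 only when all of them are available.
require_cmds() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Fails if the OCI CLI is not configured or the bucket does not exist.
check_backup_bucket() {
  local bucket="${1:-backups}"
  oci os object list --bucket-name "$bucket" --limit 1 >/dev/null
}
```

For example: `require_cmds oci mysql gzip && check_backup_bucket backups`.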

Backing up Slurm

The following command runs the backup script, which dumps the Slurm accounting database to a file, compresses it, and uploads it to the 'backups' bucket.

python /opt/oci-hpc/scripts/slurm_backup.py
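If a one-off backup is needed without the script, the steps it performs can be approximated by hand. This is a hypothetical manual equivalent, not the code of slurm_backup.py. It assumes mysqldump accepts the same credentials file as the mysql commands in this guide, that the date in object names is zero-padded YYYY_MM_DD, and it reuses the object names that the restore steps expect. The function and variable names are illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical manual backup (NOT the implementation of slurm_backup.py).
# Assumptions: mysqldump is installed, the credentials file below works
# for it, and object names use a zero-padded YYYY_MM_DD date.

SLURM_CNF="${SLURM_CNF:-/home/ubuntu/.slurm.cnf}"
BACKUP_BUCKET="${BACKUP_BUCKET:-backups}"

# Object names matching the ones used when restoring.
backup_object_name() { printf '/slurm/slurm_backup_%s.sql.gz\n' "$1"; }
conf_object_name()   { printf '/slurm/slurm_%s.conf.gz\n' "$1"; }

# Dump the accounting database and slurm.conf, compress both, upload both.
# The body is a subshell so `set -e` and the trap stay out of the caller.
manual_slurm_backup() (
  set -euo pipefail
  day="$(date +%Y_%m_%d)"
  sql_gz="$(mktemp)"
  conf_gz="$(mktemp)"
  trap 'rm -f "$sql_gz" "$conf_gz"' EXIT

  mysqldump --defaults-extra-file="$SLURM_CNF" slurm_accounting | gzip > "$sql_gz"
  gzip -c /etc/slurm/slurm.conf > "$conf_gz"

  oci os object put --bucket-name "$BACKUP_BUCKET" \
    --name "$(backup_object_name "$day")" --file "$sql_gz"
  oci os object put --bucket-name "$BACKUP_BUCKET" \
    --name "$(conf_object_name "$day")" --file "$conf_gz"
)
```

The `pipefail` option matters here: without it, a failed mysqldump would still produce a (truncated) compressed file and the upload would go ahead.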

Automatic Slurm Backups

To set up a daily backup at midnight, add the following entry to the crontab (for example, with crontab -e):

0 0 * * * python /opt/oci-hpc/scripts/slurm_backup.py
Note: Regularly confirm that the backups are succeeding by checking the script's logs in /opt/oci-hpc/logs/backups/ or the contents of the 'backups' bucket.
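Checking the bucket can be scripted. The sketch below lists the accounting dumps and picks the newest one. It assumes the object names follow the /slurm/slurm_backup_{year}_{month}_{day}.sql.gz pattern with zero-padded dates (so lexical order equals chronological order) and contain no spaces. list_slurm_backups and newest_backup are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for checking the backup bucket (not part of oci-hpc).
# Assumption: names embed a zero-padded YYYY_MM_DD date and have no spaces.

# Print the accounting dumps in the bucket, one object name per line.
list_slurm_backups() {
  oci os object list --bucket-name "${1:-backups}" \
    --prefix /slurm/slurm_backup_ --all \
    --query "join(' ', data[].name)" --raw-output | tr ' ' '\n'
}

# Read object names on stdin and print the newest one.
newest_backup() {
  LC_ALL=C sort | tail -n 1
}
```

For example, `list_slurm_backups backups | newest_backup` prints the most recent dump, which should carry yesterday's or today's date if the cron job is working.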

Clearing Slurm Data

Note: Before restoring Slurm data, drop and recreate the existing accounting database with the following commands. This permanently deletes all current accounting data.

sudo systemctl stop slurmdbd.service
mysql --defaults-extra-file=/home/ubuntu/.slurm.cnf -e "DROP DATABASE slurm_accounting;"
mysql --defaults-extra-file=/home/ubuntu/.slurm.cnf -e "CREATE DATABASE slurm_accounting;"
sudo systemctl restart slurmdbd.service slurmctld.service
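Because these commands are destructive, it can help to wrap them so they only run on explicit confirmation. The sketch below runs the same four commands, refuses to proceed without --yes, and prints the commands instead of executing them when DRY_RUN=1. The names run, clear_slurm_accounting, SLURM_CNF and DRY_RUN are illustrative, not part of oci-hpc.

```bash
#!/usr/bin/env bash
# Hypothetical guarded wrapper around the clearing steps (not part of oci-hpc).

SLURM_CNF="${SLURM_CNF:-/home/ubuntu/.slurm.cnf}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Subshell body: `set -e` stops at the first failing step without
# affecting the caller's shell.
clear_slurm_accounting() (
  set -euo pipefail
  if [[ "${1:-}" != "--yes" ]]; then
    echo "refusing to drop slurm_accounting without --yes" >&2
    exit 1
  fi
  run sudo systemctl stop slurmdbd.service
  run mysql --defaults-extra-file="$SLURM_CNF" -e "DROP DATABASE slurm_accounting;"
  run mysql --defaults-extra-file="$SLURM_CNF" -e "CREATE DATABASE slurm_accounting;"
  run sudo systemctl restart slurmdbd.service slurmctld.service
)
```

Run `DRY_RUN=1 clear_slurm_accounting --yes` first to review the commands, then `clear_slurm_accounting --yes` to execute them.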

Restoring Slurm From a Backup

Note: Ensure the Slurm accounting database has been cleared first (see Clearing Slurm Data).

# 1. Locate the desired backups in the 'backups' bucket in OCI. The object names have the format:
#    /slurm/slurm_backup_{year}_{month}_{day}.sql.gz
#    /slurm/slurm_{year}_{month}_{day}.conf.gz
# 2. Download the backup files
oci os object get --bucket-name backups --name /slurm/slurm_backup_{year}_{month}_{day}.sql.gz --file /tmp/slurm_backup.sql.gz
oci os object get --bucket-name backups --name /slurm/slurm_{year}_{month}_{day}.conf.gz --file /tmp/slurm.conf.gz
# 3. Decompress the files
gunzip /tmp/slurm_backup.sql.gz
gunzip /tmp/slurm.conf.gz
# 4. Restore Slurm data
sudo systemctl stop slurmdbd.service
mysql --defaults-extra-file=/home/ubuntu/.slurm.cnf slurm_accounting < /tmp/slurm_backup.sql
sudo mv /etc/slurm/slurm.conf /etc/slurm/slurm.conf.bak
sudo mv /tmp/slurm.conf /etc/slurm/slurm.conf
# 5. Restart services and reconfigure Slurm
sudo systemctl restart slurmdbd.service slurmctld.service
sudo scontrol reconfigure
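The restore steps above can be wrapped in a single function that takes the backup date, so the placeholders are filled in once. The sketch below assumes the date is written as a zero-padded YYYY_MM_DD string, that the database has already been cleared, and that the object names match the format above. The names restore_slurm_backup, run, DRY_RUN, SLURM_CNF and BACKUP_BUCKET are illustrative. One deviation from the listed steps: gunzip -f is used so that leftover files in /tmp from an earlier attempt do not stop the restore.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the restore steps (not part of oci-hpc).
# Assumes the accounting database was cleared beforehand.

SLURM_CNF="${SLURM_CNF:-/home/ubuntu/.slurm.cnf}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: restore_slurm_backup YYYY_MM_DD
restore_slurm_backup() (
  set -euo pipefail
  day="${1:?usage: restore_slurm_backup YYYY_MM_DD}"
  if [[ ! "$day" =~ ^[0-9]{4}_[0-9]{2}_[0-9]{2}$ ]]; then
    echo "expected a date like 2024_05_01, got: $day" >&2
    exit 1
  fi
  bucket="${BACKUP_BUCKET:-backups}"

  run oci os object get --bucket-name "$bucket" \
    --name "/slurm/slurm_backup_${day}.sql.gz" --file /tmp/slurm_backup.sql.gz
  run oci os object get --bucket-name "$bucket" \
    --name "/slurm/slurm_${day}.conf.gz" --file /tmp/slurm.conf.gz
  run gunzip -f /tmp/slurm_backup.sql.gz /tmp/slurm.conf.gz

  run sudo systemctl stop slurmdbd.service
  # The input redirection cannot pass through run(), so it goes through sh -c.
  run sh -c 'mysql --defaults-extra-file="$1" slurm_accounting < /tmp/slurm_backup.sql' sh "$SLURM_CNF"
  run sudo mv /etc/slurm/slurm.conf /etc/slurm/slurm.conf.bak
  run sudo mv /tmp/slurm.conf /etc/slurm/slurm.conf

  run sudo systemctl restart slurmdbd.service slurmctld.service
  run sudo scontrol reconfigure
)
```

For example, `DRY_RUN=1 restore_slurm_backup 2024_05_01` prints the exact commands for that date without touching the cluster.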