How to onboard researchers and labs¶
Two types of researchers use our cluster: CAIS researchers and researchers who belong to other labs. A researcher's lab information should be on the Onboarding Kanban; look for a tile with the researcher's name. If the information isn't present, or you are unable to determine which lab a researcher belongs to, contact Oliver Zhang. Researchers who are ready to be onboarded will be under the Need to Add to Cluster section. The researcher's public SSH key should be attached to the tile, in the tile's SSH Key section.
Onboarding Script¶
We now have an onboarding script that automates most of the onboarding process. The script lives in the /opt/oci-hpc/bin directory and is named onboard.sh. To use it, first edit the script and provide values for the following variables:
...
# Variables (Replace these with actual values or pass them as arguments)
lab_name='' # john_smith
researcher_name='' # paul_atredies
full_name='' # Paul Atredies
user_password=''
parent_account='' # cais, Labs, grads, etc...
ssh_key=''
...
Then run onboard.sh.
After you're done using the script, don't forget to clear the values you set.
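Since the script depends on all six variables being filled in, a quick pre-flight check can catch a forgotten value before anything is created on the cluster. The sketch below is hypothetical and not part of onboard.sh; the function name `check_onboard_vars` is made up for illustration, and it only assumes the variable names listed above.

```shell
# Hypothetical pre-flight check (not part of onboard.sh).
# Fails and names the first onboarding variable that is still empty.
check_onboard_vars() {
    for var_name in lab_name researcher_name full_name user_password parent_account ssh_key; do
        # Look up the value of the variable whose name is stored in $var_name
        eval "var_value=\${$var_name:-}"
        if [ -z "$var_value" ]; then
            unset var_value
            echo "missing value for: $var_name" >&2
            return 1
        fi
    done
    # Don't leave the last looked-up value (the SSH key) lying around
    unset var_value
    return 0
}
```

Calling `check_onboard_vars` right after the variable block, and exiting if it fails, would stop the script before it runs with a blank lab name or SSH key.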
Manual Steps¶
Lab onboarding¶
If the researcher belongs to a lab that already exists on the cluster, move on to the next section.
For name_of_lab, use the name of the lab's PI.
# on the bastion node
# add the lab group:
cluster group create {name_of_lab}
# add slurm lab account:
sudo sacctmgr add account {name_of_lab} Parent=labs Description="Professor X Lab" Organization=Prof_X
Researcher onboarding¶
Create cluster user¶
# retrieve the lab group id
cluster group list | grep -wA1 {name_of_lab} | grep gidNumber
> gidNumber: {group_id}
cluster user add {researcher_name} --gid {group_id}
password = NQVjnPD6bYY8SNNade7aeFxTSKJZWVqR
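The group ID lookup above can be wrapped in a small helper so the number doesn't have to be copied by hand. This is a sketch only: `lab_gid` is a made-up name, and it assumes, as the `grep -wA1` pipeline implies, that each group's `gidNumber:` line directly follows the line containing the group name.

```shell
# Hypothetical helper: reads `cluster group list` output on stdin and prints
# the gidNumber found right after the line that names the given group.
# Assumes the gidNumber line immediately follows the group-name line.
lab_gid() {
    grep -wA1 -- "$1" | awk '/gidNumber:/ { print $2; exit }'
}

# Intended use on the bastion node (placeholders as in the steps above):
#   group_id=$(cluster group list | lab_gid {name_of_lab})
#   cluster user add {researcher_name} --gid "$group_id"
```

If the helper prints nothing, the lab group probably doesn't exist yet and the lab onboarding steps need to be done first.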
Create Slurm user:¶
Add the user to the proper Slurm account:
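A minimal sketch of this step, modeled on the `sacctmgr add user` call used in the change-group section of this page. It assumes the researcher's Slurm account is the lab account created during lab onboarding; the helper names `run` and `add_slurm_user` are hypothetical. By default it only prints the command so it can be reviewed first.

```shell
# Echo the command when DRY_RUN=1 (the default here); otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper: associate a researcher with a Slurm account.
# $1 = researcher_name, $2 = Slurm account (assumed to be the lab account)
add_slurm_user() {
    run sudo sacctmgr add user name="$1" account="$2"
}

# Review the command first, then rerun with DRY_RUN=0 to apply it:
#   add_slurm_user {researcher_name} {name_of_lab}
#   DRY_RUN=0 add_slurm_user {researcher_name} {name_of_lab}
```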
Add SSH key¶
# become the user
sudo su {researcher_name}
# add SSH key to authorized_keys file
vim ~/.ssh/authorized_keys
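If you'd rather not paste the key in an editor, the same step can be sketched as a small function. This is an illustration, not an existing tool: `add_ssh_key` is a made-up name, and the 700/600 permissions are the ones sshd normally expects under its default StrictModes setting.

```shell
# Hypothetical helper: append a public key to authorized_keys with the
# permissions sshd expects, skipping keys that are already present.
# $1 = public key line, $2 = .ssh directory (defaults to $HOME/.ssh)
add_ssh_key() {
    key="$1"
    ssh_dir="${2:-$HOME/.ssh}"
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
    touch "$ssh_dir/authorized_keys" && chmod 600 "$ssh_dir/authorized_keys" || return 1
    # -x: whole-line match, -F: fixed string, -q: quiet
    grep -qxF -- "$key" "$ssh_dir/authorized_keys" ||
        printf '%s\n' "$key" >> "$ssh_dir/authorized_keys"
}

# Intended use after `sudo su {researcher_name}`:
#   add_ssh_key 'ssh-ed25519 AAAA... researcher@laptop'
```

Running it as the researcher keeps the file ownership correct, which is the same reason the manual steps switch users first.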
Test¶
Make sure that the researcher can get jobs allocated:
# become the user
sudo su {researcher_name}
# run this slurm job
srun --gpus=2 --pty /bin/bash
# keep an eye out for errors
Filesystem Quotas¶
By default, we give each researcher 1TB of space in their home directory. We manage our filesystem quotas using the built-in quota feature of Weka. You can set their quota with the following command:
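A sketch of what that command might look like, under two assumptions that should be checked on the cluster: that home directories live under /home/{researcher_name}, and that the installed Weka CLI provides `weka fs quota set <path> --hard <size>` (confirm with `weka fs quota set --help`). The helper name `quota_cmd` is hypothetical, and it only prints the command for review rather than running it.

```shell
# Sketch only: builds the Weka quota command for a home directory and prints it.
# Assumptions: homes are under /home/<researcher_name>; the Weka CLI accepts
# `weka fs quota set <path> --hard <size>`.
# $1 = researcher_name, $2 = quota size (defaults to the 1TB described above)
quota_cmd() {
    printf 'weka fs quota set /home/%s --hard %s\n' "$1" "${2:-1TB}"
}

# Example:
#   quota_cmd {researcher_name}        # default 1TB
#   quota_cmd {researcher_name} 2TB    # after an approved increase
```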
If a researcher's request for more space has been approved, you can increase their quota by running the same command with the updated quota amount.
Additional commands¶
How to change a user's group¶
Change their group via PAM.
First find their new group number. Then change the user to the new group.
cluster group list
cluster user change-group --help
# Changes opc to cais group
cluster user change-group opc 10011
Then update it in Slurm. You do this by removing the user from the old account and re-adding them under the new one.
sudo sacctmgr remove user where user=opc and account=test
sudo sacctmgr add user name=opc DefaultAccount=cais
Check the results with
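One way to check, sketched under the assumption that `sacctmgr -nP show assoc user=<user> format=User,Account` prints one `user|account` line per association (`-n` drops the header, `-P` gives pipe-delimited output). The helper name `has_assoc` is made up for illustration.

```shell
# Hypothetical helper: reads sacctmgr association output on stdin
# (one "user|account" line per association) and succeeds only if the
# given user/account pair is present.
has_assoc() {
    grep -qxF -- "$1|$2"
}

# Intended use on the bastion node:
#   sudo sacctmgr -nP show assoc user=opc format=User,Account | has_assoc opc cais \
#       && echo "opc is in cais"
```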
Cluster command¶
For more information about the cluster command: