NVIDIA
How to upgrade NVIDIA CUDA toolkit and drivers
Uninstallation
Before installing a newer CUDA toolkit or driver version, uninstall the old version to avoid conflicts and reclaim space on the boot volume.
# To remove CUDA Toolkit:
sudo yum remove "cuda*" "*cublas*" "*cufft*" "*cufile*" "*curand*" \
"*cusolver*" "*cusparse*" "*gds-tools*" "*npp*" "*nvjpeg*" "nsight*" "*nvvm*"
# To remove NVIDIA Drivers:
sudo yum remove "*nvidia*"
# Remove any remaining CUDA directories
sudo rm -rf /usr/local/cuda*
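After the removal, it can be worth confirming that no CUDA or NVIDIA packages remain before installing a new version. The following is a minimal sketch, not part of the original procedure: `filter_gpu_packages` is a hypothetical helper name, and its match pattern is an assumption that may need adjusting for the packages on your nodes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only package names that look CUDA/NVIDIA related.
filter_gpu_packages() {
    # -i: case-insensitive; "|| true" keeps the exit status 0 when nothing matches
    grep -i -E 'cuda|nvidia|cublas|nsight' || true
}

# List what is still installed (only runs where rpm is available).
if command -v rpm >/dev/null 2>&1; then
    rpm -qa | filter_gpu_packages
fi
```

Empty output suggests the uninstall was complete. Any names that are printed can be passed to `sudo yum remove`.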
Installation: Latest
The link below leads to the download page for the NVIDIA CUDA toolkit and drivers. Select the options that match our cluster.
Once you've filled out the options, you'll be provided with instructions to download and install the CUDA toolkit and drivers.
CUDA: https://developer.nvidia.com/cuda-downloads
Recommended options:
- Operating System: Linux
- Architecture: x86_64
- Distribution: RHEL
- Version: 7
- Installer Type: rpm (network)
With the options above, you will be provided with commands to install the latest version of the CUDA Toolkit and Drivers.
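The distribution and architecture selections appear to correspond to a path in NVIDIA's package repository. As a rough sketch, the helper below builds the repo file URL from those two values. `cuda_repo_url` is a hypothetical name, and the URL layout is inferred from the RHEL 7 repo URL used in this guide, so treat other combinations as an assumption and prefer the commands shown on the download page.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the CUDA repo file URL for a distro tag and architecture.
# The URL layout is an assumption inferred from the rhel7/x86_64 URL in this guide.
cuda_repo_url() {
    local distro="$1" arch="$2"
    echo "https://developer.download.nvidia.com/compute/cuda/repos/${distro}/${arch}/cuda-${distro}.repo"
}

# RHEL 7 on x86_64, matching the recommended options
cuda_repo_url rhel7 x86_64
```

The printed URL is what `yum-config-manager --add-repo` would take as its argument.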
Installation: Old
At the moment, we are running older versions on our cluster. The commands to install the older version are provided below.
# Install CUDA Toolkit 12.3
sudo yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo
sudo yum clean all
sudo yum -y install cuda-toolkit-12-3
# Install Driver 545.23.06
version=545.23.06
kernel_stream="open-dkms"
stream="latest-dkms"
list=("kmod-nvidia-$kernel_stream-$version")
list+=("nvidia-driver-$stream-cuda-$version")
list+=("nvidia-driver-$stream-cuda-libs-$version")
list+=("nvidia-driver-$stream-devel-$version")
list+=("nvidia-driver-$stream-$version")
list+=("nvidia-driver-$stream-NVML-$version")
list+=("nvidia-driver-$stream-NvFBCOpenGL-$version")
list+=("nvidia-driver-$stream-libs-$version")
list+=("nvidia-libXNVCtrl-$version")
list+=("nvidia-libXNVCtrl-devel-$version")
list+=("nvidia-modprobe-$stream-$version")
list+=("nvidia-persistenced-$stream-$version")
list+=("nvidia-settings-$version")
list+=("nvidia-xconfig-$stream-$version")
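# Swap the proprietary kernel module package for the open one
sudo yum swap kmod-nvidia-latest-dkms "kmod-nvidia-open-dkms-$version"
# Install the pinned packages; obsoletes=0 disables yum's obsoletes processing
sudo yum --setopt=obsoletes=0 install "${list[@]}"
# Quote the glob so yum, not the shell, expands it
sudo yum -y install "cuda*545-$version"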
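Once the packages are installed, a quick check that the running driver matches the pinned version can catch a partial install. The sketch below assumes `nvidia-smi` is on the `PATH`. `check_driver_version` is a hypothetical helper, and a reboot may be needed before the new kernel module is loaded and reported.

```shell
#!/usr/bin/env bash
expected="545.23.06"

# Hypothetical helper: compare the expected driver version against a reported one.
check_driver_version() {
    local want="$1" got="$2"
    if [ "$want" = "$got" ]; then
        echo "OK: driver $got"
    else
        echo "MISMATCH: expected $want, found ${got:-none}"
        return 1
    fi
}

# Query the running driver (only where nvidia-smi is available).
if command -v nvidia-smi >/dev/null 2>&1; then
    actual="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
    check_driver_version "$expected" "$actual"
fi
```

On a multi-GPU node, `nvidia-smi` prints one line per GPU. The sketch only inspects the first, since all GPUs on a node share one driver.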