CUDA Toolkit Installation and Configuration

The NVIDIA CUDA Toolkit provides the compiler (nvcc), libraries, and runtime needed for GPU-accelerated computing on Linux, and is a prerequisite for deep learning frameworks like TensorFlow and PyTorch. This guide covers selecting the correct CUDA version for your GPU and driver, installing the toolkit, configuring environment variables, managing multiple CUDA versions, and verifying the setup.

Prerequisites

  • NVIDIA GPU drivers already installed (nvidia-smi works)
  • Ubuntu 20.04/22.04/24.04 or CentOS/Rocky Linux 8/9
  • At least 4GB free disk space for the toolkit
  • Sudo access
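The list above can be checked with a short preflight script. This is a sketch that assumes the toolkit will be installed under /usr/local (adjust the path otherwise):

```shell
#!/bin/bash
# Preflight check for the prerequisites above (sketch)

# Driver installed and responding
if command -v nvidia-smi > /dev/null && nvidia-smi > /dev/null 2>&1; then
    echo "OK: NVIDIA driver responding"
else
    echo "WARN: nvidia-smi missing or not working" >&2
fi

# At least 4 GB free where the toolkit will land
avail_kb=$(df -Pk /usr/local | awk 'NR==2 {print $4}')
if [ "${avail_kb}" -ge $((4 * 1024 * 1024)) ]; then
    echo "OK: ${avail_kb} KB free under /usr/local"
else
    echo "WARN: less than 4 GB free under /usr/local" >&2
fi
```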

Choosing the Right CUDA Version

CUDA requires a minimum driver version. Check compatibility:

CUDA Version   Min Driver (Linux)   Common Use Case
CUDA 12.4      550.54.14            Latest TF/PyTorch, Hopper GPUs
CUDA 12.1      530.30.02            Stable for most workloads
CUDA 11.8      520.61.05            Legacy TF 2.x support

Check your driver version first:

nvidia-smi | grep "Driver Version"

For deep learning, install the CUDA version listed in the TensorFlow or PyTorch compatibility matrix rather than simply the newest release.
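Since driver version strings sort numerically component by component, sort -V can do the comparison. The helper below is a sketch, with the minimum versions taken from the table above:

```shell
# Returns success if the installed driver ($1) meets the minimum ($2)
driver_at_least() {
    # sort -V orders version strings numerically; the minimum must sort first
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -1)" = "$2" ]
}

# Example: is driver 535.183.01 new enough for CUDA 12.4 (needs 550.54.14)?
if driver_at_least "535.183.01" "550.54.14"; then
    echo "Driver OK for CUDA 12.4"
else
    echo "Driver too old for CUDA 12.4"
fi
```

Feed it the live value from `nvidia-smi --query-gpu=driver_version --format=csv,noheader` to check before installing.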

Installing CUDA on Ubuntu/Debian

Method 1: Network Repository (Recommended)

# Download and install the CUDA keyring
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update

# Install a specific CUDA toolkit version (without upgrading the driver)
sudo apt-get install -y cuda-toolkit-12-4

# To install CUDA with the driver included:
sudo apt-get install -y cuda-12-4

Method 2: Local Installer (Offline)

# Download the local installer for Ubuntu 22.04
wget https://developer.download.nvidia.com/compute/cuda/12.4.0/local_installers/cuda_12.4.0_550.54.14_linux.run

# Run the installer
sudo sh cuda_12.4.0_550.54.14_linux.run \
  --silent \
  --toolkit   # Install the toolkit only; leaves the existing driver untouched

Verify Install Directory

ls /usr/local/cuda-12.4/
# bin  extras  include  lib64  nsight-compute-2023.3.1  ...

Installing CUDA on CentOS/Rocky Linux

# Add NVIDIA repository
sudo dnf config-manager --add-repo \
  https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo

sudo dnf clean all
sudo dnf makecache

# Install CUDA toolkit (specific version)
sudo dnf install -y cuda-toolkit-12-4

# Or full CUDA including driver
sudo dnf install -y cuda-12-4

Configuring Environment Variables

CUDA requires PATH and LD_LIBRARY_PATH to be configured. Add these to your shell profile:

# For system-wide configuration (all users)
sudo tee /etc/profile.d/cuda.sh << 'EOF'
export CUDA_HOME=/usr/local/cuda
export PATH="${CUDA_HOME}/bin:${PATH}"
export LD_LIBRARY_PATH="${CUDA_HOME}/lib64:${LD_LIBRARY_PATH}"
EOF

source /etc/profile.d/cuda.sh

For user-specific configuration:

cat >> ~/.bashrc << 'EOF'

# CUDA Toolkit
export CUDA_HOME=/usr/local/cuda
export PATH="${CUDA_HOME}/bin:${PATH}"
export LD_LIBRARY_PATH="${CUDA_HOME}/lib64:${LD_LIBRARY_PATH}"
EOF

source ~/.bashrc
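Both snippets prepend to PATH, so when several CUDA installs each provide an nvcc, the directory prepended most recently wins. A sandbox sketch with stub nvcc scripts (hypothetical temp-dir paths) shows the resolution order:

```shell
# Sketch: two fake CUDA installs under a temp dir, each with a stub nvcc
demo=$(mktemp -d)
mkdir -p "${demo}/cuda-11.8/bin" "${demo}/cuda-12.4/bin"
printf '#!/bin/sh\necho release 11.8\n' > "${demo}/cuda-11.8/bin/nvcc"
printf '#!/bin/sh\necho release 12.4\n' > "${demo}/cuda-12.4/bin/nvcc"
chmod +x "${demo}"/cuda-*/bin/nvcc

# The directory prepended last sits first in PATH, so its nvcc wins
PATH="${demo}/cuda-11.8/bin:${PATH}"
PATH="${demo}/cuda-12.4/bin:${PATH}"
out=$(nvcc)
echo "${out}"    # release 12.4

rm -rf "${demo}"
```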

Verify the symlink /usr/local/cuda points to your installed version:

ls -la /usr/local/cuda
# lrwxrwxrwx /usr/local/cuda -> /usr/local/cuda-12.4

# If missing, create it
sudo ln -sf /usr/local/cuda-12.4 /usr/local/cuda

Installing cuDNN

cuDNN (CUDA Deep Neural Network library) significantly accelerates deep learning training and inference.

Via NVIDIA Repository

# After adding the CUDA repo (same keyring)
sudo apt-get install -y libcudnn9-cuda-12   # Ubuntu
sudo dnf install -y cudnn9-cuda-12-4        # CentOS/Rocky

# Verify cuDNN installation
dpkg -l | grep cudnn
# Or:
cat /usr/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
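The CUDNN_MAJOR/MINOR/PATCHLEVEL macros can be assembled into a single version string with awk. The sketch below parses an inline sample header so the logic is self-contained; point the same awk at the real cudnn_version.h on your system (the sample values here are illustrative):

```shell
# Sample of what cudnn_version.h contains (values are illustrative)
cat > /tmp/cudnn_version_sample.h << 'EOF'
#define CUDNN_MAJOR 9
#define CUDNN_MINOR 1
#define CUDNN_PATCHLEVEL 0
EOF

# Pull the three components into MAJOR.MINOR.PATCH
cudnn_ver=$(awk '/#define CUDNN_MAJOR/ {M=$3}
                 /#define CUDNN_MINOR/ {m=$3}
                 /#define CUDNN_PATCHLEVEL/ {p=$3}
                 END {print M "." m "." p}' /tmp/cudnn_version_sample.h)
echo "cuDNN ${cudnn_ver}"    # cuDNN 9.1.0
```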

Manual Installation

# Download cuDNN from https://developer.nvidia.com/cudnn (requires NVIDIA account)
# Extract to CUDA directory
tar -xvf cudnn-linux-x86_64-9.x.x_cuda12-archive.tar.xz   # .tar.xz: let tar auto-detect the compression

sudo cp cudnn-*-archive/include/cudnn*.h /usr/local/cuda/include/
sudo cp cudnn-*-archive/lib/libcudnn* /usr/local/cuda/lib64/
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*

# Verify
cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2

Managing Multiple CUDA Versions

Install multiple CUDA versions side by side:

# Install multiple versions
sudo apt-get install -y cuda-toolkit-11-8 cuda-toolkit-12-1 cuda-toolkit-12-4

# List installed versions
ls /usr/local/ | grep cuda

# Switch active version by updating the symlink
sudo ln -sfn /usr/local/cuda-12.1 /usr/local/cuda
source ~/.bashrc
nvcc --version  # Confirm version
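The symlink flip is worth seeing in isolation. This sandbox sketch (temp-dir paths standing in for /usr/local) mirrors what ln -sfn does, and shows why -n matters:

```shell
demo=$(mktemp -d)
mkdir "${demo}/cuda-12.1" "${demo}/cuda-12.4"

# Initial state: cuda -> cuda-12.4
ln -sfn "${demo}/cuda-12.4" "${demo}/cuda"

# Without -n, ln would dereference the existing symlink and create the
# new link *inside* cuda-12.4/ instead of replacing the symlink itself
ln -sfn "${demo}/cuda-12.1" "${demo}/cuda"
target=$(readlink "${demo}/cuda")
echo "${target}"    # ends in /cuda-12.1

rm -rf "${demo}"
```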

For project-specific CUDA versions, set the environment per-session:

# Switch to CUDA 11.8 for a specific project
export CUDA_HOME=/usr/local/cuda-11.8
export PATH="${CUDA_HOME}/bin:${PATH}"
export LD_LIBRARY_PATH="${CUDA_HOME}/lib64:${LD_LIBRARY_PATH}"
nvcc --version

Using a wrapper script:

sudo tee /usr/local/bin/use-cuda > /dev/null << 'EOF'
#!/bin/bash
VERSION=${1:-12.4}
export CUDA_HOME=/usr/local/cuda-${VERSION}
export PATH="${CUDA_HOME}/bin:${PATH}"
export LD_LIBRARY_PATH="${CUDA_HOME}/lib64:${LD_LIBRARY_PATH}"
echo "Switched to CUDA ${VERSION}"
exec "${SHELL}"   # Starts a new shell with the environment set; type exit to return
EOF
sudo chmod +x /usr/local/bin/use-cuda

# Usage: use-cuda 11.8

Verifying the Installation

# Check CUDA compiler version
nvcc --version
# nvcc: NVIDIA (R) Cuda compiler driver
# Cuda compilation tools, release 12.4, V12.4.99

# Run the deviceQuery sample (shipped in extras/demo_suite on older
# toolkits; newer releases may require building it from the NVIDIA
# cuda-samples repository on GitHub instead)
cd /usr/local/cuda/extras/demo_suite/
./deviceQuery
# Should show GPU details and "Result = PASS"

# Run a bandwidth test
./bandwidthTest
# Should show memory bandwidth measurements

# Compile and run a simple CUDA program
cat > hello_cuda.cu << 'EOF'
#include <stdio.h>
__global__ void hello() {
    printf("Hello from GPU thread %d!\n", threadIdx.x);
}
int main() {
    hello<<<1, 8>>>();
    cudaDeviceSynchronize();
    return 0;
}
EOF

nvcc hello_cuda.cu -o hello_cuda
./hello_cuda
# Hello from GPU thread 0!
# Hello from GPU thread 1!
# ...

Troubleshooting

"nvcc not found"

# Check if CUDA bin is in PATH
echo $PATH | grep cuda

# Manually add it
export PATH=/usr/local/cuda/bin:$PATH

# Verify the install exists
ls /usr/local/cuda/bin/nvcc

"libcuda.so.1: cannot open shared object file"

# libcuda.so.1 is provided by the NVIDIA driver, not the toolkit, so it
# lives with the system libraries rather than under /usr/local/cuda
ldconfig -p | grep libcuda
# libcuda.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcuda.so.1

# If it is missing, reinstall the driver package, then refresh the cache
sudo ldconfig

# For missing toolkit libraries (libcudart, libcublas, ...), check
# LD_LIBRARY_PATH instead
echo $LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

CUDA version mismatch with PyTorch/TensorFlow

# Check what CUDA version PyTorch sees
python3 -c "import torch; print(torch.version.cuda)"

# Install the PyTorch version matching your CUDA
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124
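PyTorch's wheel indexes are keyed by the CUDA version with the dot dropped and a cu prefix (cu118, cu121, cu124). A small sketch of the mapping (the tag naming follows PyTorch's current index URLs; verify against pytorch.org before relying on it):

```shell
# Map a CUDA version like "12.4" to a PyTorch wheel tag like "cu124"
cuda_to_wheel_tag() {
    echo "cu$(echo "$1" | tr -d .)"
}

tag=$(cuda_to_wheel_tag "12.4")
echo "${tag}"    # cu124
echo "pip install torch --index-url https://download.pytorch.org/whl/${tag}"
```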

"no kernel image is available for execution"

# This means your GPU compute capability is too low for the compiled code
# Check your GPU compute capability
nvidia-smi --query-gpu=compute_cap --format=csv,noheader

# Recompile targeting your GPU architecture
nvcc -arch=sm_75 mycode.cu -o mycode  # sm_75 for Turing (RTX 20xx)
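nvidia-smi reports the compute capability with a dot (7.5) while nvcc expects sm_75; the conversion is a one-liner (sketch):

```shell
# Turn a compute capability like "7.5" into an nvcc arch flag like "sm_75"
cap_to_arch() {
    echo "sm_$(echo "$1" | tr -d .)"
}

arch=$(cap_to_arch "7.5")
echo "${arch}"    # sm_75
echo "nvcc -arch=${arch} mycode.cu -o mycode"
```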

Conclusion

A correctly configured CUDA installation, with a matching cuDNN and properly set environment variables, is the foundation for GPU-accelerated computing on Linux. Keeping the driver installed separately from the toolkit, and switching between side-by-side CUDA versions via the /usr/local/cuda symlink, lets you track each framework's requirements without reinstalling anything.