NVIDIA GPU Drivers Installation on Linux

Installing NVIDIA GPU drivers correctly on Linux is a prerequisite for GPU-accelerated compute workloads including AI inference, CUDA development, and machine learning training. This guide covers selecting the right driver version, installing via package manager with DKMS support for kernel upgrades, verifying the installation, and resolving common driver conflicts.

Prerequisites

  • A server with an NVIDIA GPU (data center: A100, H100, L40; consumer: GeForce RTX 20/30/40 series)
  • Ubuntu 20.04/22.04/24.04 or CentOS/Rocky Linux 8/9
  • Kernel headers matching your running kernel
  • Sudo or root access
  • Internet access for package downloads
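The kernel-header requirement is the one most often missed. It can be sanity-checked with a short script; this is a sketch using the Debian/Ubuntu header path convention (RHEL-family systems keep headers under /usr/src/kernels instead):

```shell
#!/bin/sh
# Sketch: verify that kernel headers for the running kernel are installed.
# Path follows the Debian/Ubuntu convention (linux-headers-<release>).
headers_present() {
  # $1: kernel release string, as printed by `uname -r`
  [ -d "/usr/src/linux-headers-$1" ]
}

if headers_present "$(uname -r)"; then
  echo "kernel headers: OK"
else
  echo "kernel headers: MISSING (try: sudo apt-get install linux-headers-$(uname -r))"
fi
```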

Checking GPU and System Information

# Identify the installed GPU
lspci | grep -i nvidia
# Output: 01:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3090]

# Check current kernel version
uname -r

# Check if nouveau (open-source) driver is loaded
lsmod | grep nouveau

# Check if any NVIDIA drivers are already installed
dpkg -l | grep -i nvidia   # Ubuntu/Debian
rpm -qa | grep -i nvidia   # CentOS/Rocky

Choosing the Right Driver Version

NVIDIA provides three types of drivers:

Type                                     Best For
-------------------------------------    ----------------------------------
Production Branch (e.g., 535, 545)       Stable server workloads
New Feature Branch (e.g., 550)           Latest features, newer GPUs
Data Center Branch (e.g., 550-server)    A100, H100, data center GPUs

Check supported driver versions for your GPU at NVIDIA Driver Downloads.

For data center/AI workloads, use the latest supported production branch. For newer consumer GPUs (RTX 40 series), use the latest new feature branch.
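As a rough rule of thumb, that selection logic can be sketched as a small helper. This is a hypothetical function: the branch numbers mirror the table above and will drift over time, so check NVIDIA's download page before relying on them:

```shell
#!/bin/sh
# Hypothetical helper: map a GPU name (as reported by lspci or nvidia-smi)
# to a suggested driver branch. Branch numbers follow the table above.
suggest_branch() {
  case "$1" in
    *A100*|*H100*|*L40*) echo "550-server (Data Center Branch)" ;;
    *"RTX 40"*)          echo "550 (New Feature Branch)" ;;
    *)                   echo "535 (Production Branch)" ;;
  esac
}

suggest_branch "NVIDIA H100 PCIe"   # → 550-server (Data Center Branch)
suggest_branch "GeForce RTX 3090"   # → 535 (Production Branch)
```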

Installing on Ubuntu/Debian

Method 1: Ubuntu Repository (Simplest)

# Update package lists
sudo apt-get update

# Check available NVIDIA driver versions
ubuntu-drivers devices
# Or: apt-cache search nvidia-driver

# Install the recommended driver automatically
sudo ubuntu-drivers autoinstall

# Or install a specific version
sudo apt-get install -y nvidia-driver-550

# Optionally install Ubuntu's packaged CUDA toolkit (note: this package
# often lags NVIDIA's own repository; skip it if you plan to install
# CUDA from the NVIDIA repository in Method 2)
sudo apt-get install -y nvidia-cuda-toolkit

Method 2: NVIDIA Official Repository (More Control)

# Install prerequisites
sudo apt-get install -y linux-headers-$(uname -r) \
  software-properties-common

# Add NVIDIA repository
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update

# Install the driver only (without full CUDA toolkit)
sudo apt-get install -y cuda-drivers

# Or install a specific driver version
sudo apt-get install -y nvidia-driver-550-server  # For data center GPUs

Blacklist the Nouveau Driver

# The installer usually does this automatically, but verify:
cat /etc/modprobe.d/blacklist-nouveau.conf
# Should contain:
# blacklist nouveau
# options nouveau modeset=0

# If missing, create it:
sudo tee /etc/modprobe.d/blacklist-nouveau.conf << 'EOF'
blacklist nouveau
options nouveau modeset=0
EOF

sudo update-initramfs -u

Reboot

sudo reboot

Installing on CentOS/Rocky Linux

# Install EPEL and development tools
sudo dnf install -y epel-release
sudo dnf groupinstall -y "Development Tools"
sudo dnf install -y kernel-devel kernel-headers dkms

# Add NVIDIA repository
sudo dnf config-manager --add-repo \
  https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo

sudo dnf clean all
sudo dnf makecache

# Install CUDA drivers (includes NVIDIA driver)
sudo dnf install -y cuda-drivers

# Note: with Secure Boot enabled on UEFI systems, unsigned DKMS-built
# modules will not load; either disable Secure Boot in firmware
# or enroll a Machine Owner Key (MOK) and sign the module

sudo reboot

Verifying the Installation

After rebooting:

# Check if the NVIDIA kernel module is loaded
lsmod | grep nvidia

# Query GPU status with nvidia-smi
nvidia-smi

# Expected output:
# +-----------------------------------------------------------------------------+
# | NVIDIA-SMI 550.xx.xx    Driver Version: 550.xx.xx    CUDA Version: 12.4    |
# |-------------------------------+----------------------+----------------------+
# | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC|
# | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
# |===============================+======================+======================|
# |   0  NVIDIA RTX 3090 Off  | 00000000:01:00.0 Off |                  N/A |
# |  0%   35C    P8    20W / 350W |      0MiB / 24576MiB |      0%      Default |
# +-----------------------------------------------------------------------------+

# Monitor GPU utilization in real time
nvidia-smi dmon -s u  # GPU utilization
nvidia-smi -l 1       # Refresh every second

# List all GPU processes
nvidia-smi pmon

# Enable persistence mode (keeps the driver initialized between jobs,
# reducing first-use latency; for production, NVIDIA recommends the
# nvidia-persistenced daemon over this legacy setting)
sudo nvidia-smi -pm 1
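For monitoring scripts, the dashboard output above is awkward to parse; nvidia-smi's --query-gpu mode emits CSV instead. A sketch, using an illustrative sample line in place of live output:

```shell
#!/bin/sh
# Sketch: parse nvidia-smi CSV query output in a monitoring script.
# On a real host the sample line would come from:
#   nvidia-smi --query-gpu=name,driver_version,utilization.gpu --format=csv,noheader
sample="NVIDIA GeForce RTX 3090, 550.54.14, 37 %"

gpu_name=$(echo "$sample" | awk -F', ' '{print $1}')
util=$(echo "$sample" | awk -F', ' '{print $3}' | tr -d ' %')
echo "$gpu_name utilization: ${util}%"   # → NVIDIA GeForce RTX 3090 utilization: 37%
```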

DKMS for Kernel Upgrades

DKMS (Dynamic Kernel Module Support) automatically recompiles the NVIDIA kernel module when the kernel is updated — without this, driver reinstallation would be required after every kernel upgrade.

# Install DKMS
sudo apt-get install -y dkms  # Ubuntu/Debian
sudo dnf install -y dkms      # CentOS/Rocky

# Verify DKMS has built the NVIDIA module
dkms status
# Output: nvidia/550.xx.xx, X.X.X-XX-generic, x86_64: installed

# If the module isn't registered with DKMS, add it manually
# (find the installed source version with: ls /usr/src | grep nvidia)
sudo dkms add -m nvidia -v DRIVER_VERSION
sudo dkms build -m nvidia -v DRIVER_VERSION
sudo dkms install -m nvidia -v DRIVER_VERSION

After a kernel upgrade, DKMS rebuilds the module automatically via the kernel package's post-install hooks (or via the dkms service at boot on some distributions). Verify after rebooting into the new kernel:

uname -r   # Confirm new kernel
nvidia-smi  # Confirm driver still works
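The two checks above can be folded into one scriptable test against the `dkms status` output. A sketch, with a captured sample line standing in for live output so the matching logic is easy to follow:

```shell
#!/bin/sh
# Sketch: check that `dkms status` reports the nvidia module as installed
# for a given kernel release. Pure string matching on the status line.
dkms_ok() {
  # $1: a dkms status line, $2: kernel release to look for
  echo "$1" | grep -q "nvidia.*$2.*: installed"
}

sample="nvidia/550.54.14, 6.5.0-25-generic, x86_64: installed"
if dkms_ok "$sample" "6.5.0-25-generic"; then
  echo "nvidia module installed for this kernel"
fi
# On a live system: dkms_ok "$(dkms status nvidia)" "$(uname -r)"
```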

Troubleshooting

nvidia-smi not found after reboot

# Check if the module loaded
lsmod | grep nvidia

# Check kernel messages for errors
dmesg | grep -i nvidia
journalctl -k | grep -i nvidia

# Try loading the module manually
sudo modprobe nvidia

# If it fails with "not found", reinstall
sudo apt-get install --reinstall nvidia-driver-550
sudo reboot

Nouveau driver conflict

# Verify nouveau is blacklisted
lsmod | grep nouveau  # Should return nothing

# Force blacklist and rebuild initramfs
sudo bash -c "printf 'blacklist nouveau\noptions nouveau modeset=0\n' > /etc/modprobe.d/blacklist-nouveau.conf"
sudo update-initramfs -u   # Ubuntu
sudo dracut --force        # CentOS/Rocky
sudo reboot

"Failed to initialize NVML" error

# Often caused by upgrading the driver without rebooting (the message
# reads "Driver/library version mismatch"); reboot first. Otherwise it
# is usually a permissions issue: run as root or check udev rules
sudo nvidia-smi

# Or add your user to the video/render group
sudo usermod -aG video $USER
sudo usermod -aG render $USER

Driver version mismatch after kernel update

# Force DKMS rebuild
sudo dkms autoinstall

# Or reinstall the driver package
sudo apt-get install --reinstall nvidia-driver-550
sudo reboot

Secure Boot prevents module loading

# Check whether Secure Boot is active
mokutil --sb-state

# On UEFI systems, sign the kernel module or disable Secure Boot in firmware
# Ubuntu: enroll the MOK key generated during driver installation
sudo mokutil --import /var/lib/shim-signed/mok/MOK.der
# Then reboot and complete enrollment in the MOK manager screen

Conclusion

Installing NVIDIA drivers with DKMS support ensures your GPU remains available after kernel updates without manual intervention, which is critical for production AI and compute workloads on Linux servers. Always verify the installation with nvidia-smi after rebooting and enable persistence mode to minimize GPU initialization latency for your workloads.