Setting up NVIDIA docker & podman (Ubuntu 24.04, 20241118)

Instructions for a Linux host running Ubuntu 24.04 to install the Nvidia runtime for docker and podman.

Apr 24, 2024
💡
Set up instructions for NVIDIA GPU container toolkits on a Linux host running Ubuntu 24.04, which can be used with docker and podman.
 
Revision: 20241118-0 (init: 20240224)
 
The NVIDIA GPU Container Runtime plugin enables container platforms to securely access and manage NVIDIA GPUs in a containerized application environment. Docker is an open-source platform that automates applications' deployment, scaling, and management within lightweight, portable containers. Podman is an open-source, daemonless container engine designed for developing, managing, and running OCI Containers. It functions as a drop-in replacement for Docker.
 
 
Instructions for a Linux host running Ubuntu 24.04 to install the Nvidia runtime for docker and podman. We note that NVIDIA’s Container Toolkit officially only supports Ubuntu LTS releases.

Preamble

The following are only required if you do not already have some of the tools installed.

Confirming the Nvidia driver is available

The rest of this guide expects an already functional nvidia-driver.
To install it:
  • On Ubuntu Desktop, install from Software & Updates’s Additional Drivers and reboot.
  • On Ubuntu Server, confirm the device is available using ubuntu-drivers devices and install the recommended “server” driver: sudo apt install nvidia-driver-535-server, then reboot.
    • If you see an aplay error, you can sudo apt-get install alsa-utils
To confirm it is functional after a reboot, run nvidia-smi from a terminal; if valid output shows up, it will report the Driver Version and the supported CUDA Version, which you will need later when running GPU-enabled containers.
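For a quick scripted check, nvidia-smi can also report just those fields (a minimal sketch using standard nvidia-smi query flags):
# Print the driver version and GPU name, one line per GPU
nvidia-smi --query-gpu=driver_version,name --format=csv,noheader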

Using a more recent driver

When writing this section (late June 2024), Ubuntu 24.04 uses driver 535 as its recommended driver.
As a user of the CTPO: CUDA + TensorFlow + PyTorch + OpenCV Docker container, whose latest version uses CUDA 12.3.2, nvidia-smi tells me driver 535 supports only up to CUDA 12.2, which requires me to use a more recent driver. You can find the list of Nvidia drivers at https://www.nvidia.com/en-us/drivers/unix/
For the rest of this section, we will install driver 550 on a Ubuntu 24.04 Desktop.
First, look at the options listed on Ubuntu’s page at https://ubuntu.com/server/docs/nvidia-drivers-installation
If sudo ubuntu-drivers list offers driver 550 in the provided list, you can perform a sudo ubuntu-drivers install nvidia:550 and ignore the rest of this section.
Otherwise, we will follow the method listed in the “Manual driver installation (using APT)” section.
  • Check which driver is currently in use:
apt list --installed | grep nvidia | grep modules
Among the options presented, linux-modules-nvidia-535-generic-hwe-24.04 matches the expected linux-modules-nvidia-${DRIVER_BRANCH}${SERVER}-${LINUX_FLAVOUR} format.
  • Search for an available package that matches this format for 550:
apt search nvidia | grep 550 | grep modules | grep generic-hwe
  • This gives us multiple options. The one that matches my use case (not a server, for example) is linux-modules-nvidia-550-generic-hwe-24.04, which we will install:
sudo apt install linux-modules-nvidia-550-generic-hwe-24.04
  • You can then check that the modules for your kernel were installed by running:
sudo apt-cache policy linux-modules-nvidia-550-$(uname -r)
  • The Installed and Candidate versions should match. You can also test reinstallation using sudo apt install linux-modules-nvidia-550-$(uname -r)
  • After installing the kernel modules, install the driver meta-package which will install the required additional packages for the expected kernel:
sudo apt install nvidia-driver-550
  • A reboot is required to load the new driver.
  • After reboot, you will be able to confirm that the driver is running by using nvidia-smi which in this case now returns:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
This confirms driver 550 is loaded, supporting up to CUDA 12.4 in Docker containers.
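If you prefer to confirm the loaded kernel module directly rather than rely on nvidia-smi, the driver exposes its version through /proc (a minimal check; the file only exists once the module is loaded):
cat /proc/driver/nvidia/version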

Using an even more recent driver

In some cases, the latest officially provided driver is not recent enough to support a more recent CUDA version. In such cases, it is possible to add a Personal Package Archive (PPA) from the “Graphics Drivers” team to the list of package sources. To add it:
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
At this point, we can start the “Additional Drivers” application, select the driver to install (here we will select “Using NVIDIA driver metapackage from nvidia-driver-560 (proprietary)”), and start the installation process. After the installation is completed, a reboot is required. At next login, we can confirm the driver and its capabilities using nvidia-smi:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
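On a headless system without the “Additional Drivers” application, the same result should be achievable from the CLI (a sketch; it assumes the PPA exposes the nvidia-driver-560 metapackage selected above):
sudo apt update
sudo apt install nvidia-driver-560
sudo reboot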

Docker setup (from docker.io)

We will follow the instructions to set it up using the apt registry option. Details can be found at https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository.
On a Ubuntu 24.04 system from a terminal, clean up potential conflicting packages:
for pkg in docker.io docker-doc docker-compose docker-compose-v2 podman-docker containerd runc; do sudo apt-get remove $pkg; done
Add support for GPG keys, load Docker’s key, and set up the repository:
# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
# Add the repository to Apt sources:
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
Install the required packages:
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
Confirm docker is functional by checking that we get the Hello from Docker! message when running its hello-world image:
sudo docker run --rm hello-world
Optionally, make docker available without sudo; this has some security implications, as detailed in https://docs.docker.com/engine/install/linux-postinstall/:
sudo usermod -aG docker $USER
You will need to log out entirely before the change takes effect. Once this is done, you should be able to run docker run hello-world without the need for sudo.
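If you want to test before logging out, newgrp can start a subshell with the new group active (a convenience sketch; a full logout remains the reliable way to refresh group membership):
newgrp docker
# inside the new shell started by newgrp:
docker run --rm hello-world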

Install podman

On Ubuntu 24.04, apt search podman returns versions above 4.1.0, the minimum required to use the Container Device Interface (CDI) for nvidia-container-toolkit.
It is, therefore, possible to install podman by simply running:
sudo apt install podman
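If you want to confirm which version apt will install (or verify it afterward), the usual checks apply:
apt-cache policy podman
podman --version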
Now we can test podman:
podman run hello-world
podman runs similarly to docker; for example:
podman run --rm -it docker.io/ubuntu:24.04 /bin/bash
will download ubuntu:24.04, give you a bash shell prompt in an interactive session, and will delete the created container when you exit the shell.
 
It is recommended that you always use a fully qualified image name, including the registry server (full DNS name), namespace, image name, and tag, such as docker.io/ubuntu:24.04.
To add docker.io to the list of “unqualified search registries,” edit /etc/containers/registries.conf and modify the following line as follows: unqualified-search-registries = ["docker.io"]. More details on that topic are available at https://podman.io/docs/installation#registriesconf.
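As a non-interactive alternative to hand-editing, the setting can be appended to the file (a sketch; check first that the file does not already contain an unqualified-search-registries line, as a duplicate would need to be merged by hand):
echo 'unqualified-search-registries = ["docker.io"]' | sudo tee -a /etc/containers/registries.conf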
 
Contrary to docker, podman does not create iptables configurations or use br_netfilter, which allows for the use of bridged VMs. In such cases, install only podman, and add podman-compose if you want to use a tool like Dockge; we also recommend seeing this PR.
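For reference, a minimal compose file runs the same way under podman-compose as under docker compose (a sketch; the image and port mapping are arbitrary examples):
# docker-compose.yml
services:
  web:
    image: docker.io/nginx:latest
    ports:
      - "8080:80"
Start it from the file’s directory with podman-compose up -d and stop it with podman-compose down.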

NVIDIA Container Toolkit

For further details on what this supports, NVIDIA has an excellent primer document at https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/overview.html, and detailed instructions are available at https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html. On this page, you will find details on the following:
  • supported Linux distributions
  • NVIDIA driver requirements and minimal hardware supported
  • Docker versions
The NVIDIA Container Toolkit also supports generating a Container Device Interface (CDI) for podman.
Setup the package repository and the GPG keys:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
Install the toolkit:
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

For Docker

Configure docker to recognize the toolkit:
sudo nvidia-ctk runtime configure --runtime=docker
Restart docker:
sudo systemctl restart docker
Confirm docker (no sudo needed if you performed the optional step in the last section) sees any GPU running on your system by having it run nvidia-smi. Note that docker will need both --runtime=nvidia and --gpus all to use the proper runtime and have access to all the GPUs:
docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
Please be aware that the maximum CUDA version returned by the nvidia-smi command on your host (outside docker) informs you of the most recent cuda:<version> image that you can use.
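To script that check, the relevant line can be extracted from the host’s nvidia-smi output (a sketch; the grep pattern matches the banner shown earlier):
nvidia-smi | grep "CUDA Version"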
 
You can inspect your /etc/docker/daemon.json file to see that the nvidia-container-runtime is added:
[...] "runtimes": { "nvidia": { "args": [], "path": "nvidia-container-runtime" } }
To make this runtime the default, add "default-runtime": "nvidia", near the top of the file (after the first {), then sudo systemctl restart docker. You should no longer have to add --runtime=nvidia on the CLI.
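For clarity, the resulting /etc/docker/daemon.json should then look roughly like this (a sketch; any other keys already present in your file must be kept):
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "args": [],
      "path": "nvidia-container-runtime"
    }
  }
}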

For Podman

For podman, we will base the Container Device Interface (CDI) setup instructions on https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html.
Verify that the required tool is installed:
nvidia-ctk --version
In this run, we have NVIDIA Container Toolkit CLI version 1.15.0
If successful, have it generate the CDI specifications:
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
At the end of this run, we see INFO[0001] Generated CDI spec with version 0.5.0
Confirm the devices show up (entries such as 0, the GPU UUID, and all may be listed):
grep " name:" /etc/cdi/nvidia.yaml
Test podman (no sudo needed), requesting the proper devices with --device nvidia.com/gpu=all:
podman run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi
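To expose a single GPU instead of all of them, the CDI device name can also reference an index or UUID (a sketch; valid names are the name: entries listed in /etc/cdi/nvidia.yaml):
podman run --rm --device nvidia.com/gpu=0 docker.io/ubuntu:24.04 nvidia-smi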

Revision History

  • 20241118-0: Added details to use a later version of the Nvidia driver, 560 (to support up to CUDA 12.6 in containers)
  • 20240626-0: Added content to update Nvidia driver from 535 to 550 (from supporting CUDA 12.2 to 12.4 in containers)
  • 20240526-0: Added a note about using podman only on hosts where you intend to use bridged VMs.
  • 20240523-0: Updated content for Ubuntu 24.04
  • 20240225-0: Initial release (for Ubuntu 22.04)