CUDA tests in containers¶
Many applications that make use of AI/LLMs/machine learning or other GPU-heavy tasks rely on NVIDIA® CUDA®, a proprietary technology. Running such apps requires additional runtime resources like GPU hardware, but on the software side also access to the GPU driver (which may change between runs due to updates, etc.).
In this tutorial, we show how to run the SAXPY tool, which is already available in nixpkgs via pkgs.cudaPackages.saxpy.
This application does not do much, but running it without errors is a good indicator that CUDA generally works.
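SAXPY ("single-precision a·x plus y") computes `a*x + y` element-wise over two vectors and compares the result against the expected value. As a rough illustration of what the tool's "Max error" output refers to, here is a plain Python sketch using the input values of the classic CUDA SAXPY sample (x = 1.0, y = 2.0, a = 2.0); this is not the actual code of the nixpkgs tool:

```python
# Illustrative sketch of the SAXPY check, not the tool's actual code:
# compute y = a*x + y element-wise and report the maximum deviation
# from the expected value.

def saxpy(a, xs, ys):
    """a*x + y, here as a plain Python loop instead of a CUDA kernel."""
    return [a * x + y for x, y in zip(xs, ys)]

n = 1 << 10
a = 2.0
xs = [1.0] * n  # x initialized to 1.0, as in the classic CUDA sample
ys = [2.0] * n  # y initialized to 2.0

result = saxpy(a, xs, ys)

# every element should be a*1.0 + 2.0 = 4.0
max_error = max(abs(r - 4.0) for r in result)
print(f"Max error: {max_error:.6f}")
```

On the GPU, the same check exercises memory transfers and kernel launches, which is why a clean "Max error: 0.000000" is a good smoke test for the whole CUDA stack.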
Special setup necessary
- Test driver container setup: see Feature: containers
- Sandbox setup: see host configuration section below
- Test driver patch: see driver patching below
Run this example test yourself
To run this test directly from the example repository, run:
Let's go through the implementation.
The test generally runs on both NVIDIA and AMD hardware, both of which need extra settings.
The common part lives in cuda/generic.nix:
{
  name = "saxpy-cuda-test";

  containers.container = # (1)
    { pkgs, ... }:
    {
      environment.systemPackages = [
        pkgs.cudaPackages.saxpy # (2)
      ];

      # Misc CUDA mounts.
      # This list was generated by running:
      # nix run nixpkgs#nix-required-mounts $(nix-instantiate -E 'derivation {name = "needs-cuda"; builder = "-"; system = "-"; requiredSystemFeatures = ["cuda"]; }') | grep -v '^/nix/store'
      virtualisation.systemd-nspawn.options = [
        # (3)
        "--bind=/host/run/opengl-driver:/run/opengl-driver"
        "--bind=/dev/dri:/dev/dri"
        # on AMD/NVIDIA, there are additional paths;
        # see also cuda/amd.nix
      ];
    };

  # (4)
  testScript = /* python */ ''
    container.start()
    output = container.succeed("saxpy 2>&1")
    assert "Max error: 0.000000" in output
    print("CUDA output:")
    print(output)
  '';
}
1. Container support: CUDA can currently only be tested in the test driver's containers. It is generally possible to provide direct access to the GPU via PCI passthrough, but that would make GPU access exclusive to the test machine.
2. We install saxpy into the container.
3. Extra container paths: the Nix daemon already added these paths to the Nix sandbox (see host configuration). From here, we need to pass them on a second time, from inside the Nix sandbox into the new systemd-nspawn container.
4. The test simply runs saxpy. If this does not crash, we know that CUDA generally works.
The cuda/generic.nix test module does not yet list all necessary paths.
In the nvidia.nix and amd.nix test modules, we import generic.nix to inherit all generic parts of the test and add the remaining hardware-specific paths:
{
  imports = [ ./generic.nix ];

  containers.container = {
    virtualisation.systemd-nspawn.options = [
      "--bind=/dev/nvidia-modeset:/dev/nvidia-modeset"
      "--bind=/dev/nvidia-uvm-tools:/dev/nvidia-uvm-tools"
      "--bind=/dev/nvidia-uvm:/dev/nvidia-uvm"
      "--bind=/dev/nvidia0:/dev/nvidia0"
      "--bind=/dev/nvidiactl:/dev/nvidiactl"
    ];
  };
}
Where do the bind mount paths come from?
After configuring the Nix daemon properly (as described below), we can use the nix-required-mounts app to print all paths that are deemed necessary in the Nix sandbox for running CUDA applications inside:
$ nix run nixpkgs#nix-required-mounts $(nix-instantiate -E 'derivation {name = "needs-cuda"; builder = "-"; system = "-"; requiredSystemFeatures = ["cuda"]; }')
/dev/dri=/dev/dri
/dev/nvidia-modeset=/dev/nvidia-modeset
/dev/nvidia-uvm=/dev/nvidia-uvm
/dev/nvidia-uvm-tools=/dev/nvidia-uvm-tools
/dev/nvidia0=/dev/nvidia0
/dev/nvidiactl=/dev/nvidiactl
/nix/store/f2il677hxc4lf41ayhckakdmgngidb40-graphics-drivers=/nix/store/f2il677hxc4lf41ayhckakdmgngidb40-graphics-drivers
/nix/store/ma7icdpq0134vwxk9x3rc93ssa4m6xc1-nvidia-x11-595.58.03-6.19.11/etc=/nix/store/ma7icdpq0134vwxk9x3rc93ssa4m6xc1-nvidia-x11-595.58.03-6.19.11/etc
/run/opengl-driver=/run/opengl-driver
We can ignore the /nix/store/... paths from that output as these are automatically generated by the app.
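The remaining `source=target` lines map directly onto systemd-nspawn `--bind=source:target` options. As an illustration (this helper is made up for this tutorial and is not part of nix-required-mounts), the conversion could be scripted like this:

```python
# Hypothetical helper (not part of nix-required-mounts): turn the tool's
# "source=target" output lines into systemd-nspawn --bind options,
# skipping the /nix/store paths we can ignore.

def mounts_to_bind_options(output: str) -> list[str]:
    options = []
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("/nix/store"):
            continue
        source, target = line.split("=", 1)
        options.append(f"--bind={source}:{target}")
    return options

example = """\
/dev/dri=/dev/dri
/dev/nvidia0=/dev/nvidia0
/nix/store/...-graphics-drivers=/nix/store/...-graphics-drivers
/run/opengl-driver=/run/opengl-driver
"""

for opt in mounts_to_bind_options(example):
    print(opt)
```

Note that in the test modules above, /run/opengl-driver is bound from /host/run/opengl-driver instead, because of how the test driver exposes the host's /run (see the test driver patch below).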
To learn more about module composition, have a look at the feature page about test modules.
This is the test in action:
Necessary configuration¶
To run CUDA inside the test driver, we must:
- Configure CUDA properly
- Configure the sandbox to provide all necessary hardware-related paths
- Patch the test driver to pass these paths further from inside the sandbox into the container
Host configuration¶
On the host, the GPU driver must be configured properly, as well as the nix-required-mounts pre-build-hook.
What does the nix-required-mounts pre-build-hook do?
CUDA applications require many host-specific paths that provide access to the GPU and the currently loaded driver.
The programs.nix-required-mounts.* settings shown below add a pre-build-hook line to the system's /etc/nix/nix.conf configuration file.
This way, the Nix daemon is instructed to call nix-required-mounts with the derivation path to be built.
The app returns the list of paths that need to be accessible in the sandbox to run CUDA inside.
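Concretely, the generated configuration amounts to a single line in /etc/nix/nix.conf (the store path below is a shortened placeholder, not a real hash):

```
# /etc/nix/nix.conf (excerpt); the store path is a placeholder
pre-build-hook = /nix/store/…-nix-required-mounts/bin/nix-required-mounts
```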
{
  # === sandbox settings ===
  programs.nix-required-mounts = {
    enable = true;
    presets.nvidia-gpu.enable = true;
  };

  # === general GPU/NVIDIA NixOS settings ===
  hardware.graphics.enable = true;

  # ensure the proprietary driver is used
  boot.blacklistedKernelModules = [ "nouveau" ];
  services.xserver.videoDrivers = [ "nvidia" ];

  # ensure performance settings and the latest driver
  boot.kernelPackages = pkgs.linuxPackages_latest;
  hardware.nvidia = {
    modesetting.enable = true;
    powerManagement.enable = false;
    powerManagement.finegrained = false;
    open = false;
    nvidiaSettings = true;
    package = config.boot.kernelPackages.nvidiaPackages.latest;
  };
}
Hosts with AMD GPUs need ZLUDA support and can then run the test out of the box.
These configuration options have just been upstreamed
If the attributes programs.nix-required-mounts.presets.zluda.enable or hardware.amdgpu.zluda.enable do not exist in your NixOS version, please upgrade to the latest nixpkgs.
(This change was recently merged in nixpkgs PR #501095: programs.nix-required-mounts.presets.zluda.enable: init)
Make sure to rebuild your NixOS configuration with these settings.
Test driver patch¶
Currently, the test driver does not set up the systemd-nspawn containers with access to the host's /run folder, even though this is necessary to provide access to the GPU driver.
Until this is upstreamed in a more streamlined way, that access can be provided via a small overlay file:
_: prev: {
  nixos-test-driver = prev.nixos-test-driver.overrideAttrs (old: {
    patches = old.patches or [ ] ++ [ ./nixos-test-driver-gpu.patch ];
  });
}
The mentioned patch contains these Python lines:
--- a/test_driver/driver.py
+++ b/test_driver/driver.py
@@ -146,6 +146,9 @@ class Driver:
         # set up prerequisites for systemd-nspawn containers.
         # these are not guaranteed to be set up in the Nix sandbox.
         # if running interactively as root, these will already be set up.
+        if Path("/run").exists():
+            Path("/host/run").mkdir(parents=True)
+            subprocess.run(["mount", "--bind", "/run", "/host/run"], check=True)
         # check if /run is writable by root
         if not os.access("/run", os.W_OK):
Finally, the test derivation needs to communicate to the Nix daemon that it needs CUDA capabilities:
cuda-test = (pkgs.testers.runNixOSTest ./cuda-test.nix).overrideTestDerivation (old: {
requiredSystemFeatures = old.requiredSystemFeatures ++ [ "cuda" ];
});
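A derivation with requiredSystemFeatures = [ "cuda" ] will only be scheduled on builders whose Nix daemon advertises that feature. The programs.nix-required-mounts preset is expected to take care of this; if your daemon does not already advertise the feature, it can be declared explicitly (a sketch, using the default feature list of a typical NixOS machine):

```nix
{
  # advertise the "cuda" system feature to the Nix daemon,
  # in addition to the usual defaults
  nix.settings.system-features = [
    "nixos-test"
    "benchmark"
    "big-parallel"
    "kvm"
    "cuda"
  ];
}
```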
For the runnable example in this repository, the flake.nix performs these changes like this: