CUDA tests in containers

Many applications for AI/LLM/machine learning workloads or other GPU-heavy tasks rely on NVIDIA® CUDA®, a proprietary technology. Running such apps requires additional runtime resources like GPU hardware, but on the software side also access to the GPU driver (which may change between runs due to updates, etc.).

In this tutorial, we show how to run the SAXPY tool, which is already available in nixpkgs via pkgs.cudaPackages.saxpy. This application does not do much, but running it without errors is a good indicator that CUDA generally works.
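For context, SAXPY ("Single-precision A·X Plus Y") computes y ← a·x + y element-wise over float vectors. A minimal CPU reference in Python (purely illustrative, unrelated to the packaged CUDA tool, which runs this computation on the GPU):

```python
# Reference SAXPY on the CPU: y <- a*x + y, element-wise.
def saxpy(a, x, y):
    return [a * xi + yi for xi, yi in zip(x, y)]

result = saxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
print(result)  # [12.0, 24.0, 36.0]
```

The packaged tool performs the same computation with CUDA and compares the GPU result against such a reference, which is why its output reports a "Max error" of zero on success.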

Special setup necessary

  1. Test driver container setup: see Feature: containers
  2. Sandbox setup: see host configuration section below
  3. Test driver patch: see driver patching below

Run this example test yourself

To run this test directly from the example repository, run:

nix build -L github:applicative-systems/nixos-test-driver-manual#test-cuda-nvidia
nix build -L github:applicative-systems/nixos-test-driver-manual#test-cuda-amd

Let's go through the implementation.

The test generally runs on both NVIDIA and AMD hardware, which both need extra settings. The common part is in cuda/generic.nix:

cuda/generic.nix
{
  name = "saxpy-cuda-test";

  containers.container = # (1)
    { pkgs, ... }:
    {
      environment.systemPackages = [
        pkgs.cudaPackages.saxpy # (2)
      ];

      # Misc cuda mounts.
      # This was generated by running:
  #   nix run nixpkgs#nix-required-mounts $(nix-instantiate -E 'derivation {name = "needs-cuda"; builder = "-"; system = "-"; requiredSystemFeatures = ["cuda"]; }') | grep -v '^/nix/store'
      virtualisation.systemd-nspawn.options = [
        # (3)
        "--bind=/host/run/opengl-driver:/run/opengl-driver"
        "--bind=/dev/dri:/dev/dri"

        # On AMD and NVIDIA hardware, more paths are needed;
        # see cuda/amd.nix and cuda/nvidia.nix below.
      ];
    };

  # (4)
  testScript = /* python */ ''
    container.start()

    output = container.succeed("saxpy 2>&1")
    assert "Max error: 0.000000" in output

    print("CUDA output:")
    print(output)
  '';
}
  1. Container support

    CUDA can at this point only be tested in the test driver's containers.

    It is generally possible to provide direct access to the GPU via PCI-passthrough. This would however make GPU access exclusive to the test machine.

  2. We install saxpy into the container.

  3. Extra container paths

    The Nix daemon already added these paths to the Nix sandbox (see host configuration). From there, we need to pass them on a second time, from inside the Nix sandbox into the new systemd-nspawn container.

  4. The test simply runs saxpy.

    If this does not crash, we know that CUDA generally works.

The cuda/generic.nix test module does not list all necessary paths, yet. In the nvidia.nix and amd.nix test modules, we include generic.nix to inherit all generic parts of the test and add the rest of the hardware specific paths:

cuda/nvidia.nix
{
  imports = [ ./generic.nix ];

  containers.container = {
    virtualisation.systemd-nspawn.options = [
      "--bind=/dev/nvidia-modeset:/dev/nvidia-modeset"
      "--bind=/dev/nvidia-uvm-tools:/dev/nvidia-uvm-tools"
      "--bind=/dev/nvidiactl:/dev/nvidiactl"
      "--bind=/dev/nvidia-uvm:/dev/nvidia-uvm"
      "--bind=/dev/nvidia0:/dev/nvidia0"
    ];
  };
}
cuda/amd.nix
{
  imports = [ ./generic.nix ];

  containers.container = {
    virtualisation.systemd-nspawn.options = [
      "--bind=/dev/kfd:/dev/kfd"
      "--bind=/sys/devices/virtual/kfd:/sys/devices/virtual/kfd"
    ];
  };
}
Where do the bind mount paths come from?

After configuring the Nix daemon properly (as described below), we can use the nix-required-mounts app to print all paths that are deemed necessary in the Nix sandbox for running CUDA applications:

$ nix run nixpkgs#nix-required-mounts $(nix-instantiate -E 'derivation {name = "needs-cuda"; builder = "-"; system = "-"; requiredSystemFeatures = ["cuda"]; }') | grep -v '^/nix/store'
/dev/dri=/dev/dri
/dev/nvidia-modeset=/dev/nvidia-modeset
/dev/nvidia-uvm=/dev/nvidia-uvm
/dev/nvidia-uvm-tools=/dev/nvidia-uvm-tools
/dev/nvidia0=/dev/nvidia0
/dev/nvidiactl=/dev/nvidiactl
/nix/store/f2il677hxc4lf41ayhckakdmgngidb40-graphics-drivers=/nix/store/f2il677hxc4lf41ayhckakdmgngidb40-graphics-drivers
/nix/store/ma7icdpq0134vwxk9x3rc93ssa4m6xc1-nvidia-x11-595.58.03-6.19.11/etc=/nix/store/ma7icdpq0134vwxk9x3rc93ssa4m6xc1-nvidia-x11-595.58.03-6.19.11/etc
/run/opengl-driver=/run/opengl-driver

We can ignore the /nix/store/... paths in this output, as the app generates these automatically.

To learn more about module composition, have a look at the feature page about test modules.

This is the test in action:

[Demo: CUDA saxpy example app output in a test driver container]

Necessary configuration

To run CUDA inside the test driver, we must:

  1. Configure CUDA properly
  2. Configure the sandbox to provide all necessary hardware-related paths
  3. Patch the test driver to pass these paths further from inside the sandbox into the container

Host configuration

On the host, the GPU driver must be configured properly, as well as the nix-required-mounts pre-build-hook.

What does the nix-required-mounts pre-build-hook do?

CUDA applications require many host-specific paths that provide access to the GPU and the currently loaded driver.

The programs.nix-required-mounts.* settings shown below add a pre-build-hook line to the system's /etc/nix/nix.conf configuration file.

This way, the Nix daemon is instructed to call nix-required-mounts with the derivation path to be built. The app returns the list of paths that need to be accessible in the sandbox to run CUDA inside.
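For orientation, the resulting /etc/nix/nix.conf might then contain lines along these lines (a sketch: the hook path is a placeholder, and whether the cuda system feature is advertised automatically depends on your module version):

```
# /etc/nix/nix.conf (excerpt; paths are placeholders)
pre-build-hook = /nix/store/<hash>-nix-required-mounts/bin/nix-required-mounts
system-features = ... cuda
```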

configuration.nix
{ pkgs, config, ... }:
{
  # === sandbox settings ====
  programs.nix-required-mounts = {
    enable = true;
    presets.nvidia-gpu.enable = true;
  };

  # === General GPU/NVIDIA NixOS settings ===
  hardware.graphics.enable = true;

  # ensure proprietary driver
  boot.blacklistedKernelModules = [ "nouveau" ];
  services.xserver.videoDrivers = [ "nvidia" ];

  # ensure proprietary and performance settings and latest driver
  boot.kernelPackages = pkgs.linuxPackages_latest;
  hardware.nvidia = {
    modesetting.enable = true;
    powerManagement.enable = false;
    powerManagement.finegrained = false;
    open = false;
    nvidiaSettings = true;
    package = config.boot.kernelPackages.nvidiaPackages.latest;
  };
}

Hosts with AMD GPUs need ZLUDA support and can then run the test out of the box.

These configuration options have just been upstreamed

If the options programs.nix-required-mounts.presets.zluda.enable or hardware.amdgpu.zluda.enable do not exist in your NixOS version, please upgrade to the latest nixpkgs.

(This change was recently merged with nixpkgs PR: #501095 programs.nix-required-mounts.presets.zluda.enable: init)

configuration.nix
{
  # === sandbox settings ===
  programs.nix-required-mounts = {
    enable = true;
    presets.zluda.enable = true;
  };

  # === General GPU/AMD NixOS settings ===
  hardware.graphics.enable = true;
  hardware.amdgpu.zluda.enable = true;
}

Make sure to rebuild your NixOS configuration with these settings.

Test driver patch

Currently, the test driver does not set up systemd-nspawn containers with access to the host's /run folder, even though this access is necessary to reach the GPU driver.

Before this is upstreamed in a more streamlined way, it is possible to provide that access via a small overlay file:

cuda/overlay.nix
_: prev: {
  nixos-test-driver = prev.nixos-test-driver.overrideAttrs (old: {
    patches = (old.patches or [ ]) ++ [ ./nixos-test-driver-gpu.patch ];
  });
}

The mentioned patch contains these Python lines:

cuda/nixos-test-driver-gpu.patch
--- a/test_driver/driver.py
+++ b/test_driver/driver.py
@@ -146,6 +146,9 @@ class Driver:
         # set up prerequisites for systemd-nspawn containers.
         # these are not guaranteed to be set up in the Nix sandbox.
         # if running interactively as root, these will already be set up.
+        if Path("/run").exists():
+            Path("/host/run").mkdir(parents=True)
+            subprocess.run(["mount", "--bind", "/run", "/host/run"], check=True)

         # check if /run is writable by root
         if not os.access("/run", os.W_OK):
--
2.53.0

Finally, the test derivation needs to communicate to the Nix daemon that it needs CUDA capabilities:

run-test.nix
cuda-test = (pkgs.testers.runNixOSTest ./cuda-test.nix).overrideTestDerivation (old: {
  requiredSystemFeatures = old.requiredSystemFeatures ++ [ "cuda" ];
});

For the runnable example in this repository, the flake.nix applies these changes as follows:

  1. overlay injection
  2. definition of an addRequiredFeatures helper
  3. injection of the CUDA capability requirement
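Sketched in Nix, that wiring could look roughly like this (a hypothetical condensation: the flake layout and the exact shape of addRequiredFeatures are assumptions based on the list above, not a verbatim copy of the repository's flake.nix):

```nix
# flake.nix (sketch, not verbatim)
{
  outputs = { nixpkgs, ... }:
    let
      system = "x86_64-linux";
      pkgs = import nixpkgs {
        inherit system;
        overlays = [ (import ./cuda/overlay.nix) ]; # 1. overlay injection
      };

      # 2. helper that adds required system features to a test derivation
      addRequiredFeatures = features: test:
        test.overrideTestDerivation (old: {
          requiredSystemFeatures = (old.requiredSystemFeatures or [ ]) ++ features;
        });
    in
    {
      # 3. CUDA capability requirement injection
      checks.${system}.test-cuda-nvidia =
        addRequiredFeatures [ "cuda" ] (pkgs.testers.runNixOSTest ./cuda/nvidia.nix);
    };
}
```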