CUDA tests in containers

Many applications for AI/LLM/machine learning workloads or other GPU-heavy tasks rely on NVIDIA® CUDA®, a proprietary technology. Running such apps requires additional runtime resources like GPU hardware, but on the software side also access to the GPU driver (which may change between runs due to updates, etc.).

In this tutorial, we show how to run the SAXPY tool, which is already available in nixpkgs via pkgs.cudaPackages.saxpy. This application does not do much, but running it without errors is a good indicator that CUDA generally works.
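For context, SAXPY ("Single-precision A·X Plus Y") computes y ← a·x + y element-wise over float vectors. A minimal CPU reference in Python (purely illustrative, unrelated to the packaged CUDA tool, which runs this computation on the GPU):

```python
# Reference SAXPY on the CPU: y <- a*x + y, element-wise.
def saxpy(a, x, y):
    return [a * xi + yi for xi, yi in zip(x, y)]

result = saxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
print(result)  # [12.0, 24.0, 36.0]
```

The packaged tool performs the same computation with CUDA and compares the GPU result against such a reference, which is why its output reports a "Max error" of zero on success.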

Special setup necessary

  1. Test driver container setup: see Feature: containers
  2. Sandbox setup: see host configuration section below
  3. Test driver patch: see driver patching below

Run this example test yourself

To run this test directly from the example repository, run:

nix build -L github:applicative-systems/nixos-test-driver-manual#test-cuda-nvidia
nix build -L github:applicative-systems/nixos-test-driver-manual#test-cuda-amd

Let's go through the implementation.

The test generally runs on both NVIDIA and AMD hardware, which both need extra settings. The common part is in cuda/generic.nix:

cuda/generic.nix
{
  name = "saxpy-cuda-test";

  containers.container = # (1)
    { pkgs, ... }:
    {
      environment.systemPackages = [
        pkgs.cudaPackages.saxpy # (2)
      ];

      # Misc cuda mounts.
      # This was generated by running:
  #   nix run nixpkgs#nix-required-mounts $(nix-instantiate -E 'derivation {name = "needs-cuda"; builder = "-"; system = "-"; requiredSystemFeatures = ["cuda"]; }') | grep -v '^/nix/store'
      virtualisation.systemd-nspawn.options = [
        # (3)
        "--bind=/host/run/opengl-driver:/run/opengl-driver"
        "--bind=/dev/dri:/dev/dri"

        # On AMD and NVIDIA hardware, more paths are needed;
        # see cuda/amd.nix and cuda/nvidia.nix below.
      ];
    };

  # (4)
  testScript = /* python */ ''
    container.start()

    output = container.succeed("saxpy 2>&1")
    assert "Max error: 0.000000" in output

    print("CUDA output:")
    print(output)
  '';
}
  1. Container support

    CUDA can at this point only be tested in the test driver's containers.

    It is generally possible to provide direct access to the GPU via PCI-passthrough. This would however make GPU access exclusive to the test machine.

  2. We install saxpy into the container.

  3. Extra container paths

    The Nix daemon already added these paths to the Nix sandbox (see host configuration). From there, we need to pass them on a second time, from inside the Nix sandbox into the new systemd-nspawn container.

  4. The test simply runs saxpy.

    If this does not crash, we know that CUDA generally works.

The cuda/generic.nix test module does not list all necessary paths, yet. In the nvidia.nix and amd.nix test modules, we include generic.nix to inherit all generic parts of the test and add the rest of the hardware specific paths:

cuda/nvidia.nix
{
  imports = [ ./generic.nix ];

  containers.container = {
    virtualisation.systemd-nspawn.options = [
      "--bind=/dev/nvidia-modeset:/dev/nvidia-modeset"
      "--bind=/dev/nvidia-uvm-tools:/dev/nvidia-uvm-tools"
      "--bind=/dev/nvidiactl:/dev/nvidiactl"
      "--bind=/dev/nvidia-uvm:/dev/nvidia-uvm"
      "--bind=/dev/nvidia0:/dev/nvidia0"
    ];
  };
}
cuda/amd.nix
{
  imports = [ ./generic.nix ];

  containers.container = {
    virtualisation.systemd-nspawn.options = [
      "--bind=/dev/kfd:/dev/kfd"
      "--bind=/sys/devices/virtual/kfd:/sys/devices/virtual/kfd"
    ];
  };
}
Where do the bind mount paths come from?

After configuring the Nix daemon properly (as described below), we can use the nix-required-mounts app to print all paths that are deemed necessary in the Nix sandbox for running CUDA applications:

$ nix run nixpkgs#nix-required-mounts $(nix-instantiate -E 'derivation {name = "needs-cuda"; builder = "-"; system = "-"; requiredSystemFeatures = ["cuda"]; }') | grep -v '^/nix/store'
/dev/dri=/dev/dri
/dev/nvidia-modeset=/dev/nvidia-modeset
/dev/nvidia-uvm=/dev/nvidia-uvm
/dev/nvidia-uvm-tools=/dev/nvidia-uvm-tools
/dev/nvidia0=/dev/nvidia0
/dev/nvidiactl=/dev/nvidiactl
/nix/store/f2il677hxc4lf41ayhckakdmgngidb40-graphics-drivers=/nix/store/f2il677hxc4lf41ayhckakdmgngidb40-graphics-drivers
/nix/store/ma7icdpq0134vwxk9x3rc93ssa4m6xc1-nvidia-x11-595.58.03-6.19.11/etc=/nix/store/ma7icdpq0134vwxk9x3rc93ssa4m6xc1-nvidia-x11-595.58.03-6.19.11/etc
/run/opengl-driver=/run/opengl-driver

We can ignore the /nix/store/... paths in this output, as the app generates these automatically.

To learn more about module composition, have a look at the feature page about test modules.

This is the test in action:

[Demo: CUDA saxpy example app output in a test driver container]

Necessary configuration

To run CUDA inside the test driver, we must:

  1. Configure CUDA properly
  2. Configure the sandbox to provide all necessary hardware-related paths
  3. Patch the test driver to pass these paths further from inside the sandbox into the container

Host configuration

On the host, the GPU driver must be configured properly, as well as the nix-required-mounts pre-build-hook.

What does the nix-required-mounts pre-build-hook do?

CUDA applications require many host-specific paths that provide access to the GPU and the currently loaded driver.

The programs.nix-required-mounts.* settings shown below add a pre-build-hook line to the system's /etc/nix/nix.conf configuration file.

This way, the Nix daemon is instructed to call nix-required-mounts with the derivation path to be built. The app returns the list of paths that need to be accessible in the sandbox to run CUDA inside.
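For orientation, the resulting /etc/nix/nix.conf might then contain lines along these lines (a sketch: the hook path is a placeholder, and whether the cuda system feature is advertised automatically depends on your module version):

```
# /etc/nix/nix.conf (excerpt; paths are placeholders)
pre-build-hook = /nix/store/<hash>-nix-required-mounts/bin/nix-required-mounts
system-features = ... cuda
```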

configuration.nix
{ pkgs, config, ... }:
{
  # === sandbox settings ====
  programs.nix-required-mounts = {
    enable = true;
    presets.nvidia-gpu.enable = true;
  };

  # === General GPU/NVIDIA NixOS settings ===
  hardware.graphics.enable = true;

  # ensure proprietary driver
  boot.blacklistedKernelModules = [ "nouveau" ];
  services.xserver.videoDrivers = [ "nvidia" ];

  # ensure proprietary and performance settings and latest driver
  boot.kernelPackages = pkgs.linuxPackages_latest;
  hardware.nvidia = {
    modesetting.enable = true;
    powerManagement.enable = false;
    powerManagement.finegrained = false;
    open = false;
    nvidiaSettings = true;
    package = config.boot.kernelPackages.nvidiaPackages.latest;
  };
}

Hosts with AMD GPUs need ZLUDA support and can then run the test out of the box.

These configuration options have just been upstreamed

If the options programs.nix-required-mounts.presets.zluda.enable or hardware.amdgpu.zluda.enable do not exist in your NixOS version, please upgrade to the latest nixpkgs.

(This change was recently merged with nixpkgs PR: #501095 programs.nix-required-mounts.presets.zluda.enable: init)

configuration.nix
{
  # === sandbox settings ===
  programs.nix-required-mounts = {
    enable = true;
    presets.zluda.enable = true;
  };

  # === General GPU/AMD NixOS settings ===
  hardware.graphics.enable = true;
  hardware.amdgpu.zluda.enable = true;
}

Make sure to rebuild your NixOS configuration with these settings.

Test driver patch

Currently, the test driver does not set up systemd-nspawn containers with access to the host's /run folder, even though this access is necessary to reach the GPU driver.

Before this is upstreamed in a more streamlined way, it is possible to provide that access via a small overlay file:

cuda/overlay.nix
_: prev: {
  nixos-test-driver = prev.nixos-test-driver.overrideAttrs (old: {
    patches = (old.patches or [ ]) ++ [ ./nixos-test-driver-gpu.patch ];
  });
}

The mentioned patch contains these Python lines:

cuda/nixos-test-driver-gpu.patch
--- a/test_driver/driver.py
+++ b/test_driver/driver.py
@@ -146,6 +146,9 @@ class Driver:
         # set up prerequisites for systemd-nspawn containers.
         # these are not guaranteed to be set up in the Nix sandbox.
         # if running interactively as root, these will already be set up.
+        if Path("/run").exists():
+            Path("/host/run").mkdir(parents=True)
+            subprocess.run(["mount", "--bind", "/run", "/host/run"], check=True)

         # check if /run is writable by root
         if not os.access("/run", os.W_OK):
--
2.53.0

Finally, the test derivation needs to communicate to the Nix daemon that it needs CUDA capabilities:

run-test.nix
cuda-test = (pkgs.testers.runNixOSTest ./cuda-test.nix).overrideTestDerivation (old: {
  requiredSystemFeatures = old.requiredSystemFeatures ++ [ "cuda" ];
});

For the runnable example in this repository, the flake.nix applies these changes as follows:

  1. overlay injection
  2. definition of an addRequiredFeatures helper
  3. injection of the CUDA capability requirement
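Sketched in Nix, that wiring could look roughly like this (a hypothetical condensation: the flake layout and the exact shape of addRequiredFeatures are assumptions based on the list above, not a verbatim copy of the repository's flake.nix):

```nix
# flake.nix (sketch, not verbatim)
{
  outputs = { nixpkgs, ... }:
    let
      system = "x86_64-linux";
      pkgs = import nixpkgs {
        inherit system;
        overlays = [ (import ./cuda/overlay.nix) ]; # 1. overlay injection
      };

      # 2. helper that adds required system features to a test derivation
      addRequiredFeatures = features: test:
        test.overrideTestDerivation (old: {
          requiredSystemFeatures = (old.requiredSystemFeatures or [ ]) ++ features;
        });
    in
    {
      # 3. CUDA capability requirement injection
      checks.${system}.test-cuda-nvidia =
        addRequiredFeatures [ "cuda" ] (pkgs.testers.runNixOSTest ./cuda/nvidia.nix);
    };
}
```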