# Running CUDA workloads
If you want to run CUDA workloads on the K3s container, you need to customize the container.
CUDA workloads require the NVIDIA Container Runtime, so containerd needs to be configured to use this runtime.
The K3s container itself also needs to run with this runtime.
If you are using Docker, you can install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html).
## Building a customized K3s image
To get the NVIDIA container runtime in the K3s image you need to build your own K3s image.
The native K3s image is based on Alpine, but the NVIDIA container runtime is not supported on Alpine yet.
To get around this we need to build the image with a supported base image.
### Dockerfile
[Dockerfile](cuda/Dockerfile):
```Dockerfile
{% include "cuda/Dockerfile" %}
```
This Dockerfile is based on the [K3s Dockerfile](https://github.com/rancher/k3s/blob/master/package/Dockerfile).
The following changes are applied:
1. Change the base image to `nvidia/cuda:11.2.0-base-ubuntu18.04` so the NVIDIA Container Runtime can be installed. The CUDA version (`cuda:xx.x.x`) must match the one you're planning to use.
2. Add a custom containerd `config.toml` template to add the NVIDIA Container Runtime. This replaces the default `runc` runtime.
3. Add a manifest for the NVIDIA device plugin for Kubernetes.
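
The changes above can be sketched roughly as follows. This is illustrative only, not the actual [Dockerfile](cuda/Dockerfile): the package installation details, copy destinations, and entrypoint are assumptions.

```Dockerfile
# Sketch: CUDA-enabled base image instead of Alpine.
# NVIDIA package repository setup is omitted for brevity (assumption).
FROM nvidia/cuda:11.2.0-base-ubuntu18.04

# Install the NVIDIA Container Runtime
RUN apt-get update && \
    apt-get install -y nvidia-container-runtime && \
    rm -rf /var/lib/apt/lists/*

# K3s binary produced by the K3s build
COPY k3s /usr/local/bin/k3s

# Custom containerd config template registering the NVIDIA runtime
COPY config.toml.tmpl /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl

# Auto-deploy manifest for the NVIDIA device plugin
COPY device-plugin-daemonset.yaml /var/lib/rancher/k3s/server/manifests/

ENTRYPOINT ["/usr/local/bin/k3s"]
CMD ["agent"]
```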
### Configure containerd
We need to configure containerd to use the NVIDIA Container Runtime, which means customizing the `config.toml` that is used at startup. K3s provides a way to do this using a [config.toml.tmpl](cuda/config.toml.tmpl) file. More information can be found on the [K3s site](https://rancher.com/docs/k3s/latest/en/advanced/#configuring-containerd).
```go
{% include "cuda/config.toml.tmpl" %}
```
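
The essential part of that template is registering the NVIDIA runtime binary with containerd, roughly like this. This is a sketch: the plugin section names follow containerd's v1 CRI layout and may differ in your containerd version.

```toml
# Sketch: make runc invocations use the NVIDIA Container Runtime binary
# instead of plain runc (section names are containerd-version-dependent).
[plugins.cri.containerd.runtimes.runc]
  runtime_type = "io.containerd.runc.v2"

[plugins.cri.containerd.runtimes.runc.options]
  BinaryName = "nvidia-container-runtime"
```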
### The NVIDIA device plugin
To enable NVIDIA GPU support on Kubernetes, you also need to install the [NVIDIA device plugin](https://github.com/NVIDIA/k8s-device-plugin). The device plugin is a DaemonSet that allows you to automatically:
* Expose the number of GPUs on each node of your cluster
* Keep track of the health of your GPUs
* Run GPU-enabled containers in your Kubernetes cluster
```yaml
{% include "cuda/device-plugin-daemonset.yaml" %}
```
### Build the K3s image
To build the custom image, we need to build K3s first, because we need its generated output.
Put the following files in a directory:
* [Dockerfile](cuda/Dockerfile)
* [config.toml.tmpl](cuda/config.toml.tmpl)
* [device-plugin-daemonset.yaml](cuda/device-plugin-daemonset.yaml)
* [build.sh](cuda/build.sh)
* [cuda-vector-add.yaml](cuda/cuda-vector-add.yaml)
The `build.sh` script is configured via environment variables and defaults to K3s version `v1.21.2+k3s1`. Please set at least the `IMAGE_REGISTRY` variable! The script builds the custom K3s image, including the NVIDIA drivers.
[build.sh](cuda/build.sh):
```bash
{% include "cuda/build.sh" %}
```
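
To illustrate the export-based configuration, the pattern in such a script looks roughly like this. Only `IMAGE_REGISTRY` and the `v1.21.2+k3s1` default are taken from the script's description; the other variable names and the image name are assumptions.

```bash
#!/bin/bash
# Sketch of the export-with-default pattern (hypothetical variable names).
K3S_TAG="${K3S_TAG:-v1.21.2+k3s1}"                       # default K3s version
IMAGE_REGISTRY="${IMAGE_REGISTRY:-registry.example.com}" # set this to your registry!
IMAGE_TAG="${K3S_TAG/+/-}"  # '+' is not allowed in Docker image tags
echo "Building ${IMAGE_REGISTRY}/k3s:${IMAGE_TAG}"
```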
## Run and test the custom image with k3d
You can use the image with k3d:
```bash
k3d cluster create gputest --image=$IMAGE --gpus=1
```
Deploy a [test pod](cuda/cuda-vector-add.yaml):
```bash
kubectl apply -f cuda-vector-add.yaml
kubectl logs cuda-vector-add
```
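
Such a test pod manifest looks roughly like the following, a sketch based on the well-known `cuda-vector-add` sample from the Kubernetes documentation; the actual file is [cuda-vector-add.yaml](cuda/cuda-vector-add.yaml).

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      image: "k8s.gcr.io/cuda-vector-add:v0.1"
      resources:
        limits:
          nvidia.com/gpu: 1  # request one GPU via the device plugin's extended resource
```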
This should output something like the following:
```bash
$ kubectl logs cuda-vector-add
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
```
If the `cuda-vector-add` pod is stuck in the `Pending` state, the device plugin DaemonSet probably didn't get deployed correctly from the auto-deploy manifests. In that case, you can apply it manually via `#!bash kubectl apply -f device-plugin-daemonset.yaml`.
## Known issues
* This approach does not work on WSL2 yet. The NVIDIA driver plugin and container runtime rely on the NVIDIA Management Library (NVML) which is not yet supported. See the [CUDA on WSL User Guide](https://docs.nvidia.com/cuda/wsl-user-guide/index.html#known-limitations).
## Acknowledgements
Most of the information in this article was obtained from the following sources:
* [Add NVIDIA GPU support to k3s with containerd](https://dev.to/mweibel/add-nvidia-gpu-support-to-k3s-with-containerd-4j17)
* [microk8s](https://github.com/ubuntu/microk8s)
* [K3s](https://github.com/rancher/k3s)
* [k3s-gpu](https://gitlab.com/vainkop1/k3s-gpu)
## Authors
* [@markrexwinkel](https://github.com/markrexwinkel)
* [@vainkop](https://github.com/vainkop)
* [@iwilltry42](https://github.com/iwilltry42)