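What is CDI?

Container Device Interface (CDI) is a proposed standard that defines how devices are made available to containers by a container runtime. Its goal is to let device vendors integrate their devices into Kubernetes clusters more easily, without modifying Kubernetes core code.

A CDI plugin is typically responsible for:

  1. Configuring the device for use by the container (for example, assigning device files or setting the required environment variables).
  2. Injecting the device resources into the container when it starts.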

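Official website

Why do we need CDI?

Before CDI, using an NVIDIA GPU inside a container meant switching containerd's low-level container runtime from runc to the NVIDIA runtime. The reason is that using a GPU takes more than binding the GPU device files into the container: driver files and executables (such as nvidia-smi) must also be bound in, and some hooks must run. The NVIDIA runtime does exactly that: it binds those files, runs the hooks, and then calls runc.

Now CDI can do all of this. Besides removing the need to change the runtime, it also brings abstraction and a plugin model.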

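Versions and prerequisites

  • kubelet version >= 1.28.0
  • containerd version >= 1.7.0

In Kubernetes 1.28 this is an alpha feature (it became beta in 1.29, where it is enabled by default), so on 1.28 we need to turn on the feature gate in the kubelet startup arguments: --feature-gates=DevicePluginCDIDevices=true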

sudo vim /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS --feature-gates=DevicePluginCDIDevices=true

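containerd also needs CDI turned on: cdi_spec_dirs lists the directories that hold CDI spec files, and enable_cdi switches the feature on. In containerd 1.7 both keys belong to the CRI plugin section, [plugins."io.containerd.grpc.v1.cri"].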

sudo vim /etc/containerd/config.toml

cdi_spec_dirs = ["/etc/cdi", "/var/run/cdi"]
enable_cdi = true

Restart containerd and kubelet:

sudo systemctl restart kubelet.service containerd.service

Mock devices

My cluster has no GPU, so I mock a few files to act as devices.

sudo mkdir /dev/mock
cd /dev/mock
sudo mknod /dev/mock/device_0 c 89 1
sudo mknod /dev/mock/device_1 c 89 1
sudo mknod /dev/mock/device_2 c 89 1
sudo mknod /dev/mock/device_3 c 89 1
sudo mknod /dev/mock/device_4 c 89 1
sudo mknod /dev/mock/device_5 c 89 1
sudo mknod /dev/mock/device_6 c 89 1
sudo mknod /dev/mock/device_7 c 89 1

sudo mkdir -p /mock/bin
sudo vim /mock/bin/list_device.sh

#!/bin/bash
# Directories to list
directories=(/dev /dev/mock)
# Walk through the directory array
for dir in "${directories[@]}"; do
    # Check that the directory exists
    if [ -d "$dir" ]; then
        # The directory exists, so print every file in it
        ls "$dir"
    fi
done

sudo chmod a+x /mock/bin/list_device.sh
sudo mkdir /mock/so
cd /mock/so
sudo touch device_0.so device_1.so device_2.so device_3.so device_5.so device_6.so device_7.so device_4.so

Enabling the kubelet device plugin

Below is a simple device plugin I wrote, together with its Dockerfile and the YAML file that deploys it to Kubernetes.

package main

import (
	"context"
	"fmt"
	"time"

	"github.com/kubevirt/device-plugin-manager/pkg/dpm"
	pluginapi "k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1"
)

// PluginLister tells the device plugin manager which plugins to start.
type PluginLister struct {
	ResUpdateChan chan dpm.PluginNameList
}

var ResourceNamespace = "mock.com"
var PluginName = "gpu"

func (p *PluginLister) GetResourceNamespace() string {
	return ResourceNamespace
}

func (p *PluginLister) Discover(pluginListCh chan dpm.PluginNameList) {
	pluginListCh <- dpm.PluginNameList{PluginName}
}

func (p *PluginLister) NewPlugin(name string) dpm.PluginInterface {
	return &Plugin{}
}

// Plugin implements the kubelet device plugin gRPC service.
type Plugin struct {
}

func (p *Plugin) GetDevicePluginOptions(ctx context.Context, e *pluginapi.Empty) (*pluginapi.DevicePluginOptions, error) {
	options := &pluginapi.DevicePluginOptions{
		PreStartRequired: true,
	}
	return options, nil
}

func (p *Plugin) PreStartContainer(ctx context.Context, r *pluginapi.PreStartContainerRequest) (*pluginapi.PreStartContainerResponse, error) {
	return &pluginapi.PreStartContainerResponse{}, nil
}

func (p *Plugin) GetPreferredAllocation(ctx context.Context, r *pluginapi.PreferredAllocationRequest) (*pluginapi.PreferredAllocationResponse, error) {
	return &pluginapi.PreferredAllocationResponse{}, nil
}

func (p *Plugin) ListAndWatch(e *pluginapi.Empty, r pluginapi.DevicePlugin_ListAndWatchServer) error {
	devices := []*pluginapi.Device{}
	for i := 0; i < 8; i++ {
		devices = append(devices, &pluginapi.Device{
			// Keep the ID identical to the device name
			ID:     fmt.Sprintf("device_%d", i),
			Health: pluginapi.Healthy,
		})
	}
	for {
		fmt.Println("register devices")
		// Report the device list once a minute; stop if the stream is broken
		if err := r.Send(&pluginapi.ListAndWatchResponse{
			Devices: devices,
		}); err != nil {
			return err
		}
		time.Sleep(time.Second * 60)
	}
}

func (p *Plugin) Allocate(ctx context.Context, r *pluginapi.AllocateRequest) (*pluginapi.AllocateResponse, error) {
	// Answer with CDI device names instead of explicit mounts and device nodes
	responses := &pluginapi.AllocateResponse{}
	for _, req := range r.ContainerRequests {
		cdidevices := []*pluginapi.CDIDevice{}
		for _, id := range req.DevicesIDs {
			cdidevices = append(cdidevices, &pluginapi.CDIDevice{
				// Fully qualified CDI name, e.g. mock.com/gpu=device_0
				Name: fmt.Sprintf("%s/%s=%s", ResourceNamespace, PluginName, id),
			})
		}
		responses.ContainerResponses = append(responses.ContainerResponses, &pluginapi.ContainerAllocateResponse{
			CDIDevices: cdidevices,
		})
	}
	return responses, nil
}

func main() {
	m := dpm.NewManager(&PluginLister{})
	m.Run()
}

# Dockerfile for the mock device plugin image
FROM golang:1.21.3 as builder

COPY . /src
WORKDIR /src
RUN go env -w GO111MODULE=on && go env -w GOPROXY=https://goproxy.io,direct
RUN go build

FROM debian:bookworm-slim

RUN sed -i 's/deb.debian.org/mirrors.ustc.edu.cn/g' /etc/apt/sources.list.d/debian.sources

RUN apt-get update && apt-get install -y --no-install-recommends \
    ca-certificates \
    netbase \
    pciutils \
    curl \
    && rm -rf /var/lib/apt/lists/ \
    && apt-get autoremove -y && apt-get autoclean -y

RUN update-pciids

COPY --from=builder /src /app

WORKDIR /app

# Namespace and DaemonSet for the mock device plugin
apiVersion: v1
kind: Namespace
metadata:
  name: mock-plugin
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: mock-plugin-daemonset
  namespace: mock-plugin
spec:
  selector:
    matchLabels:
      name: mock-plugin
  template:
    metadata:
      labels:
        name: mock-plugin
        app.kubernetes.io/component: mock-plugin
        app.kubernetes.io/name: mock-plugin
    spec:
      containers:
      - image: zhaohaiyu/mock:v1
        name: mock-plugin
        command: ['/app/mock']
        imagePullPolicy: Always
        securityContext:
          privileged: true
        tty: true
        volumeMounts:
        - name: kubelet
          mountPath: /var/lib/kubelet
      volumes:
      - name: kubelet
        hostPath:
          path: /var/lib/kubelet

After applying it, check the result with kubectl:

❯ kubectl -n mock-plugin get pod
NAME READY STATUS RESTARTS AGE
mock-plugin-daemonset-8w2r8 1/1 Running 0 3m27s

Check whether the node has registered the device plugin:

kubectl describe node node1

Capacity:
cpu: 8
ephemeral-storage: 102626232Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 24570324Ki
mock.com/gpu: 8
pods: 110
Allocatable:
cpu: 8
ephemeral-storage: 94580335255
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 24467924Ki
mock.com/gpu: 8
pods: 110

We can see that 8 GPU devices are registered under the name mock.com/gpu, which is the name defined in our code.

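CDI spec file

CDI Spec: https://github.com/cncf-tags/container-device-interface/blob/main/SPEC.md

We also write a CDI spec file. containerd reads this file, and when the device plugin returns CDI device names from Allocate, containerd looks those names up in the spec and applies the matching containerEdits to the container.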
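
The names returned by the plugin's Allocate call have the form vendor/class=device, for example mock.com/gpu=device_0, where vendor/class must match the kind field of a spec file. As a rough illustration of how such a name breaks down, here is a minimal sketch using only the Go standard library; parseQualifiedName is a hypothetical helper written for this post, not the parser that containerd or the CDI library actually uses.

```go
package main

import (
	"fmt"
	"strings"
)

// parseQualifiedName splits a name of the form "vendor/class=device"
// into its three parts. (Hypothetical helper for illustration only.)
func parseQualifiedName(q string) (vendor, class, device string, err error) {
	kind, device, ok := strings.Cut(q, "=")
	if !ok || device == "" {
		return "", "", "", fmt.Errorf("missing device name in %q", q)
	}
	vendor, class, ok = strings.Cut(kind, "/")
	if !ok || vendor == "" || class == "" {
		return "", "", "", fmt.Errorf("invalid kind in %q", q)
	}
	return vendor, class, device, nil
}

func main() {
	vendor, class, device, err := parseQualifiedName("mock.com/gpu=device_0")
	if err != nil {
		panic(err)
	}
	// prints "vendor=mock.com class=gpu device=device_0"
	fmt.Printf("vendor=%s class=%s device=%s\n", vendor, class, device)
}
```

The vendor/class pair selects the spec (kind: mock.com/gpu) and the device part selects one entry in its devices list.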

# vim /etc/cdi/mock.yaml
cdiVersion: 0.5.0
kind: mock.com/gpu
devices:
- name: device_0
  containerEdits:
    deviceNodes:
    - hostPath: "/dev/mock/device_0"
      path: "/dev/mock/device_0"
      type: c
      permissions: rw
    mounts:
    - hostPath: "/mock/so/device_0.so"
      containerPath: "/mock/so/device_0.so"
      options:
      - ro
      - nosuid
      - nodev
      - bind
- name: device_1
  containerEdits:
    deviceNodes:
    - hostPath: "/dev/mock/device_1"
      path: "/dev/mock/device_1"
      type: c
      permissions: rw
    mounts:
    - hostPath: "/mock/so/device_1.so"
      containerPath: "/mock/so/device_1.so"
      options:
      - ro
      - nosuid
      - nodev
      - bind
- name: device_2
  containerEdits:
    deviceNodes:
    - hostPath: "/dev/mock/device_2"
      path: "/dev/mock/device_2"
      type: c
      permissions: rw
    mounts:
    - hostPath: "/mock/so/device_2.so"
      containerPath: "/mock/so/device_2.so"
      options:
      - ro
      - nosuid
      - nodev
      - bind
- name: device_3
  containerEdits:
    deviceNodes:
    - hostPath: "/dev/mock/device_3"
      path: "/dev/mock/device_3"
      type: c
      permissions: rw
    mounts:
    - hostPath: "/mock/so/device_3.so"
      containerPath: "/mock/so/device_3.so"
      options:
      - ro
      - nosuid
      - nodev
      - bind
- name: device_4
  containerEdits:
    deviceNodes:
    - hostPath: "/dev/mock/device_4"
      path: "/dev/mock/device_4"
      type: c
      permissions: rw
    mounts:
    - hostPath: "/mock/so/device_4.so"
      containerPath: "/mock/so/device_4.so"
      options:
      - ro
      - nosuid
      - nodev
      - bind
- name: device_5
  containerEdits:
    deviceNodes:
    - hostPath: "/dev/mock/device_5"
      path: "/dev/mock/device_5"
      type: c
      permissions: rw
    mounts:
    - hostPath: "/mock/so/device_5.so"
      containerPath: "/mock/so/device_5.so"
      options:
      - ro
      - nosuid
      - nodev
      - bind
- name: device_6
  containerEdits:
    deviceNodes:
    - hostPath: "/dev/mock/device_6"
      path: "/dev/mock/device_6"
      type: c
      permissions: rw
    mounts:
    - hostPath: "/mock/so/device_6.so"
      containerPath: "/mock/so/device_6.so"
      options:
      - ro
      - nosuid
      - nodev
      - bind
- name: device_7
  containerEdits:
    deviceNodes:
    - hostPath: "/dev/mock/device_7"
      path: "/dev/mock/device_7"
      type: c
      permissions: rw
    mounts:
    - hostPath: "/mock/so/device_7.so"
      containerPath: "/mock/so/device_7.so"
      options:
      - ro
      - nosuid
      - nodev
      - bind
containerEdits:
  mounts:
  - hostPath: "/mock/bin/list_device.sh"
    containerPath: "/usr/local/bin/list_device.sh"
    options:
    - ro
    - nosuid
    - nodev
    - bind

This is only a simple example; for hooks, env, and other features, see the official documentation.
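
Because the mock spec repeats the same block for each of the eight devices, it could also be generated instead of written by hand. The sketch below builds the same structure in Go and prints it as JSON, on the assumption that the spec is then saved with a .json extension in one of the configured CDI directories (the CDI specification describes JSON as well as YAML spec files). The struct types here are hand-written for illustration and cover only the fields used in this post; they are not the types from the official CDI Go module.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Minimal, hand-written model of the CDI fields used in this post.
type deviceNode struct {
	Path        string `json:"path"`
	HostPath    string `json:"hostPath,omitempty"`
	Type        string `json:"type,omitempty"`
	Permissions string `json:"permissions,omitempty"`
}

type mount struct {
	HostPath      string   `json:"hostPath"`
	ContainerPath string   `json:"containerPath"`
	Options       []string `json:"options,omitempty"`
}

type containerEdits struct {
	DeviceNodes []deviceNode `json:"deviceNodes,omitempty"`
	Mounts      []mount      `json:"mounts,omitempty"`
}

type device struct {
	Name           string         `json:"name"`
	ContainerEdits containerEdits `json:"containerEdits"`
}

type spec struct {
	CDIVersion     string         `json:"cdiVersion"`
	Kind           string         `json:"kind"`
	Devices        []device       `json:"devices"`
	ContainerEdits containerEdits `json:"containerEdits"`
}

// buildMockSpec returns a spec with n mock devices plus the shared
// list_device.sh mount that applies to every container.
func buildMockSpec(n int) spec {
	opts := []string{"ro", "nosuid", "nodev", "bind"}
	s := spec{
		CDIVersion: "0.5.0",
		Kind:       "mock.com/gpu",
		ContainerEdits: containerEdits{
			Mounts: []mount{{
				HostPath:      "/mock/bin/list_device.sh",
				ContainerPath: "/usr/local/bin/list_device.sh",
				Options:       opts,
			}},
		},
	}
	for i := 0; i < n; i++ {
		dev := fmt.Sprintf("/dev/mock/device_%d", i)
		so := fmt.Sprintf("/mock/so/device_%d.so", i)
		s.Devices = append(s.Devices, device{
			Name: fmt.Sprintf("device_%d", i),
			ContainerEdits: containerEdits{
				DeviceNodes: []deviceNode{{Path: dev, HostPath: dev, Type: "c", Permissions: "rw"}},
				Mounts:      []mount{{HostPath: so, ContainerPath: so, Options: opts}},
			},
		})
	}
	return s
}

func main() {
	out, err := json.MarshalIndent(buildMockSpec(8), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Redirecting the output to a file such as /etc/cdi/mock.json (a hypothetical file name) would take the place of the hand-written YAML; keep only one of the two files, since both declare the same devices.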

Deploying a pod

I deploy a pod that requests 4 of the mock GPU resource.

apiVersion: v1
kind: Pod
metadata:
  name: ubuntu1
spec:
  containers:
  - name: ubuntu-container
    image: ubuntu:latest
    command: ["sleep"]
    args: ["3600"]
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
        mock.com/gpu: "4"
      limits:
        memory: "128Mi"
        cpu: "500m"
        mock.com/gpu: "4"

ubuntu1   1/1     Running   0          49s

Now let's enter the container with kubectl exec -it ubuntu1 -- bash and take a look.

ls /dev/mock/
device_0 device_1 device_6 device_7
ls /mock/so/
device_0.so device_1.so device_6.so device_7.so
list_device.sh
so
device_0 device_1 device_6 device_7

We can see that the devices and .so files from our CDI spec, as well as our list_device.sh, have all been mounted into the container.

Now I start another pod that requests 3 of the mock GPU resource.

apiVersion: v1
kind: Pod
metadata:
  name: ubuntu2
spec:
  containers:
  - name: ubuntu-container
    image: ubuntu:latest
    command: ["sleep"]
    args: ["3600"]
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
        mock.com/gpu: "3"
      limits:
        memory: "128Mi"
        cpu: "500m"
        mock.com/gpu: "3"

ls /dev/mock/
device_2 device_3 device_5

Check how many resources the node is using:

Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 1550m (19%) 1 (12%)
memory 668Mi (2%) 596Mi (2%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
mock.com/gpu 7 7

We can see that 7 mock.com/gpu resources are in use, with 1 left.
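
The accounting behind these numbers can be pictured with a toy model: a pool of 8 device IDs from which each pod takes as many as it requests. The sketch below is only that, a toy; the pool type and its methods are invented for illustration, and the kubelet's real device manager chooses devices differently (note that ubuntu1 above received device_0, device_1, device_6 and device_7, not the first four).

```go
package main

import (
	"errors"
	"fmt"
)

// pool is a toy model of the free device IDs on one node.
type pool struct {
	free []string
}

// newPool creates a pool holding device_0 .. device_(n-1).
func newPool(n int) *pool {
	p := &pool{}
	for i := 0; i < n; i++ {
		p.free = append(p.free, fmt.Sprintf("device_%d", i))
	}
	return p
}

// allocate removes count devices from the pool, or fails if too few remain.
func (p *pool) allocate(count int) ([]string, error) {
	if count > len(p.free) {
		return nil, errors.New("not enough free devices")
	}
	got := p.free[:count]
	p.free = p.free[count:]
	return got, nil
}

func main() {
	p := newPool(8)
	p.allocate(4) // ubuntu1 requests 4
	p.allocate(3) // ubuntu2 requests 3
	fmt.Println("free:", len(p.free)) // prints "free: 1"
	if _, err := p.allocate(2); err != nil {
		// A third pod asking for 2 devices would not fit on this node.
		fmt.Println("request for 2:", err)
	}
}
```

With 8 devices and requests of 4 and 3, one device remains, which matches the 7 reported as allocated for mock.com/gpu.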

NVIDIA GPU

I found a machine with an NVIDIA GPU and installed nvidia-container-toolkit-base on it.

Running nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml generates the CDI spec file:

---
cdiVersion: 0.5.0
containerEdits:
  deviceNodes:
  - path: /dev/nvidia-modeset
  - path: /dev/nvidia-uvm
  - path: /dev/nvidia-uvm-tools
  - path: /dev/nvidiactl
  hooks:
  - args:
    - nvidia-ctk
    - hook
    - create-symlinks
    - --link
    - libglxserver_nvidia.so.525.147.05::/usr/lib/x86_64-linux-gnu/nvidia/xorg/libglxserver_nvidia.so
    hookName: createContainer
    path: /usr/bin/nvidia-ctk
  - args:
    - nvidia-ctk
    - hook
    - update-ldcache
    - --folder
    - /usr/lib/x86_64-linux-gnu
    hookName: createContainer
    path: /usr/bin/nvidia-ctk
  mounts:
  - containerPath: /run/nvidia-persistenced/socket
    hostPath: /run/nvidia-persistenced/socket
    options:
    - ro
    - nosuid
    - nodev
    - bind
    - noexec
  - containerPath: /usr/bin/nvidia-cuda-mps-control
    hostPath: /usr/bin/nvidia-cuda-mps-control
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/bin/nvidia-cuda-mps-server
    hostPath: /usr/bin/nvidia-cuda-mps-server
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/bin/nvidia-debugdump
    hostPath: /usr/bin/nvidia-debugdump
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/bin/nvidia-persistenced
    hostPath: /usr/bin/nvidia-persistenced
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/bin/nvidia-smi
    hostPath: /usr/bin/nvidia-smi
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.525.147.05
    hostPath: /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.525.147.05
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.525.147.05
    hostPath: /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.525.147.05
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.525.147.05
    hostPath: /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.525.147.05
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.525.147.05
    hostPath: /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.525.147.05
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libcuda.so.525.147.05
    hostPath: /usr/lib/x86_64-linux-gnu/libcuda.so.525.147.05
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libcudadebugger.so.525.147.05
    hostPath: /usr/lib/x86_64-linux-gnu/libcudadebugger.so.525.147.05
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvcuvid.so.525.147.05
    hostPath: /usr/lib/x86_64-linux-gnu/libnvcuvid.so.525.147.05
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.525.147.05
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.525.147.05
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.525.147.05
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.525.147.05
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.525.147.05
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.525.147.05
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-egl-gbm.so.1.1.0
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-egl-gbm.so.1.1.0
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.525.147.05
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.525.147.05
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.525.147.05
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.525.147.05
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.525.147.05
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.525.147.05
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.525.147.05
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.525.147.05
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.525.147.05
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.525.147.05
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.525.147.05
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.525.147.05
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.525.147.05
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.525.147.05
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.525.147.05
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.525.147.05
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-nvvm.so.525.147.05
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-nvvm.so.525.147.05
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.525.147.05
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.525.147.05
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.525.147.05
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.525.147.05
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.525.147.05
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.525.147.05
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.525.147.05
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.525.147.05
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.525.147.05
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.525.147.05
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvoptix.so.525.147.05
    hostPath: /usr/lib/x86_64-linux-gnu/libnvoptix.so.525.147.05
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /lib/firmware/nvidia/525.147.05/gsp_ad10x.bin
    hostPath: /lib/firmware/nvidia/525.147.05/gsp_ad10x.bin
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /lib/firmware/nvidia/525.147.05/gsp_tu10x.bin
    hostPath: /lib/firmware/nvidia/525.147.05/gsp_tu10x.bin
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/share/X11/xorg.conf.d/10-nvidia.conf
    hostPath: /usr/share/X11/xorg.conf.d/10-nvidia.conf
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/share/egl/egl_external_platform.d/15_nvidia_gbm.json
    hostPath: /usr/share/egl/egl_external_platform.d/15_nvidia_gbm.json
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/share/glvnd/egl_vendor.d/10_nvidia.json
    hostPath: /usr/share/glvnd/egl_vendor.d/10_nvidia.json
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/share/vulkan/icd.d/nvidia_icd.json
    hostPath: /usr/share/vulkan/icd.d/nvidia_icd.json
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/nvidia/xorg/libglxserver_nvidia.so.525.147.05
    hostPath: /usr/lib/x86_64-linux-gnu/nvidia/xorg/libglxserver_nvidia.so.525.147.05
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/nvidia/xorg/nvidia_drv.so
    hostPath: /usr/lib/x86_64-linux-gnu/nvidia/xorg/nvidia_drv.so
    options:
    - ro
    - nosuid
    - nodev
    - bind
devices:
- containerEdits:
    deviceNodes:
    - path: /dev/nvidia0
    - path: /dev/dri/card0
    - path: /dev/dri/renderD128
    hooks:
    - args:
      - nvidia-ctk
      - hook
      - create-symlinks
      - --link
      - ../card0::/dev/dri/by-path/pci-0000:01:00.0-card
      - --link
      - ../renderD128::/dev/dri/by-path/pci-0000:01:00.0-render
      hookName: createContainer
      path: /usr/bin/nvidia-ctk
    - args:
      - nvidia-ctk
      - hook
      - chmod
      - --mode
      - "755"
      - --path
      - /dev/dri
      hookName: createContainer
      path: /usr/bin/nvidia-ctk
  name: "0"
- containerEdits:
    deviceNodes:
    - path: /dev/nvidia0
    - path: /dev/dri/card0
    - path: /dev/dri/renderD128
    hooks:
    - args:
      - nvidia-ctk
      - hook
      - create-symlinks
      - --link
      - ../card0::/dev/dri/by-path/pci-0000:01:00.0-card
      - --link
      - ../renderD128::/dev/dri/by-path/pci-0000:01:00.0-render
      hookName: createContainer
      path: /usr/bin/nvidia-ctk
    - args:
      - nvidia-ctk
      - hook
      - chmod
      - --mode
      - "755"
      - --path
      - /dev/dri
      hookName: createContainer
      path: /usr/bin/nvidia-ctk
  name: all
kind: nvidia.com/gpu

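The NVIDIA CDI spec is much more complex than the mock one, because an NVIDIA GPU needs many files bound into the container, plus hooks. This work used to be done inside the runtime; now it can all be expressed through CDI.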
