Deploying Node Feature Discovery in a Kubernetes Cluster to Detect Node Features
1. Overview
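Node Feature Discovery (NFD) is a project created by Intel, now maintained under kubernetes-sigs, that helps a Kubernetes cluster manage node resources more intelligently. It detects the features and capabilities of each node (for example the CPU model and instruction-set extensions, PCI devices such as GPUs, memory, and kernel and OS versions) and publishes them as labels through the Kubernetes API server (kube-apiserver), which records them on the Node objects. When Pods reference these labels, for example through nodeSelector or node affinity, the scheduler (kube-scheduler) can place each workload on the nodes best suited to run it.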
GitHub: https://github.com/kubernetes-sigs/node-feature-discovery
Docs: https://kubernetes-sigs.github.io/node-feature-discovery/master/get-started/index.html
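2. Architecture
NFD consists of two components, NFD-Master and NFD-Worker:
NFD-Master: a Pod managed by a Deployment that communicates with the Kubernetes API server. It receives node features from NFD-Worker and updates the Node objects (labels and annotations) accordingly.
NFD-Worker: a Pod managed by a DaemonSet that detects the features and capabilities of its node and passes them to NFD-Master. NFD-Worker should run on every node.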
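To make the division of labour concrete, here is a toy Python model of the flow described above. It is not NFD's code: `worker_detect` and `master_apply` are hypothetical names, the feature values are copied from the example node later in this article, and the idea that the master tracks the labels it owns through the `nfd.node.kubernetes.io/feature-labels` annotation is an assumption based on the annotation visible in step (5).

```python
# Toy model of the NFD-Worker -> NFD-Master -> Node flow.
# Hypothetical names; this is NOT how NFD is actually implemented.
from typing import Dict

LABEL_PREFIX = "feature.node.kubernetes.io/"
MANAGED_ANNOTATION = "nfd.node.kubernetes.io/feature-labels"


def worker_detect() -> Dict[str, str]:
    """Stand-in for NFD-Worker: return detected features as label names
    without the prefix (values copied from the example node)."""
    return {
        "cpu-cpuid.AVX2": "true",
        "cpu-model.vendor_id": "Intel",
        "kernel-version.major": "3",
    }


def master_apply(labels: Dict[str, str], annotations: Dict[str, str],
                 features: Dict[str, str]) -> None:
    """Stand-in for NFD-Master: replace the NFD-managed labels on a node
    (modelled as two dicts) with the newly reported feature set."""
    # Remove labels that were set in a previous round.
    previous = annotations.get(MANAGED_ANNOTATION, "")
    for name in filter(None, previous.split(",")):
        labels.pop(LABEL_PREFIX + name, None)
    # Add the current features under the default label prefix.
    for name, value in features.items():
        labels[LABEL_PREFIX + name] = value
    # Remember which labels are NFD-managed for the next round.
    annotations[MANAGED_ANNOTATION] = ",".join(sorted(features))


node_labels = {"kubernetes.io/os": "linux"}
node_annotations: Dict[str, str] = {}
master_apply(node_labels, node_annotations, worker_detect())
for key in sorted(node_labels):
    print(key, "=", node_labels[key])
```

In this model, recording ownership in an annotation lets the master drop labels for features that have disappeared without touching labels it never set, such as kubernetes.io/os.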
The feature sources NFD can detect include:
- CPU
- IOMMU
- Kernel
- Memory
- Network
- PCI
- Storage
- System
- USB
- Custom (rule-based custom features)
- Local (hooks for user-specific features)
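Each of these sources appears as the first component of the label names NFD generates, for example `cpu-cpuid.AVX2` or `kernel-version.major` in the node output in step (5). The Python sketch below splits such a label into source and feature name. It assumes the source is everything before the first hyphen, which holds for the labels in this article; labels produced by the Custom and Local sources can be named freely and may not follow this pattern. The function name `split_nfd_label` is hypothetical.

```python
# Sketch: split an NFD feature label into (source, feature name).
# Assumption: the source is the text before the first "-" after the prefix.
from typing import Tuple

PREFIX = "feature.node.kubernetes.io/"


def split_nfd_label(key: str) -> Tuple[str, str]:
    if not key.startswith(PREFIX):
        raise ValueError(f"not an NFD feature label: {key}")
    source, sep, feature = key[len(PREFIX):].partition("-")
    if not sep:
        raise ValueError(f"no source component in label: {key}")
    return source, feature


examples = [
    "feature.node.kubernetes.io/cpu-cpuid.AVX2",                     # ('cpu', 'cpuid.AVX2')
    "feature.node.kubernetes.io/kernel-version.major",               # ('kernel', 'version.major')
    "feature.node.kubernetes.io/pci-0300_15ad.present",              # ('pci', '0300_15ad.present')
    "feature.node.kubernetes.io/system-os_release.VERSION_ID.major", # ('system', 'os_release.VERSION_ID.major')
]
for label in examples:
    print(split_nfd_label(label))
```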
3. Installation
(1) Check the node status before installation
[root@master-10 ~]# kubectl get nodes
NAME                  STATUS   ROLES                         AGE   VERSION
master-10.20.31.105   Ready    control-plane,master,worker   31h   v1.21.5
Node details; pay particular attention to the labels and annotations.
[root@master-10 ~]# kubectl describe nodes master-10.20.31.105
Name:               master-10.20.31.105
Roles:              control-plane,master,worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=master-10.20.31.105
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=
                    node-role.kubernetes.io/master=
                    node-role.kubernetes.io/worker=
                    node.kubernetes.io/exclude-from-external-load-balancers=
Annotations:        flannel.alpha.coreos.com/backend-data: {"VtepMAC":"c6:fb:4b:8a:bb:12"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 10.20.31.105
                    kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Tue, 12 Mar 2024 21:01:31 -0400
Taints:             <none>
........
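Before installing NFD it can be handy to save the node's current labels and annotations so they can be compared with the result later. `kubectl get node master-10.20.31.105 -o json` prints the full Node object; the small Python helper below extracts just those two maps from that JSON. The function name `node_metadata` is hypothetical, and the inline sample is a trimmed stand-in for real kubectl output.

```python
# Sketch: pull labels and annotations out of `kubectl get node <name> -o json`.
import json
from typing import Dict


def node_metadata(node_json: str) -> Dict[str, Dict[str, str]]:
    metadata = json.loads(node_json).get("metadata", {})
    return {
        # Either map may be missing (or null) on a freshly created Node.
        "labels": metadata.get("labels") or {},
        "annotations": metadata.get("annotations") or {},
    }


# Inline sample shaped like kubectl's output (trimmed to a few fields).
sample = json.dumps({
    "apiVersion": "v1",
    "kind": "Node",
    "metadata": {
        "name": "master-10.20.31.105",
        "labels": {"kubernetes.io/os": "linux", "kubernetes.io/arch": "amd64"},
        "annotations": {"node.alpha.kubernetes.io/ttl": "0"},
    },
})
print(json.dumps(node_metadata(sample), indent=2, sort_keys=True))
```

In practice you would save the real output first (for example `kubectl get node master-10.20.31.105 -o json > node-before.json`, where the file name is arbitrary) and pass its contents to `node_metadata`.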
(2) Install the components
[root@master-10 opt]# kubectl apply -k "https://github.com/kubernetes-sigs/node-feature-discovery/deployment/overlays/default?ref=v0.14.2"
namespace/node-feature-discovery created
customresourcedefinition.apiextensions.k8s.io/nodefeaturerules.nfd.k8s-sigs.io created
customresourcedefinition.apiextensions.k8s.io/nodefeatures.nfd.k8s-sigs.io created
serviceaccount/nfd-master created
serviceaccount/nfd-worker created
role.rbac.authorization.k8s.io/nfd-worker created
clusterrole.rbac.authorization.k8s.io/nfd-master created
rolebinding.rbac.authorization.k8s.io/nfd-worker created
clusterrolebinding.rbac.authorization.k8s.io/nfd-master created
configmap/nfd-master-conf created
configmap/nfd-worker-conf created
service/nfd-master created
deployment.apps/nfd-master created
daemonset.apps/nfd-worker created
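The apply output prints one `<resource type>/<name> <action>` line per object. A quick way to confirm that the two workloads were actually created is to parse these lines, as in the Python sketch below. The function names are hypothetical, and the expected list covers only the nfd-master Deployment and the nfd-worker DaemonSet named above.

```python
# Sketch: parse `kubectl apply` output and check that expected objects exist.
from typing import List, Tuple

OK_ACTIONS = {"created", "configured", "unchanged"}


def parse_apply_output(text: str) -> List[Tuple[str, str, str]]:
    """Return (resource type, name, action) for each 'type/name action' line."""
    rows = []
    for line in text.splitlines():
        ref, _, action = line.strip().rpartition(" ")
        if not ref or "/" not in ref:
            continue  # skip blank lines or anything that is not an object line
        kind, _, name = ref.partition("/")
        rows.append((kind, name, action))
    return rows


def missing_objects(rows: List[Tuple[str, str, str]],
                    expected: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    applied = {(kind, name) for kind, name, action in rows if action in OK_ACTIONS}
    return [obj for obj in expected if obj not in applied]


output = """\
namespace/node-feature-discovery created
serviceaccount/nfd-master created
serviceaccount/nfd-worker created
deployment.apps/nfd-master created
daemonset.apps/nfd-worker created
"""
expected = [("deployment.apps", "nfd-master"), ("daemonset.apps", "nfd-worker")]
print(missing_objects(parse_apply_output(output), expected))  # -> []
```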
(3) Check the component status
[root@master-10 opt]# kubectl get pods -n=node-feature-discovery
NAME                          READY   STATUS    RESTARTS   AGE
nfd-master-5c4684f5cb-hvjjb   1/1     Running   0          4m11s
nfd-worker-cpwx6              1/1     Running   0          4m11s
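With one worker per node, this single-node cluster should show one nfd-master Pod and one nfd-worker Pod, all Running and fully ready. The Python sketch below checks that from the plain `kubectl get pods` table; the function name `unhealthy_pods` is hypothetical. It only reads the NAME, READY and STATUS columns, because newer kubectl versions may print extra words such as "(5m ago)" in the RESTARTS column.

```python
# Sketch: report Pods from `kubectl get pods` output that are not healthy.
from typing import List


def unhealthy_pods(table: str) -> List[str]:
    lines = [ln for ln in table.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    bad = []
    for line in lines[1:]:
        # Only NAME, READY and STATUS are used; later columns may contain spaces.
        row = dict(zip(header, line.split()))
        ready, total = row["READY"].split("/")
        if row["STATUS"] != "Running" or ready != total:
            bad.append(row["NAME"])
    return bad


table = """\
NAME                          READY   STATUS    RESTARTS   AGE
nfd-master-5c4684f5cb-hvjjb   1/1     Running   0          4m11s
nfd-worker-cpwx6              1/1     Running   0          4m11s
"""
print(unhealthy_pods(table))  # -> []
```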
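(4) View the component logs
The nfd-worker log shows that, by default, the worker runs feature discovery once a minute (SleepInterval in the configuration below is 60000000000 ns, i.e. 60 s).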
[root@master-10 ~]# kubectl logs -f -n=node-feature-discovery nfd-worker-rlf5t
I0314 06:30:32.003264 1 main.go:66] "-server is deprecated, will be removed in a future release along with the deprecated gRPC API"
I0314 06:30:32.003372 1 nfd-worker.go:219] "Node Feature Discovery Worker" version="v0.14.2" nodeName="master-10.20.31.105" namespace="node-feature-discovery"
I0314 06:30:32.003589 1 nfd-worker.go:520] "configuration file parsed" path="/etc/kubernetes/node-feature-discovery/nfd-worker.conf"
I0314 06:30:32.004500 1 nfd-worker.go:552] "configuration successfully updated" configuration={"Core":{"Klog":{},"LabelWhiteList":{},"NoPublish":false,"FeatureSources":["all"],"Sources":null,"LabelSources":["all"],"SleepInterval":{"Duration":60000000000}},"Sources":{"cpu":{"cpuid":{"attributeBlacklist":["BMI1","BMI2","CLMUL","CMOV","CX16","ERMS","F16C","HTT","LZCNT","MMX","MMXEXT","NX","POPCNT","RDRAND","RDSEED","RDTSCP","SGX","SGXLC","SSE","SSE2","SSE3","SSE4","SSE42","SSSE3","TDX_GUEST"]}},"custom":[],"fake":{"labels":{"fakefeature1":"true","fakefeature2":"true","fakefeature3":"true"},"flagFeatures":["flag_1","flag_2","flag_3"],"attributeFeatures":{"attr_1":"true","attr_2":"false","attr_3":"10"},"instanceFeatures":[{"attr_1":"true","attr_2":"false","attr_3":"10","attr_4":"foobar","name":"instance_1"},{"attr_1":"true","attr_2":"true","attr_3":"100","name":"instance_2"},{"name":"instance_3"}]},"kernel":{"KconfigFile":"","configOpts":["NO_HZ","NO_HZ_IDLE","NO_HZ_FULL","PREEMPT"]},"local":{},"pci":{"deviceClassWhitelist":["03","0b40","12"],"deviceLabelFields":["class","vendor"]},"usb":{"deviceClassWhitelist":["0e","ef","fe","ff"],"deviceLabelFields":["class","vendor","device"]}}}
I0314 06:30:32.004796 1 metrics.go:70] "metrics server starting" port=8081
I0314 06:30:32.019135 1 nfd-worker.go:562] "starting feature discovery..."
I0314 06:30:32.019364 1 nfd-worker.go:577] "feature discovery completed"
I0314 06:31:32.021520 1 nfd-worker.go:562] "starting feature discovery..."
I0314 06:31:32.021695 1 nfd-worker.go:577] "feature discovery completed"
I0314 06:32:32.027970 1 nfd-worker.go:562] "starting feature discovery..."
I0314 06:32:32.028141 1 nfd-worker.go:577] "feature discovery completed"
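The one-minute cadence can also be checked numerically: Go's `time.Duration` is expressed in nanoseconds, so `SleepInterval` is 60 s, and consecutive "starting feature discovery..." lines should be about 60 s apart. The Python sketch below parses the klog header of each line. It assumes the `I0314 06:30:32.019135` layout seen above (severity, month and day, then time), which omits the year; `run_intervals` is a hypothetical name.

```python
# Sketch: measure the interval between nfd-worker discovery runs from its logs.
import re
from datetime import datetime
from typing import List

# klog header: severity letter, MMDD, then HH:MM:SS.microseconds.
KLOG_HEADER = re.compile(r"^[IWEF](\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})\s")
SLEEP_INTERVAL_NS = 60000000000  # "SleepInterval" from the worker config above


def run_intervals(log: str, marker: str = '"starting feature discovery..."') -> List[float]:
    times = []
    for line in log.splitlines():
        match = KLOG_HEADER.match(line.strip())
        if match and marker in line:
            # klog omits the year; 2000 is a leap year, so "0229" still parses.
            stamp = "2000" + match.group(1) + " " + match.group(2)
            times.append(datetime.strptime(stamp, "%Y%m%d %H:%M:%S.%f"))
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


log = """\
I0314 06:30:32.019135 1 nfd-worker.go:562] "starting feature discovery..."
I0314 06:30:32.019364 1 nfd-worker.go:577] "feature discovery completed"
I0314 06:31:32.021520 1 nfd-worker.go:562] "starting feature discovery..."
I0314 06:31:32.021695 1 nfd-worker.go:577] "feature discovery completed"
I0314 06:32:32.027970 1 nfd-worker.go:562] "starting feature discovery..."
"""
print(SLEEP_INTERVAL_NS / 1e9, run_intervals(log))
```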
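The nfd-master log shows that the master updates the Node object (labels and annotations) right after it starts (within about two seconds here) and then once an hour, matching ResyncPeriod (3600000000000 ns, i.e. 1 h). In other words, if someone accidentally edits the NFD-managed labels or annotations by hand, it can take up to an hour before nfd-master automatically restores them.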
[root@master-10 ~]# kubectl logs -n=node-feature-discovery nfd-master-5c4684f5cb-hvjjb
I0314 06:23:08.190218 1 nfd-master.go:213] "Node Feature Discovery Master" version="v0.14.2" nodeName="master-10.20.31.105" namespace="node-feature-discovery"
I0314 06:23:08.190356 1 nfd-master.go:1214] "configuration file parsed" path="/etc/kubernetes/node-feature-discovery/nfd-master.conf"
I0314 06:23:08.190912 1 nfd-master.go:1274] "configuration successfully updated" configuration=<
    DenyLabelNs: {}
    EnableTaints: false
    ExtraLabelNs: {}
    Klog: {}
    LabelWhiteList: {}
    LeaderElection:
      LeaseDuration:
        Duration: 15000000000
      RenewDeadline:
        Duration: 10000000000
      RetryPeriod:
        Duration: 2000000000
    NfdApiParallelism: 10
    NoPublish: false
    ResourceLabels: {}
    ResyncPeriod:
      Duration: 3600000000000
>
I0314 06:23:08.190928 1 nfd-master.go:1338] "starting the nfd api controller"
I0314 06:23:08.191105 1 node-updater-pool.go:79] "starting the NFD master node updater pool" parallelism=10
I0314 06:23:08.860810 1 metrics.go:115] "metrics server starting" port=8081
I0314 06:23:08.861033 1 component.go:36] [core][Server #1] Server created
I0314 06:23:08.861050 1 nfd-master.go:347] "gRPC server serving" port=8080
I0314 06:23:08.861084 1 component.go:36] [core][Server #1 ListenSocket #2] ListenSocket created
I0314 06:23:09.860886 1 nfd-master.go:694] "will process all nodes in the cluster"
I0314 06:23:09.923362 1 nfd-master.go:1086] "node updated" nodeName="master-10.20.31.105"
I0314 07:23:09.224254 1 nfd-master.go:1086] "node updated" nodeName="master-10.20.31.105"
I0314 08:23:09.081362 1 nfd-master.go:1086] "node updated" nodeName="master-10.20.31.105"
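The durations in the nfd-master configuration are also Go `time.Duration` values in nanoseconds. Converting them, as in the Python sketch below (`go_duration` is a hypothetical helper), gives a leader-election lease of 15 s, a renew deadline of 10 s, a retry period of 2 s, and a `ResyncPeriod` of exactly one hour, which matches the hourly "node updated" lines.

```python
# Sketch: convert Go time.Duration values (nanoseconds) to readable durations.
from datetime import timedelta


def go_duration(ns: int) -> timedelta:
    # timedelta has microsecond resolution, so sub-microsecond parts are dropped.
    return timedelta(microseconds=ns // 1000)


master_durations_ns = {
    "LeaderElection.LeaseDuration": 15000000000,
    "LeaderElection.RenewDeadline": 10000000000,
    "LeaderElection.RetryPeriod": 2000000000,
    "ResyncPeriod": 3600000000000,
}
for key, ns in master_durations_ns.items():
    print(f"{key}: {go_duration(ns)}")
# LeaderElection.LeaseDuration: 0:00:15
# LeaderElection.RenewDeadline: 0:00:10
# LeaderElection.RetryPeriod: 0:00:02
# ResyncPeriod: 1:00:00
```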
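(5) View the node feature information
NFD has now written the node's features to its labels and annotations. The feature labels use the default prefix feature.node.kubernetes.io/, and NFD's own annotations use the prefix nfd.node.kubernetes.io/.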
[root@master-10 opt]# kubectl describe node master-10.20.31.105
Name:               master-10.20.31.105
Roles:              control-plane,master,worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    feature.node.kubernetes.io/cpu-cpuid.ADX=true
                    feature.node.kubernetes.io/cpu-cpuid.AESNI=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX2=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512BW=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512CD=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512DQ=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512F=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512VL=true
                    feature.node.kubernetes.io/cpu-cpuid.CMPXCHG8=true
                    feature.node.kubernetes.io/cpu-cpuid.FMA3=true
                    feature.node.kubernetes.io/cpu-cpuid.FXSR=true
                    feature.node.kubernetes.io/cpu-cpuid.FXSROPT=true
                    feature.node.kubernetes.io/cpu-cpuid.HLE=true
                    feature.node.kubernetes.io/cpu-cpuid.HYPERVISOR=true
                    feature.node.kubernetes.io/cpu-cpuid.LAHF=true
                    feature.node.kubernetes.io/cpu-cpuid.MOVBE=true
                    feature.node.kubernetes.io/cpu-cpuid.MPX=true
                    feature.node.kubernetes.io/cpu-cpuid.OSXSAVE=true
                    feature.node.kubernetes.io/cpu-cpuid.RTM=true
                    feature.node.kubernetes.io/cpu-cpuid.SYSCALL=true
                    feature.node.kubernetes.io/cpu-cpuid.SYSEE=true
                    feature.node.kubernetes.io/cpu-cpuid.X87=true
                    feature.node.kubernetes.io/cpu-cpuid.XSAVE=true
                    feature.node.kubernetes.io/cpu-cpuid.XSAVEC=true
                    feature.node.kubernetes.io/cpu-cpuid.XSAVEOPT=true
                    feature.node.kubernetes.io/cpu-cpuid.XSAVES=true
                    feature.node.kubernetes.io/cpu-hardware_multithreading=false
                    feature.node.kubernetes.io/cpu-model.family=6
                    feature.node.kubernetes.io/cpu-model.id=85
                    feature.node.kubernetes.io/cpu-model.vendor_id=Intel
                    feature.node.kubernetes.io/kernel-config.NO_HZ=true
                    feature.node.kubernetes.io/kernel-config.NO_HZ_FULL=true
                    feature.node.kubernetes.io/kernel-version.full=3.10.0-1160.105.1.el7.x86_64
                    feature.node.kubernetes.io/kernel-version.major=3
                    feature.node.kubernetes.io/kernel-version.minor=10
                    feature.node.kubernetes.io/kernel-version.revision=0
                    feature.node.kubernetes.io/pci-0300_15ad.present=true
                    feature.node.kubernetes.io/system-os_release.ID=centos
                    feature.node.kubernetes.io/system-os_release.VERSION_ID=7
                    feature.node.kubernetes.io/system-os_release.VERSION_ID.major=7
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=master-10.20.31.105
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=
                    node-role.kubernetes.io/master=
                    node-role.kubernetes.io/worker=
                    node.kubernetes.io/exclude-from-external-load-balancers=
Annotations:        flannel.alpha.coreos.com/backend-data: {"VtepMAC":"c6:fb:4b:8a:bb:12"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 10.20.31.105
                    kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    nfd.node.kubernetes.io/feature-labels:
                      cpu-cpuid.ADX,cpu-cpuid.AESNI,cpu-cpuid.AVX,cpu-cpuid.AVX2,cpu-cpuid.AVX512BW,cpu-cpuid.AVX512CD,cpu-cpuid.AVX512DQ,cpu-cpuid.AVX512F,cpu-...
                    nfd.node.kubernetes.io/master.version: v0.14.2
                    nfd.node.kubernetes.io/worker.version: v0.14.2
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Tue, 12 Mar 2024 21:01:31 -0400
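To see exactly what NFD changed, compare the node's label map from step (1) with the current one (both can be taken from `kubectl get node master-10.20.31.105 -o json`). The Python sketch below does a plain dictionary diff; `diff_maps` is a hypothetical name and the sample data is a trimmed subset of the labels shown above.

```python
# Sketch: diff two label (or annotation) maps taken before and after installing NFD.
from typing import Dict, Tuple


def diff_maps(before: Dict[str, str], after: Dict[str, str]
              ) -> Tuple[Dict[str, str], Dict[str, str], Dict[str, Tuple[str, str]]]:
    added = {k: after[k] for k in after.keys() - before.keys()}
    removed = {k: before[k] for k in before.keys() - after.keys()}
    changed = {k: (before[k], after[k])
               for k in before.keys() & after.keys() if before[k] != after[k]}
    return added, removed, changed


before = {
    "kubernetes.io/arch": "amd64",
    "kubernetes.io/os": "linux",
}
after = dict(before)
after.update({
    "feature.node.kubernetes.io/cpu-cpuid.AVX512F": "true",
    "feature.node.kubernetes.io/cpu-model.vendor_id": "Intel",
    "feature.node.kubernetes.io/kernel-version.full": "3.10.0-1160.105.1.el7.x86_64",
})
added, removed, changed = diff_maps(before, after)
for key in sorted(added):
    print("+", key, "=", added[key])
print("removed:", removed, "changed:", changed)
```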
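4. Use cases
The main purpose of NFD is to enable smarter node scheduling in a Kubernetes cluster. Common scenarios include:
Smarter node scheduling: NFD tells the Kubernetes scheduler about each node's features and resources, so it can choose the node best suited to a given workload. For example, if a Pod needs a capable GPU, it can use NFD labels to target nodes that have a suitable GPU.
Resource constraints and optimization: because node capabilities are exposed to the scheduler as labels, cluster administrators can better understand and use the resources of the nodes in the cluster, and constrain and optimize their use accordingly.
Hardware-aware workload scheduling: some workloads require a specific type or configuration of hardware. NFD lets the scheduler select nodes that have the required hardware features for these workloads.
Cluster scalability and performance: by placing workloads on appropriate nodes, NFD helps improve the overall performance and efficiency of the cluster. It helps avoid wasted resources and lets workloads make full use of the available hardware.
Cluster automation: NFD can be integrated into automated workflows such as deploying or scaling workloads. With NFD, automation systems gain visibility into node features and resources and can act on them more effectively.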
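As an illustration of the scheduling scenarios above, a Pod can require an NFD label through `nodeSelector`, which only admits nodes that carry every listed label with exactly the given value. The Python sketch below models that matching rule and prints a Pod manifest as JSON, a format `kubectl apply -f -` also accepts. The helper names, the Pod name `avx512-demo`, the `busybox` image, the second node `node-without-avx512`, and the choice of the AVX512F label from the example node are all assumptions for illustration.

```python
# Sketch: select nodes by an NFD label with nodeSelector semantics (exact match
# on every key), and emit a Pod manifest that uses the same selector.
import json
from typing import Dict, List

SELECTOR = {"feature.node.kubernetes.io/cpu-cpuid.AVX512F": "true"}


def matches_node_selector(selector: Dict[str, str], labels: Dict[str, str]) -> bool:
    return all(labels.get(key) == value for key, value in selector.items())


def eligible_nodes(selector: Dict[str, str], nodes: Dict[str, Dict[str, str]]) -> List[str]:
    return sorted(name for name, labels in nodes.items()
                  if matches_node_selector(selector, labels))


# Hypothetical cluster: the example node plus a node without AVX-512.
nodes = {
    "master-10.20.31.105": {"feature.node.kubernetes.io/cpu-cpuid.AVX512F": "true"},
    "node-without-avx512": {"feature.node.kubernetes.io/cpu-cpuid.AVX2": "true"},
}
print(eligible_nodes(SELECTOR, nodes))  # -> ['master-10.20.31.105']

# Hypothetical Pod (name and image are placeholders).
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "avx512-demo"},
    "spec": {
        "nodeSelector": SELECTOR,
        "containers": [{"name": "app", "image": "busybox",
                        "command": ["sleep", "3600"]}],
    },
}
print(json.dumps(pod, indent=2))
```

If no node carries the label, such a Pod stays Pending. Node affinity with `matchExpressions` is the more flexible alternative when operators such as In or Exists are needed.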
Overall, NFD makes a Kubernetes cluster more aware of its hardware, so it can better accommodate diverse workloads and node types, improving the cluster's performance, reliability and efficiency.

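5. Summary
In short, if your Kubernetes cluster needs to schedule workloads according to node hardware features, or otherwise needs to be aware of and make use of node hardware, installing Node Feature Discovery (NFD) is worthwhile. If all nodes in the cluster have similar hardware and hardware differences do not matter for your workloads, NFD is not needed.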