How to troubleshoot a Kubernetes (K8S) cluster running in IPVS proxy mode when requests to a Service of type ClusterIP never reach the backend Pods.

Background

With Kubernetes (K8S) running in IPVS proxy mode, requests sent to a Service of type ClusterIP fail to reach the backend Pods.

Host configuration plan

Server name (hostname)	OS version	Spec	Internal IP	External IP (simulated)
k8s-master	CentOS7.7	2C/4G/20G	172.16.1.110	10.0.0.110
k8s-node01	CentOS7.7	2C/4G/20G	172.16.1.111	10.0.0.111
k8s-node02	CentOS7.7	2C/4G/20G	172.16.1.112	10.0.0.112

Reproducing the problem

Deployment YAML

YAML file

 [root@k8s-master service]# pwd
 /root/k8s_practice/service
 [root@k8s-master service]# cat myapp-deploy.yaml
 apiVersion: apps/v1
 kind: Deployment
 metadata:
   name: myapp-deploy
   namespace: default
 spec:
   replicas:
   selector:
     matchLabels:
       app: myapp
       release: v1
   template:
     metadata:
       labels:
         app: myapp
         release: v1
         env: test
     spec:
       containers:
       - name: myapp
         image: registry.cn-beijing.aliyuncs.com/google_registry/myapp:v1
         imagePullPolicy: IfNotPresent
         ports:
         - name: http
           containerPort:

Create the Deployment and check its status

 [root@k8s-master service]# kubectl apply -f myapp-deploy.yaml
 deployment.apps/myapp-deploy created
 [root@k8s-master service]#
 [root@k8s-master service]# kubectl get deploy -o wide
 NAME           READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS   IMAGES                                                      SELECTOR
 myapp-deploy   /                            14s   myapp        registry.cn-beijing.aliyuncs.com/google_registry/myapp:v1   app=myapp,release=v1
 [root@k8s-master service]# kubectl get rs -o wide
 NAME                      DESIRED   CURRENT   READY   AGE   CONTAINERS   IMAGES                                                      SELECTOR
 myapp-deploy-5695bb5658                            21s   myapp        registry.cn-beijing.aliyuncs.com/google_registry/myapp:v1   app=myapp,pod-template-hash=5695bb5658,release=v1
 [root@k8s-master service]#
 [root@k8s-master service]# kubectl get pod -o wide --show-labels
 NAME                            READY   STATUS    RESTARTS   AGE     IP             NODE         NOMINATED NODE   READINESS GATES   LABELS
 myapp-deploy-5695bb5658-7tgfx   /     Running             39s     10.244.2.111   k8s-node02   <none>           <none>            app=myapp,env=test,pod-template-hash=5695bb5658,release=v1
 myapp-deploy-5695bb5658-95zxm   /     Running             39s     10.244.3.165   k8s-node01   <none>           <none>            app=myapp,env=test,pod-template-hash=5695bb5658,release=v1
 myapp-deploy-5695bb5658-xtxbp   /     Running             39s     10.244.3.164   k8s-node01   <none>           <none>            app=myapp,env=test,pod-template-hash=5695bb5658,release=v1

Access the Pods with curl

 [root@k8s-master service]# curl 10.244.2.111/hostname.html
 myapp-deploy-5695bb5658-7tgfx
 [root@k8s-master service]#
 [root@k8s-master service]# curl 10.244.3.165/hostname.html
 myapp-deploy-5695bb5658-95zxm
 [root@k8s-master service]#
 [root@k8s-master service]# curl 10.244.3.164/hostname.html
 myapp-deploy-5695bb5658-xtxbp

Service of type ClusterIP

YAML file

 [root@k8s-master service]# pwd
 /root/k8s_practice/service
 [root@k8s-master service]# cat myapp-svc-ClusterIP.yaml
 apiVersion: v1
 kind: Service
 metadata:
   name: myapp-clusterip
   namespace: default
 spec:
   type: ClusterIP  # optional; ClusterIP is the default type
   selector:
     app: myapp
     release: v1
   ports:
   - name: http
     port:   # port exposed by the Service
     targetPort:   # port on the backend Pods that traffic is forwarded to

Create the Service and check its status

 [root@k8s-master service]# kubectl apply -f myapp-svc-ClusterIP.yaml
 service/myapp-clusterip created
 [root@k8s-master service]#
 [root@k8s-master service]# kubectl get svc -o wide
 NAME              TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE   SELECTOR
 kubernetes        ClusterIP   10.96.0.1        <none>        /TCP    16d   <none>
 myapp-clusterip   ClusterIP   10.102.246.104   <none>        /TCP   6s    app=myapp,release=v1

Check the IPVS rules

 [root@k8s-master service]# ipvsadm -Ln
 IP Virtual Server version 1.2. (size=)
 Prot LocalAddress:Port Scheduler Flags
   -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
 ………………
 TCP  10.102.246.104: rr
   -> 10.244.2.111:              Masq
   -> 10.244.3.164:              Masq
   -> 10.244.3.165:              Masq
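When many Services exist, picking one virtual IP out of the full `ipvsadm -Ln` listing by eye gets tedious. The following is a minimal sketch, not part of the original procedure, of a helper that prints only the real servers behind one virtual IP. The function name `backends` is hypothetical, and the parsing assumes the usual `ipvsadm -Ln` layout shown above, with `address:port` pairs in the second column.

```shell
# backends VIP: read `ipvsadm -Ln` output on stdin and print the
# RemoteAddress:Port of every real server listed under virtual services
# whose local address is VIP. (Hypothetical helper name.)
backends() {
  awk -v vip="$1" '
    # A line starting with TCP or UDP opens a new virtual service.
    $1 == "TCP" || $1 == "UDP" { in_vs = (index($2, vip ":") == 1); next }
    # Lines starting with "->" are real servers of the current service.
    $1 == "->" && in_vs && $2 != "RemoteAddress:Port" { print $2 }
  '
}
```

For example, `ipvsadm -Ln | backends 10.102.246.104` should list the three Pod addresses. An empty result would suggest that kube-proxy has not programmed any endpoints for the Service, which is a different problem from the one described here.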

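The IPVS table maps the Service's virtual IP to all three Pods, so under normal conditions a request to the Service should be forwarded to a backend Pod and a response returned.

curl results

Accessing the Pods directly works, as shown below.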

 [root@k8s-master service]# curl 10.244.2.111/hostname.html
 myapp-deploy-5695bb5658-7tgfx
 [root@k8s-master service]#
 [root@k8s-master service]# curl 10.244.3.165/hostname.html
 myapp-deploy-5695bb5658-95zxm
 [root@k8s-master service]#
 [root@k8s-master service]# curl 10.244.3.164/hostname.html
 myapp-deploy-5695bb5658-xtxbp

Accessing them through the Service, however, fails with the following error.

 [root@k8s-master service]# curl 10.102.246.104:
 curl: () Failed connect to 10.102.246.104:; Connection timed out
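Because a failed connection takes a long time to hit curl's default timeout, it can help to wrap the check in a small probe with a short limit. This is only a sketch: the function name `probe` and the 3-second limit are illustrative choices, not something the original procedure uses.

```shell
# probe URL: print "ok" if curl completes an HTTP transfer to URL within
# 3 seconds, otherwise print "fail". (Hypothetical helper; the 3-second
# limit is an arbitrary illustrative value.)
probe() {
  if curl -s -o /dev/null --connect-timeout 3 --max-time 3 "$1"; then
    echo "ok"
  else
    echo "fail"
  fi
}
```

Running `probe 10.244.2.111/hostname.html` against a Pod and then `probe` against the Service address and port makes the difference visible in a few seconds: in the situation described here the Pod probe would print `ok` and the Service probe `fail`.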

Troubleshooting

Verifying with a packet capture

Capture the traffic with the following command, then analyze the capture file in Wireshark.

tcpdump -i any -n -nn port  -w ./$(date +%Y%m%d%H%M%S).pcap

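The result is shown in the figure below:

The capture shows that requests were sent to the Pod but no reply came back, so TCP retransmitted them (Wireshark marks these packets as [TCP Retransmission]).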
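Retransmissions without replies are consistent with packets being dropped because of a bad checksum, and tcpdump can give a rough hint about that without opening Wireshark. The sketch below counts packets that verbose tcpdump output flags with an incorrect checksum. The function name `count_bad_cksum` is hypothetical, and the pattern assumes tcpdump's usual `cksum 0x… (incorrect -> 0x…)` and `bad udp cksum` wording.

```shell
# count_bad_cksum: read verbose tcpdump output on stdin and print how many
# lines report an incorrect TCP/UDP checksum. (Hypothetical helper name.)
count_bad_cksum() {
  # grep -c prints 0 and exits non-zero when nothing matches; "|| true"
  # keeps the function's exit status clean in that case.
  grep -c -E 'cksum 0x[0-9a-f]+ \(incorrect|bad (udp|tcp) cksum' || true
}
```

A typical call would be `tcpdump -nn -vv -r capture.pcap | count_bad_cksum`, where `capture.pcap` stands for the file written by the capture command. Treat a non-zero count only as a hint: on the sending host, packets captured before hardware offload fills in the checksum are commonly reported as incorrect even when nothing is wrong on the wire.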

Check the kube-proxy logs

 [root@k8s-master service]# kubectl get pod -A | grep 'kube-proxy'
 kube-system            kube-proxy-6bfh7                             /     Running             3h52m
 kube-system            kube-proxy-6vfkf                             /     Running             3h52m
 kube-system            kube-proxy-bvl9n                             /     Running             3h52m
 [root@k8s-master service]#
 [root@k8s-master service]# kubectl logs -n kube-system kube-proxy-6bfh7
 W0601 ::13.170506        feature_gate.go:] Setting GA feature gate SupportIPVSProxyMode=true. It will be removed in a future release.
 I0601 ::13.338922        node.go:] Successfully retrieved node IP: 172.16.1.112
 I0601 ::13.338960        server_others.go:] Using ipvs Proxier.  ##### shows that ipvs mode is in use
 W0601 ::13.339400        proxier.go:] IPVS scheduler not specified, use rr by default
 I0601 ::13.339638        server.go:] Version: v1.17.4
 I0601 ::13.340126        conntrack.go:] Set sysctl 'net/netfilter/nf_conntrack_max' to
 I0601 ::13.340159        conntrack.go:] Setting nf_conntrack_max to
 I0601 ::13.340500        conntrack.go:] Setting conntrack hashsize to
 I0601 ::13.346991        conntrack.go:] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to
 I0601 ::13.347035        conntrack.go:] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to
 I0601 ::13.347703        config.go:] Starting service config controller
 I0601 ::13.347718        shared_informer.go:] Waiting for caches to sync for service config
 I0601 ::13.347736        config.go:] Starting endpoints config controller
 I0601 ::13.347743        shared_informer.go:] Waiting for caches to sync for endpoints config
 I0601 ::13.448223        shared_informer.go:] Caches are synced for endpoints config
 I0601 ::13.448236        shared_informer.go:] Caches are synced for service config

The kube-proxy logs show nothing abnormal.
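Since there is one kube-proxy Pod per node, reading every log in full is slow when the only question is which proxy mode each instance picked. As a rough sketch, the mode can be pulled out of the `Using ipvs Proxier.` line seen above. The function name `proxy_mode` is hypothetical, and the pattern assumes other modes are logged in the same `Using <mode> Proxier` form.

```shell
# proxy_mode: read kube-proxy log text on stdin and print the proxier
# named in the first "Using <mode> Proxier" line, e.g. "ipvs".
# (Hypothetical helper; prints nothing if no such line is present.)
proxy_mode() {
  sed -n 's/.*Using \([a-z]*\) Proxier.*/\1/p' | head -n 1
}
```

For instance, `kubectl logs -n kube-system kube-proxy-6bfh7 | proxy_mode` would print `ipvs` for the log shown above; repeating it for the other two kube-proxy Pods confirms that every node uses the same mode.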

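Inspecting and changing the NIC settings

Note: these steps were performed on the k8s-master node.

Further searching suggested that the problem might be caused by checksum offloading. The current settings are: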

 [root@k8s-master service]# ethtool -k flannel. | grep checksum
 rx-checksumming: on
 tx-checksumming: on     ##### currently on
     tx-checksum-ipv4: off [fixed]
     tx-checksum-ip-generic: on    ##### currently on
     tx-checksum-ipv6: off [fixed]
     tx-checksum-fcoe-crc: off [fixed]
     tx-checksum-sctp: off [fixed]

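flannel's network setup leaves transmit (TX) checksum offload enabled on its virtual interface, whereas in this environment it needs to be turned off so that the checksum is no longer left to the flannel device. Disable it as follows: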

 # Disable temporarily
 [root@k8s-master service]# ethtool -K flannel. tx-checksum-ip-generic off
 Actual changes:
 tx-checksumming: off
     tx-checksum-ip-generic: off
 tcp-segmentation-offload: off
     tx-tcp-segmentation: off [requested on]
     tx-tcp-ecn-segmentation: off [requested on]
     tx-tcp6-segmentation: off [requested on]
     tx-tcp-mangleid-segmentation: off [requested on]
 udp-fragmentation-offload: off [requested on]
 [root@k8s-master service]#
 # Query the settings again
 [root@k8s-master service]# ethtool -k flannel. | grep checksum
 rx-checksumming: on
 tx-checksumming: off     ##### now off
     tx-checksum-ipv4: off [fixed]
     tx-checksum-ip-generic: off     ##### now off
     tx-checksum-ipv6: off [fixed]
     tx-checksum-fcoe-crc: off [fixed]
     tx-checksum-sctp: off [fixed]
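To check the setting from a script on each node rather than reading the `ethtool -k` listing by eye, the state of a single feature can be extracted with a few lines of awk. This is a sketch under the assumption that the output keeps the `name: state` layout shown above; the function name `feature_state` is hypothetical.

```shell
# feature_state NAME: read `ethtool -k <iface>` output on stdin and print
# the state ("on" or "off") of the offload feature called NAME.
# (Hypothetical helper; prints nothing if the feature is not listed.)
feature_state() {
  awk -v name="$1" '
    # The first field is the feature name followed by a colon.
    { key = $1; sub(/:$/, "", key) }
    key == name { print $2; exit }
  '
}
```

Piping the `ethtool -k` output for the flannel interface into `feature_state tx-checksum-ip-generic` should print `off` once the change above has been applied, and `on` on a node that has not been changed yet.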

This change is only temporary: after the machine reboots, the flannel virtual interface will have checksum offload enabled again.

Now try curl again.

 [root@k8s-master ~]# curl 10.102.246.104:
 Hello MyApp | Version: v1 | <a href="hostname.html">Pod Name</a>
 [root@k8s-master ~]#
 [root@k8s-master ~]# curl 10.102.246.104:/hostname.html
 myapp-deploy-5695bb5658-7tgfx
 [root@k8s-master ~]#
 [root@k8s-master ~]# curl 10.102.246.104:/hostname.html
 myapp-deploy-5695bb5658-95zxm
 [root@k8s-master ~]#
 [root@k8s-master ~]# curl 10.102.246.104:/hostname.html
 myapp-deploy-5695bb5658-xtxbp
 [root@k8s-master ~]#
 [root@k8s-master ~]# curl 10.102.246.104:/hostname.html
 myapp-deploy-5695bb5658-7tgfx

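As shown above, the Service is now reachable and requests are distributed across the three Pods.

Permanently disabling TX checksum offload on the flannel interface

Note: perform these steps on every machine.

Create a systemd service with the following unit file: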

 [root@k8s-node02 ~]# cat /etc/systemd/system/k8s-flannel-tx-checksum-off.service
 [Unit]
 Description=Turn off checksum offload on flannel.
 After=sys-devices-virtual-net-flannel..device
 [Install]
 WantedBy=sys-devices-virtual-net-flannel..device
 [Service]
 Type=oneshot
 ExecStart=/sbin/ethtool -K flannel. tx-checksum-ip-generic off

Enable the service at boot and start it now

 systemctl enable k8s-flannel-tx-checksum-off
 systemctl start  k8s-flannel-tx-checksum-off
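Because the unit file has to exist on every machine, generating it from the interface name can save some copy-and-paste. The sketch below renders the same unit as above for a given interface. The function name `render_unit` is hypothetical, and it assumes the interface name needs no systemd escaping (for example, it contains no hyphen).

```shell
# render_unit IFACE: print a oneshot systemd unit that turns off
# tx-checksum-ip-generic on IFACE once its device unit appears.
# (Hypothetical helper; assumes IFACE needs no systemd escaping.)
render_unit() {
  iface="$1"
  cat <<EOF
[Unit]
Description=Turn off checksum offload on ${iface}
After=sys-devices-virtual-net-${iface}.device

[Install]
WantedBy=sys-devices-virtual-net-${iface}.device

[Service]
Type=oneshot
ExecStart=/sbin/ethtool -K ${iface} tx-checksum-ip-generic off
EOF
}
```

On each node the output would be redirected to `/etc/systemd/system/k8s-flannel-tx-checksum-off.service`, followed by `systemctl daemon-reload` and the enable and start commands shown above.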
