Kubernetes kubelet on GKE periodically loses TCP connectivity to pods when performing liveness/readiness probe checks


We have a software system deployed on a single GKE (Google Kubernetes Engine) cluster node running around 100 pods, and in each pod we define a TCP readiness probe. We now see the readiness probes failing periodically with "Failed to connect to remote host: Connection refused" on varying pods.
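For reference, the readiness probes in question are plain TCP-socket probes of roughly the following shape. The port 3306 matches the mysql port seen in the captures below and the 10-second period matches the probe interval visible in the tcpdump timestamps; the remaining values are illustrative and the actual manifests may differ:

readinessProbe:
  tcpSocket:
    port: 3306
  initialDelaySeconds: 15
  periodSeconds: 10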

Tracing with tcpdump both on the cluster node and in the failing pod, we found that the packets sent from the cluster node look correct, yet the pod does not receive the TCP packets; the failing pod can, however, still receive IP broadcast packets.
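The pod-side trace was a capture of roughly this shape, run inside the failing pod (a sketch, assuming tcpdump is available in the pod image and that the pod's interface is eth0):

tcpdump -i eth0 host 10.44.0.1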

Strangely, if we ping/curl/wget the cluster node from the failing pod (no matter whether an HTTP service is actually running on the node), TCP connectivity recovers immediately and the readiness checks become healthy again.

One example is below:

Cluster node host: 10.44.0.1
Failing pod host: 10.44.0.92

tcpdump on the cbr0 interface of the cluster node:

#sudo tcpdump -i cbr0  host 10.44.0.92

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on cbr0, link-type EN10MB (Ethernet), capture size 262144 bytes
17:33:52.913052 ARP, Request who-has 10.44.0.1 tell 10.44.0.92, length 28
17:33:52.913181 ARP, Reply 10.44.0.1 is-at 0a:58:0a:2c:00:01 (oui Unknown), length 28
17:33:57.727497 IP 10.44.0.1.47736 > 10.44.0.92.mysql: Flags [S], seq 756717730, win 28400, options [mss 1420,sackOK,TS val 1084890021 ecr 0,nop,wscale 7], length 0
17:33:57.727537 IP 10.44.0.92.mysql > 10.44.0.1.47736: Flags [R.], seq 0, ack 756717731, win 0, length 0
17:34:07.727563 IP 10.44.0.1.48202 > 10.44.0.92.mysql: Flags [S], seq 2235831098, win 28400, options [mss 1420,sackOK,TS val 1084900021 ecr 0,nop,wscale 7], length 0
17:34:07.727618 IP 10.44.0.92.mysql > 10.44.0.1.48202: Flags [R.], seq 0, ack 2235831099, win 0, length 0
17:34:12.881059 ARP, Request who-has 10.44.0.92 tell 10.44.0.1, length 28
17:34:12.881176 ARP, Reply 10.44.0.92 is-at 0a:58:0a:2c:00:5c (oui Unknown), length 28
These are the readiness-check packets sent from the kubelet. We can see the failing pod responds with
Flags [R.], seq 0, ack 756717731, win 0, length 0, which is a TCP RST/ACK rather than the SYN/ACK that would continue the handshake, so the TCP connection is never established.

If we kubectl exec -it into the failing pod and ping the cluster node from inside it:
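(The exec step is roughly the following; the pod name mariadb is taken from the shell prompt below and will differ for other pods:)

kubectl exec -it mariadb -- /bin/bash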

root@mariadb:/# ping 10.44.0.1
PING 10.44.0.1 (10.44.0.1): 56 data bytes
64 bytes from 10.44.0.1: icmp_seq=0 ttl=64 time=3.301 ms
64 bytes from 10.44.0.1: icmp_seq=1 ttl=64 time=0.303 ms
Then let's look, from the tcpdump, at what happens on the cluster node side:

#sudo tcpdump -i cbr0  host 10.44.0.92

17:34:17.728039 IP 10.44.0.92.mysql > 10.44.0.1.48704: Flags [R.], seq 0, ack 2086181490, win 0, length 0
17:34:27.727638 IP 10.44.0.1.49202 > 10.44.0.92.mysql: Flags [S], seq 1769056007, win 28400, options [mss 1420,sackOK,TS val 1084920022 ecr 0,nop,wscale 7], length 0
17:34:27.727693 IP 10.44.0.92.mysql > 10.44.0.1.49202: Flags [R.], seq 0, ack 1769056008, win 0, length 0
17:34:34.016995 ARP, Request who-has 10.44.0.1 tell 10.44.0.92, length 28
17:34:34.018358 ARP, Reply 10.44.0.1 is-at 0a:58:0a:2c:00:01 (oui Unknown), length 28
17:34:34.020020 IP 10.44.0.92 > 10.44.0.1: ICMP echo request, id 53, seq 0, length 64
17:34:34.020101 IP 10.44.0.1 > 10.44.0.92: ICMP echo reply, id 53, seq 0, length 64
17:34:35.017197 IP 10.44.0.92 > 10.44.0.1: ICMP echo request, id 53, seq 1, length 64
17:34:35.017256 IP 10.44.0.1 > 10.44.0.92: ICMP echo reply, id 53, seq 1, length 64
17:34:36.018589 IP 10.44.0.92 > 10.44.0.1: ICMP echo request, id 53, seq 2, length 64
17:34:36.018700 IP 10.44.0.1 > 10.44.0.92: ICMP echo reply, id 53, seq 2, length 64
17:34:37.019791 IP 10.44.0.92 > 10.44.0.1: ICMP echo request, id 53, seq 3, length 64
17:34:37.019837 IP 10.44.0.1 > 10.44.0.92: ICMP echo reply, id 53, seq 3, length 64
17:34:37.730849 IP 10.44.0.1.49666 > 10.44.0.92.mysql: Flags [S], seq 1304758051, win 28400, options [mss 1420,sackOK,TS val 1084930025 ecr 0,nop,wscale 7], length 0
17:34:37.730900 IP 10.44.0.92.mysql > 10.44.0.1.49666: Flags [S.], seq 1267340310, ack 1304758052, win 28160, options [mss 1420,sackOK,TS val 3617117819 ecr 1084930025,nop,wscale 7], length 0
17:34:37.730952 IP 10.44.0.1.49666 > 10.44.0.92.mysql: Flags [.], ack 1, win 222, options [nop,nop,TS val 1084930025 ecr 3617117819], length 0
17:34:37.731149 IP 10.44.0.1.49666 > 10.44.0.92.mysql: Flags [F.], seq 1, ack 1, win 222, options [nop,nop,TS val 1084930025 ecr 3617117819], length 0
17:34:37.731268 IP 10.44.0.92.mysql > 10.44.0.1.49666: Flags [P.], seq 1:107, ack 2, win 220, options [nop,nop,TS val 3617117819 ecr 1084930025], length 106
17:34:37.731322 IP 10.44.0.1.49666 > 10.44.0.92.mysql: Flags [R], seq 1304758053, win 0, length 0
17:34:47.728119 IP 10.44.0.1.50138 > 10.44.0.92.mysql: Flags [S], seq 502800802, win 28400, options [mss 1420,sackOK,TS val 1084940022 ecr 0,nop,wscale 7], length 0
17:34:47.728179 IP 10.44.0.92.mysql > 10.44.0.1.50138: Flags [S.], seq 4294752326, ack 502800803, win 28160, options [mss 1420,sackOK,TS val 3617127816 ecr 1084940022,nop,wscale 7], length 0
We can see the ICMP packets, which are the ping traffic sent from the pod. Immediately after the ICMP packets, the readiness-check traffic becomes correct and the TCP handshake now succeeds (the Flags [S.] SYN/ACK reply at 17:34:37.730900).

It is not only ping that makes it work; other commands such as curl/wget do as well. As long as the failing pod reaches out to the cluster node, the TCP connections from the cluster node to the pod become correct again afterwards.
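For example, either of the following, run inside the failing pod, has the same effect; nothing needs to be listening on port 80 of the node, since it is the outbound traffic from the pod that appears to matter here:

curl -s -o /dev/null http://10.44.0.1/
wget -q -O /dev/null http://10.44.0.1/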

The failing pod changes from time to time, and it can happen to any pod among the roughly 100 running on the node. We are not sure whether this trips some system limit, but all the other pods keep working fine, we do not see high CPU utilization, and the node still has a few GB of free memory.
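To check whether such a limit is being hit, commands along these lines can be run on the cluster node (a sketch; the neighbour-table and conntrack sysctls below are standard Linux kernel knobs, suggested by the ARP traffic in the captures above and by the conntrack question in the comments):

# Neighbour (ARP) table occupancy vs. its garbage-collection thresholds
ip -4 neigh show | wc -l
sysctl net.ipv4.neigh.default.gc_thresh1 net.ipv4.neigh.default.gc_thresh2 net.ipv4.neigh.default.gc_thresh3

# Conntrack occupancy vs. its limit
sudo sysctl net.netfilter.nf_conntrack_count net.netfilter.nf_conntrack_max

# Kernel messages indicating either table overflowed
dmesg | egrep -i 'neighbour table overflow|nf_conntrack: table full'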


Does anyone have an idea what the problem might be?

Have you checked the system logs of the affected pods for any errors, e.g. conntrack errors? Could you provide the kubectl logs?
@liqingsong did you ever find the root cause or a way to fix this? We are running into something similar on GKE ourselves.