Amazon EC2 outbound connection response failures

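I'm getting intermittent response failures when making outbound connections (such as RPC calls). My application (Java) logs errors like this:

org.apache.http.NoHttpResponseException: RPC_SERVER.com:443 failed to respond

Outbound connection flow

Kubernetes node -> ELB of internal NGINX -> internal NGINX -> [upstream to] -> ELB of RPC server -> RPC server instance

The problem does not occur on plain EC2 (AWS) instances.

I can reproduce it on localhost as follows: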

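  • Run the main application, which acts as the client, on port 9200
  • Run the RPC server on port 9205
  • The client connects to the server through port 9202
  • Run
    $ socat TCP4-LISTEN:9202,reuseaddr TCP4:localhost:9205
    which listens on port 9202 and forwards to port 9205 (the RPC server)
  • Add an iptables rule with
    $ sudo iptables -A INPUT -p tcp --dport 9202 -j DROP
  • Trigger an RPC call; it fails with the same error message as before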
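Hypothesis

It is caused by NAT on Kubernetes. As far as I know, NAT uses conntrack, and if a TCP connection stays idle for a while, the client assumes the connection is still established even though it is not. (CMIIW)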
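
If the hypothesis holds, one client-side mitigation is to stop reusing pooled connections that have been idle longer than the conntrack entry is expected to live. This is a minimal sketch, not the application's actual code: IdleConnectionGuard and its methods are hypothetical names, 86400 s is the node's nf_conntrack_tcp_timeout_established value, and the 60 s safety margin is an arbitrary assumption. Apache HttpClient 4.4 and later expose comparable settings (for example setValidateAfterInactivity on PoolingHttpClientConnectionManager, or evictIdleConnections on HttpClientBuilder).

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: decides whether an idle pooled connection may be reused
// or should be closed, by comparing its idle time with the conntrack timeout
// assumed to apply on the NAT path (minus a safety margin).
final class IdleConnectionGuard {
    private final long maxIdleMillis;

    IdleConnectionGuard(long conntrackTimeoutSeconds, long safetyMarginSeconds) {
        if (safetyMarginSeconds >= conntrackTimeoutSeconds) {
            throw new IllegalArgumentException("margin must be smaller than the timeout");
        }
        // Stay strictly below the conntrack timeout so the NAT entry still exists.
        this.maxIdleMillis = TimeUnit.SECONDS.toMillis(conntrackTimeoutSeconds - safetyMarginSeconds);
    }

    long maxIdleMillis() {
        return maxIdleMillis;
    }

    // True if a connection last used at lastUsedMillis may be reused at nowMillis.
    boolean mayReuse(long lastUsedMillis, long nowMillis) {
        return nowMillis - lastUsedMillis < maxIdleMillis;
    }

    public static void main(String[] args) {
        // 86400 s = nf_conntrack_tcp_timeout_established on the node; 60 s margin is an assumption.
        IdleConnectionGuard guard = new IdleConnectionGuard(86_400, 60);
        long now = System.currentTimeMillis();
        System.out.println(guard.mayReuse(now - TimeUnit.MINUTES.toMillis(5), now)); // true
        System.out.println(guard.mayReuse(now - TimeUnit.HOURS.toMillis(25), now)); // false
    }
}
```

Closing and reopening a connection costs a handshake, whereas reusing one whose NAT entry has silently expired costs a failed request, which is why the check errs on the side of discarding.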

I also tried scaling kube-dns to 10 replicas, but the problem persists.

Node specs

Calico is used as the network plugin.

    $ sysctl -a | grep conntrack

    net.netfilter.nf_conntrack_acct = 0
    net.netfilter.nf_conntrack_buckets = 65536
    net.netfilter.nf_conntrack_checksum = 1
    net.netfilter.nf_conntrack_count = 1585
    net.netfilter.nf_conntrack_events = 1
    net.netfilter.nf_conntrack_expect_max = 1024
    net.netfilter.nf_conntrack_generic_timeout = 600
    net.netfilter.nf_conntrack_helper = 1
    net.netfilter.nf_conntrack_icmp_timeout = 30
    net.netfilter.nf_conntrack_log_invalid = 0
    net.netfilter.nf_conntrack_max = 262144
    net.netfilter.nf_conntrack_tcp_be_liberal = 0
    net.netfilter.nf_conntrack_tcp_loose = 1
    net.netfilter.nf_conntrack_tcp_max_retrans = 3
    net.netfilter.nf_conntrack_tcp_timeout_close = 10
    net.netfilter.nf_conntrack_tcp_timeout_close_wait = 3600
    net.netfilter.nf_conntrack_tcp_timeout_established = 86400
    net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 120
    net.netfilter.nf_conntrack_tcp_timeout_last_ack = 30
    net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 300
    net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 60
    net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 120
    net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120
    net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 300
    net.netfilter.nf_conntrack_timestamp = 0
    net.netfilter.nf_conntrack_udp_timeout = 30
    net.netfilter.nf_conntrack_udp_timeout_stream = 180
    net.nf_conntrack_max = 262144
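
To compare these node values with what a pod actually sees, two sysctl -a dumps can be diffed. A minimal sketch, assuming both outputs were saved as text; SysctlDiff is a hypothetical helper, and the pod values in main (close_wait = 60) are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Parses "key = value" lines as printed by `sysctl -a` and reports keys whose
// values differ between two dumps (e.g. one from the node, one from inside a pod).
// Lines without '=' (such as the command line itself) are skipped.
final class SysctlDiff {
    static Map<String, String> parse(String dump) {
        Map<String, String> values = new LinkedHashMap<>();
        for (String line : dump.split("\\R")) {
            int eq = line.indexOf('=');
            if (eq < 0) continue;
            String key = line.substring(0, eq).trim();
            String value = line.substring(eq + 1).trim();
            if (!key.isEmpty()) values.put(key, value);
        }
        return values;
    }

    // key -> "nodeValue -> podValue" for keys present in both dumps with different values.
    static Map<String, String> diff(Map<String, String> node, Map<String, String> pod) {
        Map<String, String> changed = new TreeMap<>();
        for (Map.Entry<String, String> e : node.entrySet()) {
            String podValue = pod.get(e.getKey());
            if (podValue != null && !podValue.equals(e.getValue())) {
                changed.put(e.getKey(), e.getValue() + " -> " + podValue);
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        String node = "net.netfilter.nf_conntrack_tcp_timeout_close_wait = 3600\n"
                + "net.netfilter.nf_conntrack_tcp_timeout_established = 86400\n";
        // Hypothetical pod dump, for illustration only.
        String pod = "net.netfilter.nf_conntrack_tcp_timeout_close_wait = 60\n"
                + "net.netfilter.nf_conntrack_tcp_timeout_established = 86400\n";
        // prints {net.netfilter.nf_conntrack_tcp_timeout_close_wait=3600 -> 60}
        System.out.println(diff(parse(node), parse(pod)));
    }
}
```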
    
Kubelet config

    [Service]
    Restart=always
    Environment="KUBELET_KUBECONFIG_ARGS=--kubeconfig=/etc/kubernetes/kubelet.conf --require-kubeconfig=true"
    Environment="KUBELET_SYSTEM_PODS_ARGS=--pod-manifest-path=/etc/kubernetes/manifests --allow-privileged=true"
    Environment="KUBELET_NETWORK_ARGS=--network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin"
    Environment="KUBELET_DNS_ARGS=--cluster-dns=10.96.0.10 --cluster-domain=cluster.local"
    Environment="KUBELET_AUTHZ_ARGS=--authorization-mode=Webhook --client-ca-file=/etc/kubernetes/pki/ca.crt"
    Environment="KUBELET_CADVISOR_ARGS=--cadvisor-port=0"
    Environment="KUBELET_CLOUD_ARGS=--cloud-provider=aws"
    ExecStart=
    ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_SYSTEM_PODS_ARGS $KUBELET_NETWORK_ARGS $KUBELET_DNS_ARGS $KUBELET_AUTHZ_ARGS $KUBELET_CADVISOR_ARGS $KUBELET_EXTRA_ARGS $KUBELET_CLOUD_ARGS
    
Kubectl version

    Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.5", GitCommit:"17d7182a7ccbb167074be7a87f0a68bd00d58d97", GitTreeState:"clean", BuildDate:"2017-08-31T09:14:02Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.7", GitCommit:"8e1552342355496b62754e61ad5f802a0f3f1fa7", GitTreeState:"clean", BuildDate:"2017-09-28T23:56:03Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
    
Kube-proxy logs

    W1004 05:34:17.400700       8 server.go:190] WARNING: all flags other than --config, --write-config-to, and --cleanup-iptables are deprecated. Please begin using a config file ASAP.
    I1004 05:34:17.405871       8 server.go:478] Using iptables Proxier.
    W1004 05:34:17.414111       8 server.go:787] Failed to retrieve node info: nodes "ip-172-30-1-20" not found
    W1004 05:34:17.414174       8 proxier.go:483] invalid nodeIP, initializing kube-proxy with 127.0.0.1 as nodeIP
    I1004 05:34:17.414288       8 server.go:513] Tearing down userspace rules.
    I1004 05:34:17.443472       8 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_max' to 262144
    I1004 05:34:17.443518       8 conntrack.go:52] Setting nf_conntrack_max to 262144
    I1004 05:34:17.443555       8 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
    I1004 05:34:17.443584       8 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
    I1004 05:34:17.443851       8 config.go:102] Starting endpoints config controller
    I1004 05:34:17.443888       8 config.go:202] Starting service config controller
    I1004 05:34:17.443890       8 controller_utils.go:994] Waiting for caches to sync for endpoints config controller
    I1004 05:34:17.443916       8 controller_utils.go:994] Waiting for caches to sync for service config controller
    I1004 05:34:17.544155       8 controller_utils.go:1001] Caches are synced for service config controller
    I1004 05:34:17.544155       8 controller_utils.go:1001] Caches are synced for endpoints config controller
    
    $ lsb_release -s -d

    Ubuntu 16.04.3 LTS

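Check the value of the sysctl net.netfilter.nf_conntrack_tcp_timeout_close_wait inside the pod that runs your program. The value you listed from the node (3600) may not be the same as the value inside the pod.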
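
A minimal sketch of reading that value from inside the JVM running in the pod, assuming a Linux container where /proc/sys is readable; ProcSysctl is a hypothetical helper. Reading /proc/sys/net/netfilter/nf_conntrack_tcp_timeout_close_wait in the pod (for example through kubectl exec) should give the same number without code changes.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical helper: reads a numeric sysctl from /proc/sys, as seen by the
// network namespace of the process (i.e. the pod, when run inside it).
final class ProcSysctl {
    // Simple mapping: dots become path separators. Names whose components
    // themselves contain dots (e.g. some interface names) are not handled.
    static Path toProcPath(String name) {
        return Paths.get("/proc/sys", name.replace('.', '/'));
    }

    static Optional<Long> read(String name) {
        try {
            String text = new String(Files.readAllBytes(toProcPath(name)), StandardCharsets.UTF_8).trim();
            return Optional.of(Long.parseLong(text));
        } catch (IOException | NumberFormatException e) {
            // Missing file (non-Linux, module not loaded, no access) or unexpected content.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        String name = "net.netfilter.nf_conntrack_tcp_timeout_close_wait";
        System.out.println(name + " = " + read(name).map(Object::toString).orElse("unavailable"));
    }
}
```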

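If the value in the pod is too small (for example 60), and the Java client closes its side of the TCP connection with a FIN once it has finished sending, but the response takes longer than the close_wait timeout to arrive, nf_conntrack drops the connection state and the client program never receives the response.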
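
The half-close pattern described above can be reproduced on loopback, where no NAT or conntrack is involved, so the delayed response always arrives. A minimal sketch using only the Java standard library; HalfCloseDemo and the "echo:" reply format are hypothetical. Socket.shutdownOutput() is the call that sends the FIN while keeping the socket readable; per the explanation above, behind NAT the reply would be lost if the delay exceeded the pod's close_wait timeout.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// The client sends a request, half-closes (FIN) with shutdownOutput(), then keeps
// reading. The server answers only after it has seen EOF and waited delayMillis,
// i.e. while the connection is half-closed.
final class HalfCloseDemo {
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        int n;
        while ((n = in.read(chunk)) != -1) buf.write(chunk, 0, n);
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    static String roundTrip(String request, long delayMillis) throws Exception {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread serverThread = new Thread(() -> {
                try (Socket s = server.accept()) {
                    String req = readAll(s.getInputStream()); // returns once the client's FIN arrives
                    Thread.sleep(delayMillis);                 // "slow" reply while half-closed
                    OutputStream out = s.getOutputStream();
                    out.write(("echo:" + req).getBytes(StandardCharsets.UTF_8));
                    out.flush();
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            serverThread.start();
            try (Socket client = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort())) {
                client.getOutputStream().write(request.getBytes(StandardCharsets.UTF_8));
                client.shutdownOutput(); // half-close: FIN sent, socket still readable
                String response = readAll(client.getInputStream());
                serverThread.join();
                return response;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("ping", 500)); // prints echo:ping
    }
}
```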

You may need to change the client program so that it does not use TCP half-close, or increase the value of net.netfilter.nf_conntrack_tcp_timeout_close_wait. See
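
One way to avoid half-close is to delimit requests inside the byte stream, so the server never needs the client's FIN to know a request is complete. A minimal sketch, assuming a hypothetical 4-byte length-prefix framing (not the RPC protocol from the question); FramedClientDemo is a hypothetical name. HTTP/1.1 achieves the same with Content-Length or chunked transfer encoding.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Each message is sent as a 4-byte length followed by UTF-8 bytes. The client
// never calls shutdownOutput(): both directions stay open until the reply is read,
// and only then is the socket closed.
final class FramedClientDemo {
    static void writeFrame(DataOutputStream out, String msg) throws IOException {
        byte[] bytes = msg.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
        out.flush();
    }

    static String readFrame(DataInputStream in) throws IOException {
        byte[] bytes = new byte[in.readInt()];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    static String roundTrip(String request) throws Exception {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread serverThread = new Thread(() -> {
                try (Socket s = server.accept()) {
                    DataInputStream in = new DataInputStream(s.getInputStream());
                    DataOutputStream out = new DataOutputStream(s.getOutputStream());
                    writeFrame(out, "echo:" + readFrame(in)); // request end known from the length prefix
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
            serverThread.start();
            try (Socket client = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort())) {
                DataOutputStream out = new DataOutputStream(client.getOutputStream());
                DataInputStream in = new DataInputStream(client.getInputStream());
                writeFrame(out, request); // no FIN is sent before the reply arrives
                String response = readFrame(in);
                serverThread.join();
                return response;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("ping")); // prints echo:ping
    }
}
```

Because the client closes only after reading the reply, the connection stays in the established state for the whole exchange, so the (much longer) established timeout applies instead of close_wait.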