Docker randomly restarts all services


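Currently, all of our Docker swarm services (in production) restart intermittently and at random. This behavior started a few days ago, and so far our services have restarted about 6 times. When I check /var/log/messages, we see the following at the time the services restart: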

Jul 18 14:27:07 MASTER_SERVER dockerd: time="2019-07-18T14:27:07.777591370+02:00" level=info msg="NetworkDB stats OUR_MASTER_SERVER(1f6c53091b31) - netID:xqo1jd54dy9h4si7p5kd147lf leaving:false netPeers:2 entries:16 Queue qLen:0 netMsg/s:0"
Jul 18 14:27:07 MASTER_SERVER dockerd: time="2019-07-18T14:27:07.778866829+02:00" level=info msg="NetworkDB stats OUR_MASTER_SERVER(1f6c53091b31) - netID:hlxpttt1y61naebb1ehva5oa0 leaving:false netPeers:2 entries:42 Queue qLen:0 netMsg/s:0"
Jul 18 14:27:07 MASTER_SERVER dockerd: time="2019-07-18T14:27:07.778894355+02:00" level=info msg="NetworkDB stats OUR_MASTER_SERVER(1f6c53091b31) - netID:hccgvewpmr4qg93smkf1zb3gk leaving:false netPeers:2 entries:4 Queue qLen:0 netMsg/s:0"
Jul 18 14:28:05 MASTER_SERVER sshd[64861]: rexec line 25: Deprecated option KeyRegenerationInterval
Jul 18 14:28:05 MASTER_SERVER sshd[64861]: rexec line 26: Deprecated option ServerKeyBits
Jul 18 14:28:05 MASTER_SERVER sshd[64861]: rexec line 40: Deprecated option RSAAuthentication
Jul 18 14:28:05 MASTER_SERVER sshd[64861]: rexec line 42: Deprecated option RhostsRSAAuthentication
Jul 18 14:28:05 MASTER_SERVER sshd[64861]: rexec line 61: Deprecated option UseLogin
Jul 18 14:28:37 MASTER_SERVER dockerd: time="2019-07-18T14:28:37.219729414+02:00" level=warning msg="memberlist: Refuting a suspect message (from: b19e7378f796)"
Jul 18 14:28:37 MASTER_SERVER dockerd: time="2019-07-18T14:28:37.242915522+02:00" level=error msg="heartbeat to manager { } failed" error="rpc error: code = NotFound desc = node not registered" method="(*session).heartbeat" module=node/agent node.id=0bf0j7rcw6xg7pda5lka7jyy8 session.id=mhf0hqu2wo4t6x1n8ja4tzwr3 sessionID=mhf0hqu2wo4t6x1n8ja4tzwr3
Jul 18 14:28:37 MASTER_SERVER dockerd: time="2019-07-18T14:28:37.377485716+02:00" level=error msg="agent: session failed" backoff=100ms error="node not registered" module=node/agent node.id=0bf0j7rcw6xg7pda5lka7jyy8
Jul 18 14:28:37 MASTER_SERVER dockerd: time="2019-07-18T14:28:37.378175039+02:00" level=info msg="manager selected by agent for new session: { }" module=node/agent node.id=0bf0j7rcw6xg7pda5lka7jyy8
Jul 18 14:28:37 MASTER_SERVER dockerd: time="2019-07-18T14:28:37.378223959+02:00" level=info msg="waiting 78.699854ms before registering session" module=node/agent node.id=0bf0j7rcw6xg7pda5lka7jyy8
Jul 18 14:28:37 MASTER_SERVER dockerd: time="2019-07-18T14:28:37.392797756+02:00" level=info msg="worker 2ni1yglnxjxlpopn7kwxh6oh5 was successfully registered" method="(*Dispatcher).register"
Jul 18 14:28:37 MASTER_SERVER dockerd: time="2019-07-18T14:28:37.497010348+02:00" level=info msg="worker 0bf0j7rcw6xg7pda5lka7jyy8 was successfully registered" method="(*Dispatcher).register"
Jul 18 14:28:39 MASTER_SERVER kernel: IPVS: rr: FWM 452 0x000001C4 - no destination available
Jul 18 14:28:39 MASTER_SERVER kernel: IPVS: rr: FWM 452 0x000001C4 - no destination available
Jul 18 14:28:39 MASTER_SERVER kernel: IPVS: rr: FWM 451 0x000001C3 - no destination available
Jul 18 14:28:40 MASTER_SERVER NetworkManager[8489]: <info>  [1563452920.8734] manager: (veth33d0b84): new Veth device (/org/freedesktop/NetworkManager/Devices/1919)
Jul 18 14:28:40 MASTER_SERVER NetworkManager[8489]: <info>  [1563452920.8750] manager: (veth1ba455b): new Veth device (/org/freedesktop/NetworkManager/Devices/1920)
Jul 18 14:28:40 MASTER_SERVER kernel: br0: port 2(veth260) entered blocking state
Jul 18 14:28:40 MASTER_SERVER kernel: br0: port 2(veth260) entered disabled state
Jul 18 14:28:40 MASTER_SERVER kernel: device veth260 entered promiscuous mode
Jul 18 14:28:41 MASTER_SERVER NetworkManager[8489]: <info>  [1563452921.0567] manager: (veth9b724a9): new Veth device (/org/freedesktop/NetworkManager/Devices/1921)
Jul 18 14:28:41 MASTER_SERVER NetworkManager[8489]: <info>  [1563452921.0604] manager: (veth328f5ab): new Veth device (/org/freedesktop/NetworkManager/Devices/1922)
Jul 18 14:28:41 MASTER_SERVER kernel: br0: port 4(veth83) entered blocking state
Jul 18 14:28:41 MASTER_SERVER kernel: br0: port 4(veth83) entered disabled state
Jul 18 14:28:41 MASTER_SERVER kernel: device veth83 entered promiscuous mode
Jul 18 14:28:41 MASTER_SERVER kernel: br0: port 4(veth83) entered blocking state
Jul 18 14:28:41 MASTER_SERVER kernel: br0: port 4(veth83) entered forwarding state
Jul 18 14:28:41 MASTER_SERVER dockerd: time="2019-07-18T14:28:41+02:00" level=info msg="shim reaped" id=af9c57b5a3e15cf84eb6c11e81b9b1283c86d289819600ddd83fd5cf791141cf module="containerd/tasks"
Jul 18 14:28:41 MASTER_SERVER dockerd: time="2019-07-18T14:28:41+02:00" level=info msg="shim reaped" id=08e21510d82e56462f4bcfe5e4d734c8fcf21c92470afcf711bca5c7b1a29e88 module="containerd/tasks"
Jul 18 14:28:41 MASTER_SERVER dockerd: time="2019-07-18T14:28:41.126973723+02:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Jul 18 14:28:41 MASTER_SERVER dockerd: time="2019-07-18T14:28:41.127491897+02:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Jul 18 14:28:41 MASTER_SERVER dockerd: time="2019-07-18T14:28:41.128341866+02:00" level=warning msg="rmServiceBinding 208e26beb7581a7e09041ebf23d6170f0f7ad7785a28b0d5b37b4194cca8d64e possible transient state ok:false entries:0 set:false "
Jul 18 14:28:41 MASTER_SERVER dockerd: time="2019-07-18T14:28:41+02:00" level=info msg="shim reaped" id=99889097e3444a485567d85d5dcdbd1f56c49369207f3137d7c80d7c8efd1168 module="containerd/tasks"
Jul 18 14:28:41 MASTER_SERVER dockerd: time="2019-07-18T14:28:41.132298072+02:00" level=warning msg="rmServiceBinding 055c400762be47b9e7c09c3a948c17b049a91b0cb53fbcefaee5f48f448fb91a possible transient state ok:false entries:0 set:false "
Jul 18 14:28:41 MASTER_SERVER dockerd: time="2019-07-18T14:28:41.140787013+02:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Jul 18 14:28:41 MASTER_SERVER NetworkManager[8489]: <info>  [1563452921.1487] manager: (veth7b97007): new Veth device (/org/freedesktop/NetworkManager/Devices/1923)
Jul 18 14:28:41 MASTER_SERVER kernel: br0: port 4(veth83) entered disabled state
Jul 18 14:28:41 MASTER_SERVER kernel: br0: port 16(veth257) entered disabled state
...
MORE OF THE SAME
...
Jul 18 14:28:54 MASTER_SERVER kernel: br0: port 7(veth82) entered disabled state
Jul 18 14:28:54 MASTER_SERVER kernel: device veth82 left promiscuous mode
Jul 18 14:28:54 MASTER_SERVER kernel: br0: port 7(veth82) entered disabled state
Jul 18 14:28:54 MASTER_SERVER kernel: IPVS: __ip_vs_del_service: enter
Jul 18 14:28:54 MASTER_SERVER kernel: IPVS: __ip_vs_del_service: enter
Jul 18 14:28:54 MASTER_SERVER kernel: IPVS: __ip_vs_del_service: enter
Jul 18 14:28:54 MASTER_SERVER kernel: IPVS: __ip_vs_del_service: enter
Jul 18 14:28:54 MASTER_SERVER kernel: IPVS: __ip_vs_del_service: enter
Jul 18 14:28:54 MASTER_SERVER kernel: IPVS: __ip_vs_del_service: enter
Jul 18 14:28:54 MASTER_SERVER kernel: IPVS: __ip_vs_del_service: enter
Jul 18 14:28:54 MASTER_SERVER kernel: IPVS: __ip_vs_del_service: enter
Jul 18 14:28:54 MASTER_SERVER kernel: IPVS: __ip_vs_del_service: enter
Jul 18 14:28:54 MASTER_SERVER kernel: docker_gwbridge: port 17(veth7648a95) entered disabled state
Jul 18 14:28:54 MASTER_SERVER kernel: device veth7648a95 left promiscuous mode
Jul 18 14:28:54 MASTER_SERVER kernel: docker_gwbridge: port 17(veth7648a95) entered disabled state
Jul 18 14:28:54 MASTER_SERVER NetworkManager[8489]: <info>  [1563452934.9971] device (veth7648a95): released from master device docker_gwbridge
Jul 18 14:28:55 MASTER_SERVER kernel: br0: port 15(veth256) entered disabled state
Jul 18 14:28:55 MASTER_SERVER kernel: device veth256 left promiscuous mode
Jul 18 14:28:55 MASTER_SERVER kernel: br0: port 15(veth256) entered disabled state
Jul 18 14:28:55 MASTER_SERVER kernel: docker_gwbridge: port 2(veth865a1b5) entered blocking state
Jul 18 14:28:55 MASTER_SERVER kernel: docker_gwbridge: port 2(veth865a1b5) entered disabled state
Jul 18 14:28:55 MASTER_SERVER kernel: device veth865a1b5 entered promiscuous mode
Jul 18 14:28:55 MASTER_SERVER kernel: br0: port 11(veth253) entered disabled state
Jul 18 14:28:55 MASTER_SERVER dockerd: time="2019-07-18T14:28:55+02:00" level=error msg="setting up rule failed, [-t mangle -D OUTPUT -d 10.0.0.155/32 -j MARK --set-mark 458]:  (iptables failed: iptables --wait -t mangle -D OUTPUT -d 10.0.0.155/32 -j MARK --set-mark 458: iptables: No chain/target/match by that name.\n (exit status 1))"
Jul 18 14:28:55 MASTER_SERVER kernel: device veth253 left promiscuous mode
Jul 18 14:28:55 MASTER_SERVER kernel: br0: port 11(veth253) entered disabled state
Jul 18 14:28:55 MASTER_SERVER dockerd: time="2019-07-18T14:28:55.049845009+02:00" level=error msg="Failed to delete firewall mark rule in sbox e6c95a7 (16cba4b): reexec failed: exit status 5"
... 
MORE OF THIS
...
Jul 18 14:28:57 MASTER_SERVER dockerd: time="2019-07-18T14:28:57.877371310+02:00" level=warning msg="rmServiceBinding a22765c0f00db0a4949b009129fa3efc78e2450567e759ff306dcfd32ebe0043 possible transient state ok:false entries:0 set:false "
Jul 18 14:28:57 MASTER_SERVER dockerd: time="2019-07-18T14:28:57.893365848+02:00" level=error msg="Failed to delete real server 10.0.0.124 for vip 10.0.0.161 fwmark 455 in sbox f9cd880 (f0562de): no such process"
Jul 18 14:28:57 MASTER_SERVER dockerd: time="2019-07-18T14:28:57.893415513+02:00" level=error msg="Failed to delete service for vip 10.0.0.161 fwmark 455 in sbox f9cd880 (f0562de): no such process"
Jul 18 14:28:57 MASTER_SERVER NetworkManager[8489]: <info>  [1563452937.9029] device (veth8339f31): carrier: link connected
Jul 18 14:28:57 MASTER_SERVER kernel: docker_gwbridge: port 3(veth8339f31) entered blocking state
Jul 18 14:28:57 MASTER_SERVER kernel: docker_gwbridge: port 3(veth8339f31) entered forwarding state
Jul 18 14:28:57 MASTER_SERVER NetworkManager[8489]: <info>  [1563452937.9250] device (veth242a19a): carrier: link connected
Jul 18 14:28:57 MASTER_SERVER kernel: docker_gwbridge: port 5(veth242a19a) entered blocking state
Jul 18 14:28:57 MASTER_SERVER kernel: docker_gwbridge: port 5(veth242a19a) entered forwarding state
Jul 18 14:28:57 MASTER_SERVER kernel: docker_gwbridge: port 6(veth08b162c) entered blocking state
Jul 18 14:28:57 MASTER_SERVER kernel: docker_gwbridge: port 6(veth08b162c) entered forwarding state
Jul 18 14:28:57 MASTER_SERVER NetworkManager[8489]: <info>  [1563452937.9454] device (veth08b162c): carrier: link connected
Jul 18 14:28:57 MASTER_SERVER NetworkManager[8489]: <info>  [1563452937.9647] device (veth1c69a54): carrier: link connected
Jul 18 14:28:57 MASTER_SERVER dockerd: time="2019-07-18T14:28:57+02:00" level=error msg="setting up rule failed, [-t mangle -D OUTPUT -d 10.0.0.161/32 -j MARK --set-mark 455]:  (iptables failed: iptables --wait -t mangle -D OUTPUT -d 10.0.0.161/32 -j MARK --set-mark 455: iptables: No chain/target/match by that name.\n (exit status 1))"
Jul 18 14:28:57 MASTER_SERVER dockerd: time="2019-07-18T14:28:57.975416212+02:00" level=error msg="Failed to delete firewall mark rule in sbox f9cd880 (f0562de): reexec failed: exit status 5"
Jul 18 14:28:57 MASTER_SERVER dockerd: time="2019-07-18T14:28:57.975578597+02:00" level=error msg="Failed to delete real server 10.0.0.124 for vip 10.0.0.161 fwmark 455 in sbox fd62ef8 (7801a00): no such process"
Jul 18 14:28:57 MASTER_SERVER dockerd: time="2019-07-18T14:28:57.975609232+02:00" level=error msg="Failed to delete service for vip 10.0.0.161 fwmark 455 in sbox fd62ef8 (7801a00): no such process"
Jul 18 14:28:57 MASTER_SERVER kernel: docker_gwbridge: port 14(veth1c69a54) entered blocking state
Jul 18 14:28:57 MASTER_SERVER kernel: docker_gwbridge: port 14(veth1c69a54) entered forwarding state
Jul 18 14:28:57 MASTER_SERVER kernel: docker_gwbridge: port 2(veth865a1b5) entered blocking state
Jul 18 14:28:57 MASTER_SERVER kernel: docker_gwbridge: port 2(veth865a1b5) entered forwarding state
Jul 18 14:28:57 MASTER_SERVER NetworkManager[8489]: <info>  [1563452937.9814] device (veth865a1b5): carrier: link connected
Jul 18 14:28:57 MASTER_SERVER NetworkManager[8489]: <info>  [1563452937.9902] manager: (veth90d68ef): new Veth device (/org/freedesktop/NetworkManager/Devices/1988)
Jul 18 14:28:57 MASTER_SERVER NetworkManager[8489]: <info>  [1563452937.9937] manager: (veth1c52f1f): new Veth device (/org/freedesktop/NetworkManager/Devices/1989)
Jul 18 14:28:58 MASTER_SERVER kernel: docker_gwbridge: port 9(veth713fa85) entered blocking state
Jul 18 14:28:58 MASTER_SERVER kernel: docker_gwbridge: port 9(veth713fa85) entered disabled state
Jul 18 14:28:58 MASTER_SERVER kernel: device veth713fa85 entered promiscuous mode
Jul 18 14:28:58 MASTER_SERVER kernel: docker_gwbridge: port 9(veth713fa85) entered blocking state
Jul 18 14:28:58 MASTER_SERVER kernel: docker_gwbridge: port 9(veth713fa85) entered forwarding state
Jul 18 14:28:58 MASTER_SERVER NetworkManager[8489]: <info>  [1563452938.0130] manager: (vethdc96c98): new Veth device (/org/freedesktop/NetworkManager/Devices/1990)
Jul 18 14:28:58 MASTER_SERVER NetworkManager[8489]: <info>  [1563452938.0142] manager: (veth713fa85): new Veth device (/org/freedesktop/NetworkManager/Devices/1991)
Jul 18 14:28:58 MASTER_SERVER kernel: br0: port 4(veth269) entered blocking state
Jul 18 14:28:58 MASTER_SERVER kernel: br0: port 4(veth269) entered disabled state
Jul 18 14:28:58 MASTER_SERVER kernel: device veth269 entered promiscuous mode
Jul 18 14:28:58 MASTER_SERVER kernel: br0: port 4(veth269) entered blocking state
Jul 18 14:28:58 MASTER_SERVER kernel: br0: port 4(veth269) entered forwarding state
Jul 18 14:28:58 MASTER_SERVER dockerd: time="2019-07-18T14:28:58+02:00" level=info msg="shim docker-containerd-shim started" address="/containerd-shim/moby/5cadb3994f4d81174c1b686028a6fd3e917c9b8b2ac62b17c4754e59ed4a2158/shim.sock" debug=false module="containerd/tasks" pid=2290
Jul 18 14:28:58 MASTER_SERVER dockerd: time="2019-07-18T14:28:58+02:00" level=info msg="shim docker-containerd-shim started" address="/containerd-shim/moby/536c3eae39192936e649668192e0daf6efba70cc60d1822cd0e8e2e593ebac40/shim.sock" debug=false module="containerd/tasks" pid=2300
Jul 18 14:28:58 MASTER_SERVER kernel: IPVS: Creating netns size=2048 id=305
Jul 18 14:28:58 MASTER_SERVER dockerd: time="2019-07-18T14:28:58+02:00" level=error msg="setting up rule failed, [-t mangle -D OUTPUT -d 10.0.0.161/32 -j MARK --set-mark 455]:  (iptables failed: iptables --wait -t mangle -D OUTPUT -d 10.0.0.161/32 -j MARK --set-mark 455: iptables: No chain/target/match by that name.\n (exit status 1))"
Jul 18 14:28:58 MASTER_SERVER dockerd: time="2019-07-18T14:28:58.283086156+02:00" level=error msg="Failed to delete firewall mark rule in sbox fd62ef8 (7801a00): reexec failed: exit status 5"
Jul 18 14:28:58 MASTER_SERVER dockerd: time="2019-07-18T14:28:58.283246626+02:00" level=error msg="Failed to delete real server 10.0.0.124 for vip 10.0.0.161 fwmark 455 in sbox b828388 (4152113): no such process"
Jul 18 14:28:58 MASTER_SERVER dockerd: time="2019-07-18T14:28:58.283277874+02:00" level=error msg="Failed to delete service for vip 10.0.0.161 fwmark 455 in sbox b828388 (4152113): no such process"
Jul 18 14:28:58 MASTER_SERVER kernel: IPVS: Creating netns size=2048 id=306
The exit status is 137:

    "State": {
        "Status": "exited",
        "Running": false,
        "Paused": false,
        "Restarting": false,
        "OOMKilled": false,
        "Dead": false,
        "Pid": 0,
        "ExitCode": 137,
        "Error": "",
        "StartedAt": "2019-07-18T12:29:00.047740568Z",
        "FinishedAt": "2019-07-18T14:33:07.497580769Z",
        "Health": {
            "Status": "unhealthy",
            "FailingStreak": 0,
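The 137 status is expected: if RabbitMQ stops, Spring reports itself as unhealthy and Docker tries to restart the container. The containers were not OOM-killed and the applications inside them did not crash; Docker simply decided to restart everything.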
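For readers unfamiliar with the number: by the usual shell convention, an exit code above 128 means "128 plus the number of the signal that ended the process", so 137 corresponds to signal 9 (SIGKILL). Together with "OOMKilled": false, that is consistent with the container being stopped from outside rather than crashing by itself. The helper below is only an illustrative sketch; `describe_exit_code` is a hypothetical name, not part of Docker.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain an exit code using the shell convention 128 + signal number."""
    if code > 128:
        signum = code - 128
        try:
            # Signal names are platform dependent; 9 is SIGKILL on Linux.
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with status {code}"


if __name__ == "__main__":
    print(describe_exit_code(137))
```

On a Linux host this reports that the process was terminated by SIGKILL.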

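We also have containers deployed without swarm; these have never restarted and have been up for a long time. Only containers created with docker service get restarted. This also indicates that the Docker daemon itself is not restarting.

We first assumed this was a network problem (causing RabbitMQ to fail and our healthchecks to trigger the restarts), but even the filebeat container, which has no healthcheck, gets restarted. In the meantime we have applied the SACK kernel panic mitigation, without any luck.

We still have plenty of disk space, so that is not the problem either. We are a bit low on memory, but nothing that should cause a crash.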

              total        used        free      shared  buff/cache   available
Mem:            15G         11G        1.3G        183M        2.5G        3.2G
Swap:          4.0G        258M        3.7G
The output of docker info is:

Containers: 54
 Running: 14
 Paused: 0
 Stopped: 40
Images: 48
Server Version: 18.03.1-ce
Storage Driver: overlay
 Backing Filesystem: xfs
 Supports d_type: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: 0bf0j7rcw6xg7pda5lka7jyy8
 Is Manager: true
 ClusterID: 0e4yg9p17skhonx5jkaulhrsi
 Managers: 1
 Nodes: 2
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 10
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 10.242.171.24
 Manager Addresses:
  10.242.171.24:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-957.12.2.el7.x86_64
Operating System: Red Hat Enterprise Linux Server 7.6 (Maipo)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 15.5GiB
Name: OUR_MASTER_SERVER
ID: OZOG:AM6N:XGRS:KZMV:BOHR:QGAX:625X:AFDW:F6AV:P7ZF:FS4R:TCZL
Docker Root Dir: /opt/hpc/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: bridge-nf-call-ip6tables is disabled

Does anyone know why we are getting these restarts?

I think the problem is related to your network connectivity, because of
heartbeat to manager { } failed

 Managers: 1
 Nodes: 2
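Is it correct that you have only one manager node and two worker nodes?
After reconnecting to its nodes to populate the service definitions, the manager node then restarts (I guess) all containers.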

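For production workloads you should have at least three Docker swarm managers. This makes the cluster more fault tolerant.
If your cluster consists of only three nodes, promote the two worker nodes to managers as well.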
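The reasoning behind "at least three" can be sketched with a little arithmetic. Swarm managers keep their state in a Raft group, which needs a majority (quorum) of managers to stay available, so N managers tolerate the loss of at most (N - 1) / 2 of them, rounded down. The function names below are made up for illustration.

```python
def quorum(managers: int) -> int:
    """Smallest number of managers that forms a majority."""
    return managers // 2 + 1


def tolerated_failures(managers: int) -> int:
    """How many managers can be lost while a majority remains."""
    return (managers - 1) // 2


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"managers={n} quorum={quorum(n)} tolerated={tolerated_failures(n)}")
```

With a single manager the tolerated number of failures is zero, which matches the setup in the question; with three it is one. Promotion itself is done with docker node promote on an existing manager.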

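So I did some more digging, put the production node into debug mode, and checked the logs. In debug mode Docker emits a log message every 1-2 seconds, and I noticed several gaps in the log. Some gaps were up to 5 seconds long, but sometimes I got no log messages at all for more than 20 seconds. After such a gap, Docker considers the master node (itself) unstable and forces itself to restart, taking all other instances down with it (the downside of having a single master).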
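Looking for such gaps by eye is tedious, so here is a minimal sketch of how the check could be automated. It assumes the dockerd lines carry a time="..." field in the format seen in the excerpt above; the function names and the 5-second default threshold are my own choices, not anything Docker provides. The two sample lines are shortened copies of lines from the log in the question.

```python
import re
from datetime import datetime, timedelta

# Matches time="2019-07-18T14:27:07.777591370+02:00" and the variant without fraction.
TIME_RE = re.compile(
    r'time="(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?([+-]\d{2}:\d{2}|Z)"'
)


def parse_time(line):
    """Return the dockerd timestamp of a log line, or None if it has none."""
    match = TIME_RE.search(line)
    if match is None:
        return None
    base, fraction, offset = match.groups()
    # datetime keeps microseconds only, so cut nanoseconds down to 6 digits.
    micro = (fraction or "0")[:6].ljust(6, "0")
    if offset == "Z":
        offset = "+00:00"
    return datetime.strptime(f"{base}.{micro}{offset}", "%Y-%m-%dT%H:%M:%S.%f%z")


def find_gaps(lines, threshold=timedelta(seconds=5)):
    """Return (previous, current, gap) for consecutive timestamps further apart than threshold."""
    gaps = []
    previous = None
    for line in lines:
        current = parse_time(line)
        if current is None:
            continue  # sshd, kernel and NetworkManager lines have no time= field
        if previous is not None and current - previous > threshold:
            gaps.append((previous, current, current - previous))
        previous = current
    return gaps


SAMPLE = [
    'Jul 18 14:27:07 MASTER_SERVER dockerd: time="2019-07-18T14:27:07.778894355+02:00" level=info',
    'Jul 18 14:28:37 MASTER_SERVER dockerd: time="2019-07-18T14:28:37.219729414+02:00" level=warning',
]

if __name__ == "__main__":
    for start, end, gap in find_gaps(SAMPLE):
        print(f"no dockerd output for {gap.total_seconds():.1f}s after {start.isoformat()}")
```

At the default (non-debug) log level long quiet periods are normal, so a check like this only makes sense on debug-level logs such as the ones described above.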


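After migrating our VM to another server, the problem has not occurred again and everything is running normally. We have ordered a third VM so that we can run with 3 hosts and survive the failure of a single host.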

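Unfortunately, we currently have only one manager and one worker node (the manager is also a worker). I can understand why the manager would want to restart everything on the worker node, but I don't understand why it also restarts everything on itself.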