Corrupted file in etcd prevents Kubernetes from starting
After the node (and the host) rebooted, a file in etcd is corrupted:
my_node: ~ # cd /var/lib/etcd/member/snap/
my_node: snap # ls -lsa
ls: could not access 0000000000000005-00000000008cb33c.snap: input/output error
totale 5040
4 drwx------ 2 root root 4096 3 apr 11.20 .
4 drwx------ 4 root root 4096 3 apr 11.20 ..
8 -rw-r--r-- 1 root root 8177 2 apr 14.14 0000000000000005-00000000008c3e09.snap
8 -rw-r--r-- 1 root root 8177 2 apr 16.31 0000000000000005-00000000008c651a.snap
8 -rw-r--r-- 1 root root 8177 2 apr 18.48 0000000000000005-00000000008c8c2b.snap
? -????????? ? ? ? ? ? 0000000000000005-00000000008cb33c.snap
8 -rw-r--r-- 1 root root 8177 1 apr 20.01 0000000000000005-00000000008cda4d.snap.broken
5000 -rw------- 1 root root 16805888 2 apr 07.20 db
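To see which entries in that directory cannot be read at all, without relying on the `ls` output, one option is to try reading every file. This is only a minimal sketch, assuming a POSIX shell; `find_unreadable` is a hypothetical helper name, not an etcd or Kubernetes tool, and it only reports read failures without repairing anything.

```shell
#!/bin/sh
# find_unreadable DIR: print each non-directory entry in DIR whose contents
# cannot be read. Hypothetical helper for illustration, not an etcd tool.
find_unreadable() {
    dir=$1
    for f in "$dir"/*; do
        # An empty directory leaves the glob pattern unexpanded; skip it.
        if [ "$f" = "$dir/*" ] && [ ! -e "$f" ]; then
            continue
        fi
        # Skip subdirectories. Entries whose metadata cannot be read fail
        # this test too, so they still fall through to the read attempt.
        if [ -d "$f" ]; then
            continue
        fi
        # Reading the whole file makes the kernel touch every data block.
        if ! cat "$f" > /dev/null 2>&1; then
            printf '%s\n' "$f"
        fi
    done
}
```

It could be called as `find_unreadable /var/lib/etcd/member/snap`; any path it prints failed to open or read.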
The etcd container shows a panic:
2018-04-03 09:20:23.578267 W | snap: cannot rename broken snapshot file /var/lib/etcd/member/snap/0000000000000005-00000000008cb33c.snap to /var/lib/etcd/member/snap/0000000000000005-00000000008cb33c.snap.broken: rename /var/lib/etcd/member/snap/0000000000000005-00000000008cb33c.snap /var/lib/etcd/member/snap/0000000000000005-00000000008cb33c.snap.broken: input/output error
2018-04-03 09:20:23.579220 I | etcdserver: recovered store from snapshot at index 9210923
2018-04-03 09:20:23.579250 I | etcdserver: name = default
2018-04-03 09:20:23.579257 I | etcdserver: data dir = /var/lib/etcd
2018-04-03 09:20:23.579263 I | etcdserver: member dir = /var/lib/etcd/member
2018-04-03 09:20:23.579269 I | etcdserver: heartbeat = 100ms
2018-04-03 09:20:23.579273 I | etcdserver: election = 1000ms
2018-04-03 09:20:23.579278 I | etcdserver: snapshot count = 10000
2018-04-03 09:20:23.579294 I | etcdserver: advertise client URLs = http://127.0.0.1:2379
2018-04-03 09:20:23.579714 I | etcdserver: restarting member 0 in cluster 0 at commit index 0
panic: cannot use none as id
goroutine 1 [running]: ...
I am running a single-node cluster.
What is the best strategy for dealing with this problem?
Any suggestions are welcome.
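This is not a problem with Kubernetes or etcd itself. It can happen to any application that is writing a file while the server reboots.
The root cause is a corrupted file in the file system. I don't know which file system you are using, but in most cases this kind of error should be repaired by the system on the next boot. If it is not repaired, the problem is serious.
My suggestions are: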