Apache ZooKeeper: How to recover ZooKeeper from java.io.EOFException after a server crash?

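How can I recover from the following error, which appeared after a server crash? ZooKeeper will not start, and the messages below are repeated in the log: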

2017-05-27 01:02:08,072 [myid:] - INFO [main:Environment@100] - Server environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib 
2017-05-27 01:02:08,072 [myid:] - INFO [main:Environment@100] - Server environment:java.io.tmpdir=/tmp 
2017-05-27 01:02:08,072 [myid:] - INFO [main:Environment@100] - Server environment:java.compiler=<NA> 
2017-05-27 01:02:08,072 [myid:] - INFO [main:Environment@100] - Server environment:os.name=Linux 
2017-05-27 01:02:08,072 [myid:] - INFO [main:Environment@100] - Server environment:os.arch=amd64 
2017-05-27 01:02:08,073 [myid:] - INFO [main:Environment@100] - Server environment:os.version=3.10.0-514.16.1.el7.x86_64 
2017-05-27 01:02:08,073 [myid:] - INFO [main:Environment@100] - Server environment:user.name=zookeeper 
2017-05-27 01:02:08,073 [myid:] - INFO [main:Environment@100] - Server environment:user.home=/opt/zookeeper 
2017-05-27 01:02:08,073 [myid:] - INFO [main:Environment@100] - Server environment:user.dir=/ 
2017-05-27 01:02:08,074 [myid:] - INFO [main:ZooKeeperServer@829] - tickTime set to 2000 
2017-05-27 01:02:08,074 [myid:] - INFO [main:ZooKeeperServer@838] - minSessionTimeout set to -1 
2017-05-27 01:02:08,074 [myid:] - INFO [main:ZooKeeperServer@847] - maxSessionTimeout set to -1 
2017-05-27 01:02:08,080 [myid:] - INFO [main:NIOServerCnxnFactory@89] - binding to port 0.0.0.0/0.0.0.0:2181 
2017-05-27 01:02:08,385 [myid:] - ERROR [main:Util@239] - Last transaction was partial. 
2017-05-27 01:02:08,400 [myid:] - ERROR [main:Util@239] - Last transaction was partial. 
2017-05-27 01:02:08,403 [myid:] - ERROR [main:Util@239] - Last transaction was partial. 
2017-05-27 01:02:08,403 [myid:] - ERROR [main:Util@239] - Last transaction was partial. 
2017-05-27 01:02:08,404 [myid:] - ERROR [main:Util@239] - Last transaction was partial. 
2017-05-27 01:02:08,404 [myid:] - ERROR [main:ZooKeeperServerMain@64] - Unexpected exception, exiting abnormally 
java.io.EOFException 
at java.io.DataInputStream.readInt(DataInputStream.java:392) 
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) 
at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64) 
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:585) 
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:604) 
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:570) 
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:652) 
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:166) 
at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223) 
at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:283) 
at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:410) 
at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:118) 
at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:119) 
at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:87) 
at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:53) 
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116) 
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)

Thanks,
IPVP

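It looks like you have hit a known Apache ZooKeeper bug. Several different issues in the Apache JIRA relate to it. If you are interested in the root-cause analysis and some potential fixes, see the comments on those issues.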

Unfortunately, no Apache ZooKeeper release currently contains a fix for this bug. You can try one of the following workarounds:

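  • Build your own local version of ZooKeeper with one of the patches attached to the linked JIRA issues applied. Note that these patches have not been accepted by the ZooKeeper community yet, so use them at your own risk.
  • Delete the problematic log file. The root cause is that the header of a log file written by the previous ZooKeeper run is incomplete. Because the header sits at the very start of the file and is itself incomplete, we can assume the file holds no transaction data after that point. Deleting it should therefore be safe and should not cause any data loss.
  • If it is easier, consider reformatting this ZooKeeper cluster. If all the data in your ZooKeeper installation is ephemeral and does not need long-term persistence, this may be an appropriate solution.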
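
The second workaround depends on the header of a log file being incomplete, which is also where the stack trace in the question fails (FileHeader.deserialize). The following is a minimal, read-only sketch for spotting such files. It assumes a complete transaction log header is 16 bytes and starts with the ASCII magic `ZKLG`. That layout is an assumption about ZooKeeper 3.4 and is not stated in this thread, although the 67108880-byte log.23c6a70 shown in a later answer (64 MiB plus 16 bytes) is consistent with it. The function name `txnlog_header_state` is made up for illustration.

```shell
#!/bin/sh
# Sketch only: classify a ZooKeeper transaction log file by its header.
# Assumption (not from the thread): a complete header is 16 bytes long
# and begins with the ASCII magic "ZKLG".

# txnlog_header_state FILE
# Prints one of: empty, short, bad-magic, ok
txnlog_header_state() {
    # The arithmetic expansion strips padding that some wc versions add.
    size=$(( $(wc -c < "$1") ))
    if [ "$size" -eq 0 ]; then
        echo empty
    elif [ "$size" -lt 16 ]; then
        echo short
    elif [ "$(head -c 4 "$1")" != "ZKLG" ]; then
        echo bad-magic
    else
        echo ok
    fi
}

# Example (read-only), with a hypothetical data directory:
# for f in /hadoop/zookeeper/version-2/log.*; do
#     printf '%s\t%s\n' "$(txnlog_header_state "$f")" "$f"
# done
```

Files reported as `empty` or `short` are the candidates this workaround is talking about. The sketch changes nothing on disk.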

My solution was to find the 0-length log file in /hadoop/zookeeper/version-2 (or wherever your dataDir is) and delete it.
After that, start ZooKeeper.
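
The step above could be scripted roughly as follows. This is a sketch under stated assumptions rather than a vetted procedure: it moves the zero-length log files into a backup directory instead of deleting them, so the change can be undone, and it should only be run while ZooKeeper is stopped. The function name `quarantine_empty_logs` and the backup path are hypothetical. `-maxdepth` is a GNU/BSD find extension.

```shell
#!/bin/sh
# Sketch only: move zero-length log.* files out of a ZooKeeper version-2
# directory instead of deleting them. Stop ZooKeeper before running this.
# The function name and the backup directory are illustrative.

# quarantine_empty_logs VERSION_DIR BACKUP_DIR
quarantine_empty_logs() {
    mkdir -p "$2" || return 1
    # -size 0 matches only files of exactly zero bytes.
    # -print shows each file name before it is moved.
    find "$1" -maxdepth 1 -type f -name 'log.*' -size 0 \
        -print -exec mv {} "$2"/ \;
}

# Example, using the directory named in the answer:
# quarantine_empty_logs /hadoop/zookeeper/version-2 /tmp/zk-empty-logs
```

Non-empty logs and snapshots are left where they are. If ZooKeeper then starts cleanly, the backup directory can be removed.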

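
My solution was to find the last log file, which has a length of 0 bytes.

You can find it in the version-2 directory with:

    ls -l -r --sort=time

    -rw-r--r-- 1 chris chris  3685904 Jan 24 00:56 snapshot.23c6a6e
    -rw-r--r-- 1 chris chris 67108880 Jan 24 10:37 log.23c6a70
    -rw-r--r-- 1 chris chris        0 Jan 24 10:37 log.23d3fb4

I first tried deleting the snapshot and the last two log files. That worked too, but you end up with a "slightly" older version of the data.

To be on the safe side, you may have to delete the last snapshot file and the last log file together with the 0-length log file.

By the way, the log file and the snapshot share the same hexadecimal naming pattern, and the two have to match up:

    log.23c6a70
    snapshot.23c6a6e

They have to match and be consistent.

When I got this error, the partition containing my data directory was full, so you should fix that as well.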
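
To compare the suffixes of files such as log.23c6a70 and snapshot.23c6a6e, it can help to decode them. The sketch below assumes the part after the dot is a hexadecimal transaction id (zxid). That is an assumption about ZooKeeper's naming scheme and is not spelled out in this thread. The helper names `zxid_of` and `list_by_zxid` are made up for illustration, and the code does not handle file names with extra suffixes such as `.bak`.

```shell
#!/bin/sh
# Sketch only: decode the hexadecimal suffix of log.* / snapshot.* names.
# Assumption (not from the thread): the suffix is a hexadecimal zxid.

# zxid_of FILE: print the suffix of FILE's name as a decimal number
zxid_of() {
    name=${1##*/}      # drop any directory part
    hex=${name#*.}     # drop the "log." or "snapshot." prefix
    echo $((0x$hex))
}

# list_by_zxid DIR: list snapshots and logs in DIR, lowest suffix first
list_by_zxid() {
    for f in "$1"/log.* "$1"/snapshot.*; do
        [ -e "$f" ] || continue    # skip patterns that matched nothing
        printf '%s\t%s\n' "$(zxid_of "$f")" "${f##*/}"
    done | sort -n
}

# Example, with a hypothetical data directory:
# list_by_zxid /hadoop/zookeeper/version-2
```

Under that assumption, the snapshot and the first log in the listing above decode to numbers that differ by 2, and the zero-length log has the highest suffix of the three. Since the error appeared on a full partition, `df -h` on the data directory is also worth running before restarting.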