Apache ZooKeeper: if the ZooKeeper leader process is killed, should all followers also get an exception and restart?


I am using ZooKeeper 3.4.6 on a project and am running some failure-mode tests. While doing so, I found what I believe is unexpected behavior.

If the leader ZooKeeper process is killed, should the followers restart as well?

Environment:

OS:        Windows Server 2008 R2 (hosted in a Tanuki Java service wrapper)
Zookeeper: 3.4.6
Java JDK:  1.7.0.210
Test:

The test is to kill a ZooKeeper process and verify that the cluster recovers.

If I kill a non-leader process, it restarts and rejoins the cluster without affecting the other nodes.

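If I kill the leader process, the leader and all of the followers restart. This seems wrong, because for a period of time clients cannot connect to any ZooKeeper node.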

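I have tried both the TCP and UDP communication settings, and both show the same behavior. With UDP, however, recovery is about twice as fast.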
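
One way to put numbers on that outage window would be to poll every node while the leader is being killed. The following is only a sketch, assuming Python 3 on a test machine and that the `srvr` four-letter command is reachable on each node's client port (2181 in this setup). The `SERVERS` list and the helper names are hypothetical placeholders, not part of the original test harness.

```python
import socket
import time

# Hypothetical placeholder addresses; substitute the real ensemble members.
SERVERS = [("0.0.0.1", 2181), ("0.0.0.2", 2181), ("0.0.0.3", 2181)]


def four_letter_word(host, port, cmd=b"srvr", timeout=1.0):
    """Send a ZooKeeper four-letter command and return the reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the socket after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(reply):
    """Return the value of the 'Mode:' line (e.g. 'leader'), or None."""
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def probe(host, port):
    """Classify one node as leader/follower, not serving, or unreachable."""
    try:
        return parse_mode(four_letter_word(host, port)) or "not serving"
    except OSError:  # refused, reset, or timed out
        return "unreachable"


def watch(servers, rounds, interval=0.5):
    """Print a timestamped snapshot of every node's state, `rounds` times."""
    for _ in range(rounds):
        states = [probe(host, port) for host, port in servers]
        print(time.strftime("%H:%M:%S"), states)
        time.sleep(interval)
```

Running `watch(SERVERS, rounds=120)` during the kill would show how long the nodes report something other than `leader` or `follower`, which makes the TCP and UDP recovery times directly comparable.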

ZooKeeper settings

tickTime=2000
initLimit=5
syncLimit=2
minSessionTimeout=5000
maxSessionTimeout=120000
dataDir=C:\\ProgramData\\Saab OneView\\ZooKeeper\\zoo-data
clientPort=2181
leaderServes=yes
autopurge.purgeInterval=24

# IP addresses blanked out here
server.1=0.0.0.1:2888:3888
server.2=0.0.0.2:2888:3888
server.3=0.0.0.3:2888:3888
server.4=0.0.0.4:2888:3888
server.5=0.0.0.5:2888:3888

# This is for zookeeper->zookeeper communication
# I've tried both settings, UDP has faster recovery time
# 0 = UDP 
# 3 = TCP (default)
electionAlg=3
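
For reference, the tick-based limits above translate into wall-clock values as follows. This is a small sketch rather than anything ZooKeeper ships: `parse_zoo_cfg` and `derived_values` are hypothetical helper names, and `SAMPLE_CFG` repeats only the relevant keys from the settings above. It relies on `initLimit` and `syncLimit` being expressed in ticks, and on a quorum being a strict majority of the listed servers.

```python
SAMPLE_CFG = """\
tickTime=2000
initLimit=5
syncLimit=2
# IP addresses blanked out here
server.1=0.0.0.1:2888:3888
server.2=0.0.0.2:2888:3888
server.3=0.0.0.3:2888:3888
server.4=0.0.0.4:2888:3888
server.5=0.0.0.5:2888:3888
"""


def parse_zoo_cfg(text):
    """Parse key=value lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def derived_values(settings):
    """Convert tick-based limits to milliseconds and compute the quorum size."""
    tick = int(settings["tickTime"])
    servers = [key for key in settings if key.startswith("server.")]
    return {
        "init_timeout_ms": tick * int(settings["initLimit"]),
        "sync_timeout_ms": tick * int(settings["syncLimit"]),
        "ensemble_size": len(servers),
        "quorum": len(servers) // 2 + 1,  # strict majority
    }


print(derived_values(parse_zoo_cfg(SAMPLE_CFG)))
# {'init_timeout_ms': 10000, 'sync_timeout_ms': 4000, 'ensemble_size': 5, 'quorum': 3}
```

So with these settings a follower has 10 seconds to connect and sync to a leader, 4 seconds of allowed lag afterwards, and the five-node ensemble needs three nodes to form a quorum.
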
Sample follower exception that leads to the shutdown

20160309 05:35:51.958Z 20160309 05:35:51.958 [myid:3] - WARN  [RecvWorker:4:QuorumCnxManager$RecvWorker@780] - Connection broken for id 4, my id = 3, error = 
java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(Unknown Source)
    at java.net.SocketInputStream.read(Unknown Source)
    at java.net.SocketInputStream.read(Unknown Source)
    at java.io.DataInputStream.readInt(Unknown Source)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:765)
20160309 05:35:51.959Z 20160309 05:35:51.959 [myid:3] - WARN  [RecvWorker:4:QuorumCnxManager$RecvWorker@783] - Interrupting SendWorker
20160309 05:35:51.959Z 20160309 05:35:51.959 [myid:3] - WARN  [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when following the leader
java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(Unknown Source)
    at java.net.SocketInputStream.read(Unknown Source)
    at java.io.BufferedInputStream.fill(Unknown Source)
    at java.io.BufferedInputStream.read(Unknown Source)
    at java.io.DataInputStream.readInt(Unknown Source)
    at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
    at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
    at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
    at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
    at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
20160309 05:35:51.960Z 20160309 05:35:51.960 [myid:3] - INFO  [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called
java.lang.Exception: shutdown Follower
    at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:790)
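
To line up what each of the five nodes logged around the kill, the header lines can be separated from the stack-trace frames and sorted by time. The sketch below assumes every node uses the same log layout as the sample above (a doubled timestamp, then `[myid:N] - LEVEL [thread] - message`); `HEADER_RE` and `parse_events` are hypothetical names introduced here for illustration.

```python
import re
from datetime import datetime

# Matches header lines such as:
# 20160309 05:35:51.960Z 20160309 05:35:51.960 [myid:3] - INFO  [thread] - msg
HEADER_RE = re.compile(
    r"^(?P<stamp>\d{8} \d{2}:\d{2}:\d{2}\.\d{3})Z? .*?"
    r"\[myid:(?P<myid>\d+)\] - (?P<level>[A-Z]+)\s+"
    r"\[(?P<thread>.+?)\] - (?P<message>.*)$"
)


def parse_events(lines):
    """Return one dict per log header line; stack-trace lines are skipped."""
    events = []
    for line in lines:
        match = HEADER_RE.match(line)
        if match is None:
            continue  # exception class line or 'at ...' frame
        events.append({
            "time": datetime.strptime(match.group("stamp"), "%Y%m%d %H:%M:%S.%f"),
            "myid": int(match.group("myid")),
            "level": match.group("level"),
            "thread": match.group("thread"),
            "message": match.group("message").strip(),
        })
    return events


SAMPLE_LOG = [
    "20160309 05:35:51.959Z 20160309 05:35:51.959 [myid:3] - WARN  "
    "[QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:Follower@89] - "
    "Exception when following the leader",
    "java.net.SocketException: Connection reset",
    "    at java.net.SocketInputStream.read(Unknown Source)",
    "20160309 05:35:51.960Z 20160309 05:35:51.960 [myid:3] - INFO  "
    "[QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called",
]

for event in parse_events(SAMPLE_LOG):
    print(event["time"].time(), event["myid"], event["level"], event["message"])
```

Feeding the concatenated logs of all nodes through `parse_events` and sorting on `time` would give a single timeline of when each follower noticed the broken connection and when it logged `shutdown called`.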