Why does the ActiveMQ cluster fail with "server null" when the Zookeeper master node goes offline?


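I'm having a problem with ActiveMQ where the entire cluster fails when the master Zookeeper node goes offline.

We have a 3-node ActiveMQ cluster set up in our development environment. Each node runs ActiveMQ 5.12.0 and Zookeeper 3.4.6. (Note: we have done some testing with Zookeeper 3.4.7, but that did not resolve the problem. Time constraints have so far prevented us from testing ActiveMQ 5.13.)

We have found that when we stop the master ZooKeeper process (via the "End process tree" command in Task Manager), the remaining two ZooKeeper nodes continue to function normally. Sometimes the ActiveMQ cluster is able to handle this, but sometimes it is not.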
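To confirm which ZooKeeper node is currently the leader before and after killing a process, each server can be asked for its mode with the `srvr` four-letter command on the client port. The following is a minimal Python sketch, assuming the four-letter commands are reachable on port 2181 as in the configuration in this question; the function names `zk_four_letter`, `parse_mode` and `ensemble_modes` are made up for illustration.

```python
import socket


def zk_four_letter(host, port=2181, cmd=b"srvr", timeout=3.0):
    """Send a ZooKeeper four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(reply):
    """Return the value of the 'Mode:' line (e.g. leader, follower), or None."""
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def ensemble_modes(hosts, port=2181):
    """Map each host to its reported mode, or 'unreachable' on a socket error."""
    modes = {}
    for host in hosts:
        try:
            modes[host] = parse_mode(zk_four_letter(host, port))
        except OSError:  # connection refused, timeout, no route, ...
            modes[host] = "unreachable"
    return modes
```

Calling `ensemble_modes(["192.168.0.10", "192.168.0.11", "192.168.0.12"])` before and after stopping a node should show whether a new leader was elected among the two survivors.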

When the cluster fails, we typically see this in the ActiveMQ log:

2015-12-18 09:08:45,157 | WARN  | Too many cluster members are connected.  Expected at most 3 members but there are 4 connected. | org.apache.activemq.leveldb.replicated.MasterElector | WrapperSimpleAppMain-EventThread
...
...
2015-12-18 09:27:09,722 | WARN  | Session 0x351b43b4a560016 for server null, unexpected error, closing socket connection and attempting reconnect | org.apache.zookeeper.ClientCnxn | WrapperSimpleAppMain-SendThread(192.168.0.10:2181)
java.net.ConnectException: Connection refused: no further information
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)[:1.7.0_79]
at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)[:1.7.0_79]
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)[zookeeper-3.4.6.jar:3.4.6-1569965]
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)[zookeeper-3.4.6.jar:3.4.6-1569965]
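What immediately concerns us is that (A) ActiveMQ seems to think there are four members in the cluster when it is configured for only 3, and (B) the server appears to be null when the exception is raised. We then raised ActiveMQ's logging level to DEBUG in order to display the list of members: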

2015-12-18 09:33:04,236 | DEBUG | ZooKeeper group changed: Map(localhost -> ListBuffer((0000000156,{"id":"localhost","container":null,"address":null,"position":-1,"weight":5,"elected":null}), (0000000157,{"id":"localhost","container":null,"address":null,"position":-1,"weight":1,"elected":null}), (0000000158,{"id":"localhost","container":null,"address":"tcp://192.168.0.11:61619","position":-1,"weight":10,"elected":null}), (0000000159,{"id":"localhost","container":null,"address":null,"position":-1,"weight":10,"elected":null}))) | org.apache.activemq.leveldb.replicated.MasterElector | ActiveMQ BrokerService[localhost] Task-14
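The member list in the DEBUG line above is hard to read by eye, so here is a rough Python sketch that pulls the entries out and tallies them. It assumes each entry has the `(sequence,{flat JSON})` shape seen in that line; `parse_members` and `SAMPLE` are names invented for this example, and `SAMPLE` is simply the payload copied from the log.

```python
import json
import re
from collections import Counter

# Payload copied from the DEBUG log line quoted in this question.
SAMPLE = (
    'Map(localhost -> ListBuffer('
    '(0000000156,{"id":"localhost","container":null,"address":null,"position":-1,"weight":5,"elected":null}), '
    '(0000000157,{"id":"localhost","container":null,"address":null,"position":-1,"weight":1,"elected":null}), '
    '(0000000158,{"id":"localhost","container":null,"address":"tcp://192.168.0.11:61619","position":-1,"weight":10,"elected":null}), '
    '(0000000159,{"id":"localhost","container":null,"address":null,"position":-1,"weight":10,"elected":null})))'
)

# One entry looks like "(0000000156,{...})"; the JSON objects have no nested braces.
MEMBER_RE = re.compile(r"\((\d+),(\{[^{}]*\})\)")


def parse_members(log_text):
    """Return one dict per registered member, with its sequence number added."""
    members = []
    for seq, payload in MEMBER_RE.findall(log_text):
        info = json.loads(payload)
        info["seq"] = seq
        members.append(info)
    return members


members = parse_members(SAMPLE)
weights = Counter(m["weight"] for m in members)
for m in members:
    print(m["seq"], "weight", m["weight"], "address", m["address"])
print("members:", len(members), "by weight:", dict(weights))
```

For this line the tally shows four registrations, two of them with weight 10 (the weight configured for server.2). That might mean one broker is registered twice, for example through a session that has not yet expired, but the log alone does not confirm this.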
Can anyone suggest why this may be happening and/or a way to resolve it? Our configuration is shown below:

ZooKeeper:

tickTime=2000
dataDir=C:\\zookeeper-3.4.7\\data
clientPort=2181
initLimit=5
syncLimit=2
server.1=192.168.0.10:2888:3888
server.2=192.168.0.11:2888:3888
server.3=192.168.0.12:2888:3888

ActiveMQ (server 1):

<persistenceAdapter>
    <replicatedLevelDB
    directory="activemq-data"
    replicas="3"
    bind="tcp://0.0.0.0:61619"
    zkAddress="192.168.0.11:2181,192.168.0.10:2181,192.168.0.12:2181"
    zkPath="/activemq/leveldb-stores"
    hostname="192.168.0.10"
    weight="5"/>
    <!-- server.2 has a weight of 10, server.3 has a weight of 1 -->
</persistenceAdapter>
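As a sanity check on the numbers in this configuration: both the ZooKeeper ensemble and the replicated LevelDB store depend on a majority of members being online (for the store, at least `(replicas/2)+1` nodes). The small Python sketch below works through that arithmetic for the three `server.N` entries and `replicas="3"` shown above; the helper names and the `ZOO_CFG` string are illustrative only.

```python
import re

# The server lines from the ZooKeeper configuration in this question.
ZOO_CFG = """\
server.1=192.168.0.10:2888:3888
server.2=192.168.0.11:2888:3888
server.3=192.168.0.12:2888:3888
"""


def count_servers(cfg_text):
    """Count the server.N entries in a zoo.cfg style text."""
    return sum(
        1 for line in cfg_text.splitlines()
        if re.match(r"\s*server\.\d+\s*=", line)
    )


def majority(n):
    """Smallest number of members that forms a strict majority of n."""
    return n // 2 + 1


def tolerated_failures(n):
    """How many members can be lost while a majority is still online."""
    return n - majority(n)


ensemble_size = count_servers(ZOO_CFG)
replicas = 3  # from replicas="3" in the replicatedLevelDB element

print("ZooKeeper servers:", ensemble_size,
      "quorum:", majority(ensemble_size),
      "tolerated failures:", tolerated_failures(ensemble_size))
print("LevelDB replicas:", replicas,
      "nodes that must be online:", majority(replicas))
```

With three members on each side, losing a single ZooKeeper node still leaves a quorum of two, which matches the observation that the remaining two ZooKeeper nodes keep working.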