Solr ZooKeeper not running after restart


I have three ZooKeeper nodes. They were working fine, but when I restart them with ./zkServer.sh restart, ZooKeeper does not come back up.

When I check the ZooKeeper status, it returns:

./zkServer.sh status
JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
Error contacting service. It is probably not running.
My zoo.cfg is:

dataDir=/var/lib/zookeeperdata/3
clientPort=2181
initLimit=50
tickTime=2000
syncLimit=10
maxClientCnxns=100000
server.1=IP1 value:2888:3888
server.2=IP2 value:2889:3889
server.3=127.0.0.1:2890:3890
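The behavior is intermittent: if I restart the three ZooKeeper nodes again in two hours or tomorrow, they may see each other and work normally, as has happened before.

ZooKeeper log: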

2014-05-14 15:22:34,236 [myid:3] - INFO  [main:NIOServerCnxnFactory@94] - binding to port 0.0.0.0/0.0.0.0:2181
2014-05-14 15:22:34,282 [myid:3] - INFO  [main:QuorumPeer@913] - tickTime set to 2000
2014-05-14 15:22:34,283 [myid:3] - INFO  [main:QuorumPeer@933] - minSessionTimeout set to -1
2014-05-14 15:22:34,283 [myid:3] - INFO  [main:QuorumPeer@944] - maxSessionTimeout set to -1
2014-05-14 15:22:34,283 [myid:3] - INFO  [main:QuorumPeer@959] - initLimit set to 50
2014-05-14 15:22:34,356 [myid:3] - INFO  [main:FileSnap@83] - Reading snapshot /var/lib/zookeeperdata/3/version-2/snapshot.f100000001
2014-05-14 15:22:43,387 [myid:3] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:50923
2014-05-14 15:22:43,396 [myid:3] - INFO  [Thread-1:QuorumCnxManager$Listener@486] - My election bind port: 0.0.0.0/0.0.0.0:3890
2014-05-14 15:22:43,404 [myid:3] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2014-05-14 15:22:43,404 [myid:3] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /127.0.0.1:50923 (no session established for client)
2014-05-14 15:22:43,427 [myid:3] - INFO  [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:QuorumPeer@670] - LOOKING
2014-05-14 15:22:43,429 [myid:3] - INFO  [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@740] - New election. My id =  3, proposed zxid=0xf100000001
2014-05-14 15:22:48,438 [myid:3] - WARN  [WorkerSender[myid=3]:QuorumCnxManager@368] - Cannot open channel to 1 at election address /54.76.10.81:3888
java.net.SocketTimeoutException: connect timed out
  at java.net.PlainSocketImpl.socketConnect(Native Method)
  at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
  at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
  at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
  at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
  at java.net.Socket.connect(Socket.java:529)
  at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:354)
  at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:327)
  at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:393)
  at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:365)
  at java.lang.Thread.run(Thread.java:662)
2014-05-14 15:22:53,440 [myid:3] - WARN  [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@368] - Cannot open channel to 1 at election address /54.76.10.81:3888
java.net.SocketTimeoutException: connect timed out
  at java.net.PlainSocketImpl.socketConnect(Native Method)
  at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
  at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
  at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
  at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
  at java.net.Socket.connect(Socket.java:529)
  at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:354)
  at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:388)
I have searched a lot about this but found nothing that worked for me, so I hope someone can help.


Thanks

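I have seen this behavior too. A ZK configuration that has been running fine will sometimes fail to restart. When that happens, I try the following: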

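1) Look at the logs on all servers; usually one of them will show an error
2) Stop all servers and restart them
3) Stop all servers, then restart them one at a time
4) Verify that each server's myid file exists, has the correct permissions, and contains the correct value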

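I use clusterssh to open a window to each server so that I can restart them all at the same time, and then I tail all of the server logs. Keep in mind that a ZK cluster does a lot during a restart: each server has to start up and a leader has to be elected. I have had times when the cluster looked broken, and after a few minutes it seemed to sort itself out.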
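
Since the cluster can take a few minutes to settle, it may help to poll each node rather than run the status check once. The following is only a rough sketch: it assumes the `srvr` four-letter command is available on the client port (2181 in the question's config) and that a node which has joined the quorum includes a `Mode:` line in its reply. The function names, retry count, and delay are my own choices, not anything from ZooKeeper itself.

```python
import socket
import time


def four_letter_word(host, port, word="srvr", timeout=3.0):
    """Send a four-letter command to a ZooKeeper client port and return the reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(word.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(reply):
    """Return the value of the 'Mode:' line (e.g. leader, follower), or None."""
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def wait_for_quorum(hosts, port=2181, attempts=30, delay=10.0):
    """Poll every host until each one reports a Mode, or give up after `attempts`."""
    modes = {}
    for _ in range(attempts):
        modes = {}
        for host in hosts:
            try:
                modes[host] = parse_mode(four_letter_word(host, port))
            except OSError:
                modes[host] = None  # not listening yet, timed out, or unreachable
        if all(modes.values()):
            return modes
        time.sleep(delay)
    return modes  # last observation; None marks nodes that never reported a Mode
```

Calling `wait_for_quorum([...])` with the three node addresses returns a dict of host to mode; any `None` left after the last attempt points at the node whose log is worth reading first.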


There is a great tool called zktop that I use to monitor ZK.

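I fixed it by changing the IP 127.0.0.1 to the internal IP of the Amazon node. After making this change on all three nodes and restarting, the problem no longer occurred. I hope this answer helps anyone with the same problem.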
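
To spot this on other nodes, a small script can flag any `server.N` line in zoo.cfg that points at a loopback address. This is a minimal sketch under a few assumptions: entries use the `server.N=host:port:port` form shown in the question, hosts are IPv4 addresses or hostnames, and the helper name `loopback_server_entries` is made up for illustration.

```python
import ipaddress


def loopback_server_entries(cfg_text):
    """Return (server_id, host) for each server.N line whose host is a loopback address."""
    flagged = []
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line.startswith("server.") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        server_id = key.strip()[len("server."):]
        host = value.strip().split(":", 1)[0]  # IPv4 or hostname only
        if host == "localhost":
            flagged.append((server_id, host))
            continue
        try:
            if ipaddress.ip_address(host).is_loopback:
                flagged.append((server_id, host))
        except ValueError:
            pass  # a hostname; resolving it is out of scope for this sketch
    return flagged
```

Run against the zoo.cfg from the question, it would report the `server.3=127.0.0.1:2890:3890` entry.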

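Make sure the correct dataDir is set in each node's configuration, and put a myid file in that dataDir containing a number between 1 and 255 for that node. I think that solves the problem.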
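
These checks can be scripted roughly as follows. It is a sketch, not a definitive validator: it assumes a plain `key=value` zoo.cfg like the one in the question, and the helper names `read_zoo_cfg` and `check_myid` are hypothetical. It reports a missing dataDir or myid file, a value outside 1 to 255, and a myid with no matching `server.N` line.

```python
from pathlib import Path


def read_zoo_cfg(cfg_text):
    """Parse key=value lines into a dict, skipping blank lines and # comments."""
    settings = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def check_myid(cfg_text):
    """Return a list of problems found with the myid file for this config."""
    settings = read_zoo_cfg(cfg_text)
    data_dir = settings.get("dataDir")
    if not data_dir:
        return ["dataDir is not set"]
    myid_path = Path(data_dir) / "myid"
    if not myid_path.is_file():
        return [f"{myid_path} does not exist"]
    try:
        text = myid_path.read_text().strip()
    except OSError as exc:  # e.g. wrong permissions
        return [f"cannot read {myid_path}: {exc}"]
    try:
        value = int(text)
    except ValueError:
        return [f"{myid_path} does not contain a number: {text!r}"]
    problems = []
    if not 1 <= value <= 255:
        problems.append(f"myid {value} is outside the range 1-255")
    if f"server.{value}" not in settings:
        problems.append(f"no server.{value} entry in the config")
    return problems
```

An empty list means the myid file is present, readable, in range, and matches one of the configured servers.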