ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master

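I have an HBase 0.96.1.1-cdh5.0.2 cluster on AWS, managed by Cloudera, with 4 region servers and 1 ZooKeeper server. The ZooKeeper server runs on the same host as the HBase master. The problem I am facing is that 3 of the 4 region servers shut down because they cannot connect to ZooKeeper. The only region server that stays up is the one running on the same host as the master and ZooKeeper. Below is the relevant part of the log from one of the failed region servers: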

2014-11-14 15:46:59,871 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection,  connectString=ip-10-146-188-157.ec2.internal:2181 sessionTimeout=60000 watcher=regionserver:60020,     quorum=ip-10-146-188-157.ec2.internal:2181, baseZNode=/hbase
2014-11-14 15:46:59,915 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process  identifier=regionserver:60020 connecting to ZooKeeper ensemble=ip-10-146-188-157.ec2.internal:2181
2014-11-14 15:46:59,920 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ip-10-146-188-157.ec2.internal/10.146.188.157:2181. Will not attempt to authenticate using SASL (unknown error)
2014-11-14 15:47:00,649 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Installed shutdown hook thread: Shutdownhook:regionserver60020
2014-11-14 15:47:59,948 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 60041ms for sessionid 0x0, closing socket connection and attempting reconnect
2014-11-14 15:48:00,067 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=ip-10-146-188-157.ec2.internal:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
2014-11-14 15:48:00,072 INFO org.apache.hadoop.hbase.util.RetryCounter: Sleeping 1000ms before retry #0...
2014-11-14 15:48:01,067 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ip-10-146-188-157.ec2.internal/10.146.188.157:2181. Will not attempt to authenticate using SASL (unknown error)
2014-11-14 15:49:00,123 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 60057ms for sessionid 0x0, closing socket connection and attempting reconnect
2014-11-14 15:49:00,224 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=ip-10-146-188-157.ec2.internal:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
2014-11-14 15:49:00,224 INFO org.apache.hadoop.hbase.util.RetryCounter: Sleeping 2000ms before retry #1...
2014-11-14 15:49:01,224 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ip-10-146-188-157.ec2.internal/10.146.188.157:2181. Will not attempt to authenticate using SASL (unknown error)
2014-11-14 15:50:00,259 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 60035ms for sessionid 0x0, closing socket connection and attempting reconnect
2014-11-14 15:50:00,360 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=ip-10-146-188-157.ec2.internal:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
2014-11-14 15:50:00,360 INFO org.apache.hadoop.hbase.util.RetryCounter: Sleeping 4000ms before retry #2...
2014-11-14 15:50:01,360 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ip-10-146-188-157.ec2.internal/10.146.188.157:2181. Will not attempt to authenticate using SASL (unknown error)
2014-11-14 15:51:00,408 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 60048ms for sessionid 0x0, closing socket connection and attempting reconnect
2014-11-14 15:51:00,509 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=ip-10-146-188-157.ec2.internal:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
2014-11-14 15:51:00,509 INFO org.apache.hadoop.hbase.util.RetryCounter: Sleeping 8000ms before retry #3...
2014-11-14 15:51:01,509 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ip-10-146-188-157.ec2.internal/10.146.188.157:2181. Will not attempt to authenticate using SASL (unknown error)
2014-11-14 15:52:00,559 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 60051ms for sessionid 0x0, closing socket connection and attempting reconnect
2014-11-14 15:52:00,659 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=ip-10-146-188-157.ec2.internal:2181,  exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode =  ConnectionLoss for /hbase/master
2014-11-14 15:52:00,660 ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 4 attempts
2014-11-14 15:52:00,661 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil: regionserver:60020,   quorum=ip-10-146-188-157.ec2.internal:2181, baseZNode=/hbase Unable to set watcher on znode  /hbase/master
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss  for  /hbase/master
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:199)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:425)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:77)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:671)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:644)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:772)
    at java.lang.Thread.run(Thread.java:744)
2014-11-14 15:52:00,687 ERROR org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher:   regionserver:60020, quorum=ip-10-146-188-157.ec2.internal:2181, baseZNode=/hbase Received unexpected   KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:199)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:425)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:77)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:671)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:644)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:772)
    at java.lang.Thread.run(Thread.java:744)
2014-11-14 15:52:00,692 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 0.0.0.0,60020,1415998019646: Unexpected exception during initialization, aborting
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:199)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:425)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:77)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:671)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:644)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:772)
    at java.lang.Thread.run(Thread.java:744)
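I suspect this may be related to the /etc/hosts configuration, but I cannot figure out what is wrong. The /etc/hosts on every instance in the cluster is: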

127.0.0.1               localhost.localdomain localhost
::1             localhost6.localdomain6 localhost6
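
One quick way to see what this hosts file does and does not cover is to parse it and look up the quorum host. The sketch below is illustrative only: `hosts_entries` is a hypothetical helper, and the hosts text and quorum hostname are copied from this post. It shows that the file has no entry for the ZooKeeper host, so resolving that name would have to come from DNS rather than /etc/hosts.

```python
def hosts_entries(text):
    """Map every hostname or alias in hosts-file text to its IP address."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name, ip)  # first match wins, as in /etc/hosts
    return mapping


# Hosts file contents as posted in the question.
HOSTS = """\
127.0.0.1               localhost.localdomain localhost
::1             localhost6.localdomain6 localhost6
"""

entries = hosts_entries(HOSTS)
quorum_host = "ip-10-146-188-157.ec2.internal"  # from hbase.zookeeper.quorum

print(sorted(entries))         # only localhost-style names are present
print(quorum_host in entries)  # False
```

A missing entry is not an error by itself. It only means the name lookup depends on whatever resolver the instance uses.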
The part of hbase-site.xml that deals with ZooKeeper is:

<property>
  <name>zookeeper.znode.parent</name>
  <value>/hbase</value>
</property>
<property>
  <name>zookeeper.znode.rootserver</name>
  <value>root-region-server</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>ip-10-146-188-157.ec2.internal</value>
</property>
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
</property>
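
To confirm that these properties produce the connect string seen in the region server log, the fragment can be parsed directly. This is a minimal sketch, not HBase's own logic: `zk_connect_string` is a hypothetical helper, the fragment is wrapped in a `<configuration>` root so that it parses, and the fallback port of 2181 is an assumption for the case where the property is absent.

```python
import xml.etree.ElementTree as ET

# The two properties from the posted hbase-site.xml that determine the address.
FRAGMENT = """
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>ip-10-146-188-157.ec2.internal</value>
</property>
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
</property>
"""


def zk_connect_string(fragment):
    """Build a host:port list from hbase-site.xml <property> elements."""
    root = ET.fromstring("<configuration>" + fragment + "</configuration>")
    props = {p.findtext("name"): p.findtext("value") for p in root.iter("property")}
    hosts = props["hbase.zookeeper.quorum"].split(",")
    port = props.get("hbase.zookeeper.property.clientPort", "2181")  # assumed fallback
    return ",".join(f"{host.strip()}:{port}" for host in hosts)


print(zk_connect_string(FRAGMENT))  # ip-10-146-188-157.ec2.internal:2181
```

The result matches the `connectString` in the log, which suggests the region servers are reading the intended quorum setting and the failure lies in reaching that address.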

Any help is greatly appreciated.

Have you given your hosts FQDNs? If not, assign them, and then replace the corresponding localhost or IP entries in your configuration files with the FQDN.
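
A rough way to check this from a failing region server is to see what the node's own FQDN and the quorum host resolve to, and whether the ZooKeeper port accepts a TCP connection. The sketch below uses only the Python standard library. The helper names (`resolve`, `is_loopback`, `can_connect`, `diagnose`) are hypothetical, and the script is a diagnostic aid under those assumptions, not part of HBase or ZooKeeper.

```python
import ipaddress
import socket


def resolve(hostname):
    """Return the addresses a hostname resolves to, or [] if it does not resolve."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def is_loopback(address):
    """True for 127.0.0.0/8 and ::1 (any IPv6 scope suffix is ignored)."""
    return ipaddress.ip_address(address.split("%", 1)[0]).is_loopback


def can_connect(host, port, timeout=5.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def diagnose(quorum_host, port=2181):
    """Collect name-resolution and reachability facts for one node."""
    fqdn = socket.getfqdn()
    own = resolve(fqdn)
    return {
        "fqdn": fqdn,
        "fqdn_addresses": own,
        # A node whose own name maps only to loopback may advertise a bad address.
        "fqdn_is_loopback_only": bool(own) and all(is_loopback(a) for a in own),
        "quorum_addresses": resolve(quorum_host),
        "zookeeper_reachable": can_connect(quorum_host, port),
    }
```

Calling `diagnose("ip-10-146-188-157.ec2.internal")` on each region server would show whether the node's own name resolves only to loopback and whether port 2181 on the ZooKeeper host is reachable at all. If the TCP connection fails, a firewall or security group rule is another thing worth checking, although nothing in this thread confirms that as the cause.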

Also, please post your ZooKeeper configuration file here.