Hadoop HA NameNode failover fails when the whole server is killed

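I am trying to set up an HA HDFS cluster as described in the Apache documentation.

When I shut down the process of the active NameNode, failover works fine. But when I cause a larger failure (such as unplugging the network), the standby NameNode does not become active. The standby node tries to transition the active node to standby, but when the host is dead it can hardly establish an SSH connection to it. Is there something I have overlooked?

This is what hadoop-hduser-zkfc-nn1.log says: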

2016-09-27 15:03:11,316 INFO org.apache.hadoop.ha.NodeFencer: ====== Beginning Service Fencing Process... ======
2016-09-27 15:03:20,316 INFO org.apache.hadoop.ha.NodeFencer: Trying method 1/1: org.apache.hadoop.ha.SshFenceByTcpPort(null)
2016-09-27 15:03:20,317 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Connecting to nn2.example.org...
2016-09-27 15:03:20,317 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connecting to nn2.example.org port 22
2016-09-27 15:03:23,315 WARN org.apache.hadoop.ha.SshFenceByTcpPort: Unable to connect to nn2.example.org as user hduser
com.jcraft.jsch.JSchException: java.net.NoRouteToHostException: No route to host
        at com.jcraft.jsch.Util.createSocket(Util.java:386)
        at com.jcraft.jsch.Session.connect(Session.java:182)
        at org.apache.hadoop.ha.SshFenceByTcpPort.tryFence(SshFenceByTcpPort.java:100)
        at org.apache.hadoop.ha.NodeFencer.fence(NodeFencer.java:97)
        at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:532)
        at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:505)
        at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:61)
        at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:892)
        at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:910)
        at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:809)
        at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:418)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
2016-09-27 15:03:23,315 WARN org.apache.hadoop.ha.NodeFencer: Fencing method org.apache.hadoop.ha.SshFenceByTcpPort(null) was unsuccessful.
2016-09-27 15:03:23,315 ERROR org.apache.hadoop.ha.NodeFencer: Unable to fence service by any configured method.
2016-09-27 15:03:23,315 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
java.lang.RuntimeException: Unable to fence NameNode at nn2.example.org/172.16.1.188:8040
        at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:533)
        at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:505)
        at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:61)
        at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:892)
        at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:910)
        at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:809)
        at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:418)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
2016-09-27 15:03:23,316 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
2016-09-27 15:03:23,325 INFO org.apache.zookeeper.ZooKeeper: Session: 0x3576b5e84d4003a closed
2016-09-27 15:03:24,329 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=nn1.example.org:2181,nn2.example.org:2181,dn1.example.org:2181,dn2.example.org:2181,dn3.example.org:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@3bf4eda6
2016-09-27 15:03:24,335 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server nn1.example.org/172.16.1.187:2181. Will not attempt to authenticate using SASL (unknown error)
2016-09-27 15:03:24,342 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to nn1.example.org/172.16.1.187:2181, initiating session
2016-09-27 15:03:24,379 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server nn1.example.org/172.16.1.187:2181, sessionid = 0x1576b95292f0001, negotiated timeout = 5000
2016-09-27 15:03:24,386 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2016-09-27 15:03:24,403 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session connected.
2016-09-27 15:03:24,407 INFO org.apache.hadoop.ha.ActiveStandbyElector: Checking for any old active which needs to be fenced...
2016-09-27 15:03:24,425 INFO org.apache.hadoop.ha.ActiveStandbyElector: Old node exists: 0a0a68612d636c757374657212036e6e321a1668646d617374657230322e7265696368656c742e646520e83e28d33e
2016-09-27 15:03:24,444 INFO org.apache.hadoop.ha.ZKFailoverController: Should fence: NameNode at nn2.example.org/172.16.1.188:8040
2016-09-27 15:03:27,315 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: nn2.example.org/172.16.1.188:8040. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
2016-09-27 15:03:29,315 WARN org.apache.hadoop.ha.FailoverController: Unable to gracefully make NameNode at nn2.example.org/172.16.1.188:8040 standby (unable to connect)
java.net.NoRouteToHostException: No Route to Host from  nn1.example.org/172.16.1.187 to nn2.example.org:8040 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see:  http://wiki.apache.org/hadoop/NoRouteToHost
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:758)
        at org.apache.hadoop.ipc.Client.call(Client.java:1479)
        at org.apache.hadoop.ipc.Client.call(Client.java:1412)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
        at com.sun.proxy.$Proxy9.transitionToStandby(Unknown Source)
        at org.apache.hadoop.ha.protocolPB.HAServiceProtocolClientSideTranslatorPB.transitionToStandby(HAServiceProtocolClientSideTranslatorPB.java:112)
        at org.apache.hadoop.ha.FailoverController.tryGracefulFence(FailoverController.java:172)
        at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:514)
        at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:505)
        at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:61)
        at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:892)
        at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:910)
        at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:809)
        at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:418)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: java.net.NoRouteToHostException: No route to host
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
        at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
        at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
        at org.apache.hadoop.ipc.Client.call(Client.java:1451)
        ... 14 more
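
The log above shows NodeFencer trying its only configured method ("Trying method 1/1"), failing to reach the host over SSH, and then giving up on the failover. According to the Hadoop HA documentation, the methods listed in `dfs.ha.fencing.methods` are attempted in order until one reports success. The following Python sketch only models that ordering logic; it is not Hadoop code, and the function names `make_ssh_fence`, `always_succeed`, and `fence` are hypothetical. It illustrates why a single SSH-based method cannot succeed against an unplugged host, and how a second method that always reports success (the documentation mentions `shell(/bin/true)` as such a placeholder) would let the chain complete.

```python
from typing import Callable, List, Set, Tuple

# A fencing method is a (name, function) pair; the function returns True on success.
FenceMethod = Tuple[str, Callable[[str], bool]]


def make_ssh_fence(reachable_hosts: Set[str]) -> Callable[[str], bool]:
    """Hypothetical stand-in for sshfence: succeeds only if the target is reachable."""
    def ssh_fence(target: str) -> bool:
        return target in reachable_hosts
    return ssh_fence


def always_succeed(target: str) -> bool:
    """Hypothetical stand-in for a fallback such as shell(/bin/true)."""
    return True


def fence(target: str, methods: List[FenceMethod]) -> bool:
    """Try each method in order and stop at the first one that reports success."""
    for index, (name, method) in enumerate(methods, start=1):
        print(f"Trying method {index}/{len(methods)}: {name}")
        if method(target):
            return True
        print(f"Fencing method {name} was unsuccessful.")
    print("Unable to fence service by any configured method.")
    return False


if __name__ == "__main__":
    # Network unplugged: no host is reachable over SSH.
    unplugged = make_ssh_fence(reachable_hosts=set())

    # Only sshfence configured: fencing fails, so the standby never becomes active.
    print(fence("nn2.example.org", [("sshfence", unplugged)]))

    # sshfence followed by an always-successful fallback: the chain completes.
    print(fence("nn2.example.org",
                [("sshfence", unplugged), ("shell(/bin/true)", always_succeed)]))
```

Note that a fallback which always succeeds gives up real fencing of the old active node. The Hadoop documentation describes this as acceptable when the Quorum Journal Manager is used, because only one NameNode may write to the JournalNodes at a time, although the old node may still serve stale reads for a while. Whether that trade-off fits this cluster is an assumption that should be checked against its actual setup.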

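Three questions out of curiosity: 1. Did you unplug only the network cable of the node that was supposed to fail? 2. Is HDFS really down, or do you just get these errors? 3. Are you sure everything on the failed node is redundant (for example, that it is not the only node holding the metastore)?

I have 5 nodes: 2 masters and 3 slaves. The masters are nn1 and nn2. 1. Of course, I unplugged only the network of the active NN to test failover. 2. HDFS works, but only in read-only mode, because there is no active NN. 3. Yes. By the way, when I reconnect the network of the unplugged server, the passive NameNode can connect via SSH, performs the failover, and switches to active.

Strange, I have the same problem. In my case it appeared when I stopped the active NameNode and waited for failover to happen. The failover did not happen. Can anyone help?

@Krishnom Can you help? I am facing the same issue.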