Apache Spark: YARN + Spark: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try

Tags: apache-spark, hdfs

I have a small cluster with 6 datanodes, and I am running into complete failures of my Spark jobs.

The failure:

ERROR [SparkListenerBus][driver][] [org.apache.spark.scheduler.LiveListenerBus] Listener EventLoggingListener threw an exception
java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[42.3.44.157:50010,DS-87cdbf42-3995-4313-8fab-2bf6877695f6,DISK], DatanodeInfoWithStorage[42.3.44.154:50010,DS-60eb1276-11cc-4cb8-a844-f7f722de0e15,DISK]], original=[DatanodeInfoWithStorage[42.3.44.157:50010,DS-87cdbf42-3995-4313-8fab-2bf6877695f6,DISK], DatanodeInfoWithStorage[42.3.44.154:50010,DS-60eb1276-11cc-4cb8-a844-f7f722de0e15,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:1059)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1122)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1280)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1005)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:512)
---T08:18:07.007 ERROR [SparkListenerBus][driver][] [STATISTICS] [onQueryTerminated] queryId:
I found the following workaround, by setting these values in the HDFS configuration:

dfs.client.block.write.replace-datanode-on-failure.enable=true
dfs.client.block.write.replace-datanode-on-failure.policy=NEVER
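For reference, a sketch of how the same two key/value pairs would look as entries in hdfs-site.xml (where exactly that file lives depends on the distribution; the names and values are the ones from the workaround above):

<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
  <value>true</value>
</property>
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
  <value>NEVER</value>
</property>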
These two properties, dfs.client.block.write.replace-datanode-on-failure.policy and dfs.client.block.write.replace-datanode-on-failure.enable, affect the client-side behavior of pipeline recovery, and they can be added as custom properties to the hdfs-site configuration.
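Because these are client-side settings and Spark is the HDFS client here, one alternative is to set them per application instead of cluster-wide. A minimal sketch in Scala, assuming the standard spark.hadoop.* prefix that Spark copies into the Hadoop Configuration used by its HDFS client (the application name is only illustrative):

import org.apache.spark.sql.SparkSession

// Per-application override of the client-side HDFS pipeline-recovery properties.
val spark = SparkSession.builder()
  .appName("pipeline-recovery-workaround") // illustrative name
  .config("spark.hadoop.dfs.client.block.write.replace-datanode-on-failure.enable", "true")
  .config("spark.hadoop.dfs.client.block.write.replace-datanode-on-failure.policy", "NEVER")
  .getOrCreate()

// The same values can also be passed on spark-submit:
//   --conf spark.hadoop.dfs.client.block.write.replace-datanode-on-failure.enable=true
//   --conf spark.hadoop.dfs.client.block.write.replace-datanode-on-failure.policy=NEVER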

Is setting these parameter values a good solution?

dfs.client.block.write.replace-datanode-on-failure.enable (default: true)
    If there is a datanode/network failure in the write pipeline, DFSClient will try to remove the failed datanode from the pipeline and then continue writing with the remaining datanodes. As a result, the number of datanodes in the pipeline is decreased. The feature is to add new datanodes to the pipeline. This is a site-wide property to enable/disable the feature. When the cluster size is extremely small, e.g. 3 nodes or less, cluster administrators may want to set the policy to NEVER in the default configuration file or disable this feature. Otherwise, users may experience an unusually high rate of pipeline failures since it is impossible to find new datanodes for replacement. See also dfs.client.block.write.replace-datanode-on-failure.policy.

dfs.client.block.write.replace-datanode-on-failure.policy (default: DEFAULT)
    This property is used only if the value of dfs.client.block.write.replace-datanode-on-failure.enable is true. ALWAYS: always add a new datanode when an existing datanode is removed. NEVER: never add a new datanode. DEFAULT: Let r be the replication number. Let n be the number of existing datanodes. Add a new datanode only if r is greater than or equal to 3 and either (1) floor(r/2) is greater than or equal to n; or (2) r is greater than n and the block is hflushed/appended.
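To make the DEFAULT policy concrete, here is its decision rule written out as a small predicate, a sketch only, with r, n and the hflushed/appended flag taken from the description above:

// Sketch of the DEFAULT replace-datanode-on-failure decision described above.
// r = replication factor, n = datanodes remaining in the write pipeline,
// hflushedOrAppended = whether the block has been hflushed or appended.
def shouldAddDatanode(r: Int, n: Int, hflushedOrAppended: Boolean): Boolean =
  r >= 3 && ((r / 2 >= n) || (r > n && hflushedOrAppended))

// Example: replication 3 with one datanode already dropped from the pipeline (n = 2):
// floor(3/2) = 1 < 2, so a replacement is requested only when the block has been
// hflushed/appended, and if no spare healthy datanode can be found the write fails
// with "no more good datanodes being available to try".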