Hadoop HDFS: file not distributed after upload


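I have deployed Hadoop (0.20.203.0rc1) on an 8-node cluster. After uploading a file to HDFS, the file ended up on only one of the nodes instead of being spread evenly across all of them. What is the problem?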

$HADOOP_HOME/bin/hadoop dfs -copyFromLocal ../data/rmat-20.0 /user/frolo/input/rmat-20.0

$HADOOP_HOME/bin/hadoop dfs -stat "%b %o %r %n" /user/frolo/input/rmat-*
1220222968 67108864 1 rmat-20.0

$HADOOP_HOME/bin/hadoop dfsadmin -report 
Configured Capacity: 2536563998720 (2.31 TB)
Present Capacity: 1642543419392 (1.49 TB)
DFS Remaining: 1641312030720 (1.49 TB)
DFS Used: 1231388672 (1.15 GB)
DFS Used%: 0.07%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 8 (8 total, 0 dead)

Name: 10.10.1.15:50010
Decommission Status : Normal
Configured Capacity: 317070499840 (295.29 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 131536928768 (122.5 GB)
DFS Remaining: 185533546496(172.79 GB)
DFS Used%: 0%
DFS Remaining%: 58.51%
Last contact: Fri Feb 07 12:10:27 MSK 2014


Name: 10.10.1.13:50010
Decommission Status : Normal
Configured Capacity: 317070499840 (295.29 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 131533377536 (122.5 GB)
DFS Remaining: 185537097728(172.79 GB)
DFS Used%: 0%
DFS Remaining%: 58.52%
Last contact: Fri Feb 07 12:10:27 MSK 2014


Name: 10.10.1.17:50010
Decommission Status : Normal
Configured Capacity: 317070499840 (295.29 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 120023924736 (111.78 GB)
DFS Remaining: 197046550528(183.51 GB)
DFS Used%: 0%
DFS Remaining%: 62.15%
Last contact: Fri Feb 07 12:10:27 MSK 2014


Name: 10.10.1.18:50010
Decommission Status : Normal
Configured Capacity: 317070499840 (295.29 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 78510628864 (73.12 GB)
DFS Remaining: 238559846400(222.18 GB)
DFS Used%: 0%
DFS Remaining%: 75.24%
Last contact: Fri Feb 07 12:10:24 MSK 2014


Name: 10.10.1.14:50010
Decommission Status : Normal
Configured Capacity: 317070499840 (295.29 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 131537530880 (122.5 GB)
DFS Remaining: 185532944384(172.79 GB)
DFS Used%: 0%
DFS Remaining%: 58.51%
Last contact: Fri Feb 07 12:10:27 MSK 2014


Name: 10.10.1.11:50010
Decommission Status : Normal
Configured Capacity: 317070499840 (295.29 GB)
DFS Used: 1231216640 (1.15 GB)
Non DFS Used: 84698116096 (78.88 GB)
DFS Remaining: 231141167104(215.27 GB)
DFS Used%: 0.39%
DFS Remaining%: 72.9%
Last contact: Fri Feb 07 12:10:24 MSK 2014


Name: 10.10.1.16:50010
Decommission Status : Normal
Configured Capacity: 317070499840 (295.29 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 131537494016 (122.5 GB)
DFS Remaining: 185532981248(172.79 GB)
DFS Used%: 0%
DFS Remaining%: 58.51%
Last contact: Fri Feb 07 12:10:27 MSK 2014


Name: 10.10.1.12:50010
Decommission Status : Normal
Configured Capacity: 317070499840 (295.29 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 84642578432 (78.83 GB)
DFS Remaining: 232427896832(216.47 GB)
DFS Used%: 0%
DFS Remaining%: 73.3%
Last contact: Fri Feb 07 12:10:27 MSK 2014

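Your file was written with a replication factor of 1, as the third field (%r) of your hadoop dfs -stat output shows. This means only one replica exists for each block of the file. When the client doing the write runs on a DataNode, HDFS places the first replica of each block on that local DataNode, so with a single replica every block lands on the node you uploaded from.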
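As a rough sketch, the -stat line can be decoded with a small POSIX shell function. The function name stat_summary is hypothetical, and it assumes the field order given by the "%b %o %r %n" format string used in the question (length in bytes, block size, replication factor, name). With a 64 MB block size the 1220222968-byte file is split into 19 blocks, and with a replication factor of 1 there are only 19 block replicas in total, all of which the default placement puts on the DataNode that performed the write.

```shell
#!/bin/sh
# Decode one line of `hadoop dfs -stat "%b %o %r %n" <file>` output.
# Assumed field order (from that format string): length in bytes,
# block size, replication factor, file name.
stat_summary() {
  # Split the line into positional parameters (names with spaces are not handled).
  set -- $1
  size=$1
  blocksize=$2
  rep=$3
  name=$4
  # Ceiling division: a partially filled last block still counts as a block.
  blocks=$(( (size + blocksize - 1) / blocksize ))
  echo "$name: $blocks block(s) x $rep replica(s) = $(( blocks * rep )) stored block replica(s)"
}

# The -stat output from the question:
stat_summary "1220222968 67108864 1 rmat-20.0"
```

With a replication factor of 3 the same file would have 57 block replicas. Note that when the upload runs on a DataNode, the default policy still keeps one replica of every block on that local node; only the additional replicas go to other nodes.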

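The default replication factor for writes is controlled by the dfs.replication property in $HADOOP_HOME/conf/hdfs-site.xml. If it is not set there, the default is 3, but most likely an override setting it to 1 has been specified. Changing its value back to 3, or removing it entirely (so that the default applies), will make all new file writes use 3 replicas by default.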
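For reference, a minimal hdfs-site.xml entry that sets the replication factor explicitly might look like the fragment below; keep any other properties your file already has inside the same configuration element. Because the replication factor is chosen by the client that creates the file, this setting has to be in the configuration on the machine you upload from (10.10.1.11 in the question), and it only affects files written after the change; existing files keep their current factor.

```xml
<!-- $HADOOP_HOME/conf/hdfs-site.xml -->
<configuration>
  <property>
    <!-- Number of replicas for newly written files (HDFS default: 3) -->
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```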

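You can also pass a specific replication factor with each write command, using the -D property-passing mechanism supported by the hadoop fs utility, for example: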


hadoop fs -Ddfs.replication=3 -copyFromLocal ../data/rmat-20.0 /user/frolo/input/rmat-20.0

You can also change the replication factor of an existing file with the hadoop fs -setrep utility, for example:


hadoop fs -setrep -w 3 /user/frolo/input/rmat-20.0


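If a file's HDFS replication factor is greater than 1, its block replicas are automatically spread across multiple nodes: HDFS never writes more than one replica of a block to the same DataNode.

I just uploaded another file with the same command, $HADOOP_HOME/bin/hadoop dfs -copyFromLocal ../data/rmat-20.0 /user/frolo/input/rmat-20.0-2, and it was also stored on node 10.10.1.11, which, by the way, is the node I ran the command on (the master node).

What is your replication factor?

HDFS data may not always be placed uniformly across DataNodes. If your main concern is that all the data sits on a single node, and you are looking for a way to force the data to be balanced across nodes (regardless of the replication value), a simple option is $HADOOP_HOME/bin/start-balancer.sh, which runs a balancing process that automatically moves blocks around the cluster.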
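To see how unevenly DFS data is spread before and after running the balancer, the per-node figures in dfsadmin -report can be summarized with awk. This is a sketch that assumes the 0.20-style report layout shown in the question (a Name: line followed by Configured Capacity: and DFS Used: lines for each node); the function name dfs_usage_summary is hypothetical. It computes each node's utilization (DFS Used divided by Configured Capacity) and its difference from the cluster-wide average, which, as far as we understand it, is what the balancer compares against its -threshold setting (10 percentage points by default).

```shell
#!/bin/sh
# Summarize per-DataNode DFS usage from `hadoop dfsadmin -report` text on stdin.
# Assumes the 0.20-style layout shown above; the cluster-wide summary lines
# at the top are skipped because no "Name:" line has been seen yet.
dfs_usage_summary() {
  awk '
    /^Name:/                              { node = $2 }
    /^Configured Capacity:/ && node != "" { cap[node] = $3 + 0 }
    # "DFS Used%:" does not match because the pattern needs a colon right after "Used".
    /^DFS Used:/ && node != ""            { used[node] = $3 + 0; order[++n] = node }
    END {
      if (n == 0) exit
      for (i = 1; i <= n; i++) { tc += cap[order[i]]; tu += used[order[i]] }
      avg = (tc > 0) ? 100 * tu / tc : 0
      printf "cluster average: %.2f%%\n", avg
      for (i = 1; i <= n; i++) {
        nd = order[i]
        # Utilization = DFS Used / Configured Capacity, in percent.
        u = (cap[nd] > 0) ? 100 * used[nd] / cap[nd] : 0
        printf "%s used=%.0f (%.2f%%, diff %.2f)\n", nd, used[nd], u, u - avg
      }
    }'
}
```

Usage: $HADOOP_HOME/bin/hadoop dfsadmin -report | dfs_usage_summary. On the report in the question this gives an average of about 0.05% and about 0.39% for 10.10.1.11, a gap far below the default threshold, so the balancer may find nothing to move; for a single small file, raising the replication factor is the more direct way to get copies onto other nodes.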