Error when running Hadoop in distributed mode
I tried to run Hadoop in distributed mode and configured it, but an error occurred. I describe the setup process below.
①Server configuration
The hostname of the master server is Master, and the slave servers are named node1 and node2. All servers run CentOS 7.
The IP address of the master is 131.113.101.103, and the IP addresses of the slaves are 131.113.101.101 and 131.113.101.102.
②Settings on each server
I modified /etc/hosts and /etc/hostname. I only show the master server's files here.
○/etc/hostname
master
○/etc/hosts
131.113.101.101 node1
131.113.101.102 node2
131.113.101.103 master
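A quick sanity check for a hosts file like the one above: the minimal Python sketch below flags cluster hostnames that are missing from the file or that map to a loopback address. A cluster hostname resolving to 127.0.0.1 is a frequently reported cause of DataNodes registering with the wrong address. The helper names parse_hosts and check_cluster_hosts are hypothetical (not part of Hadoop), and the sketch only inspects the file text; it does not query DNS or the system resolver.

```python
import ipaddress


def parse_hosts(text: str) -> dict[str, str]:
    """Map each hostname in /etc/hosts-style text to the IP of the first line listing it."""
    mapping: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name.lower(), ip)  # hostnames are case-insensitive
    return mapping


def check_cluster_hosts(text: str, hosts: list[str]) -> dict[str, str]:
    """Report cluster hostnames that are missing or map to a loopback address."""
    mapping = parse_hosts(text)
    problems: dict[str, str] = {}
    for host in hosts:
        ip = mapping.get(host.lower())
        if ip is None:
            problems[host] = "missing"
        elif ipaddress.ip_address(ip).is_loopback:
            problems[host] = f"loopback ({ip})"
    return problems


# The three entries shown in the question.
HOSTS = """\
131.113.101.101 node1
131.113.101.102 node2
131.113.101.103 master
"""

print(check_cluster_hosts(HOSTS, ["master", "node1", "node2"]))  # → {}
```

An empty result means every cluster name maps to a non-loopback address in that file; the check would need to be repeated on each node, since each machine reads its own /etc/hosts.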
Installed packages
sudo yum -y install epel-release
sudo yum -y install openssh-clients rsync wget java-1.8.0-openjdk-devel sshpass
Downloaded Hadoop
wget http://ftp.riken.jp/net/apache/hadoop/common/hadoop-2.8.1/hadoop-2.8.1.tar.gz
tar xf hadoop-2.8.1.tar.gz
Modified .bashrc
export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk
export HADOOP_HOME=~/hadoop-2.8.1
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:$PATH
Then I checked the Hadoop version, and it ran correctly.
③Settings on the master server
SSH configuration without a passphrase
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
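I then sent the public key to node1 and node2 and renamed it to authorized_keys. I also checked by logging in to node1 and node2 from master, and I could access both without a password.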
○/etc/hadoop/slaves
node1
node2
○/etc/hadoop/core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://131.113.101.103:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/tmp/hadoop-username/</value>
</property>
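The core-site.xml excerpt above shows only the property elements; in a complete Hadoop *-site.xml file they sit inside a single configuration root element. As a minimal sketch (read_hadoop_conf is a hypothetical helper, not a Hadoop API), the Python below reads such a file with the standard library and extracts the NameNode address from fs.defaultFS, the host and port that clients and DataNodes connect to. It assumes the text is the file each node actually loads from $HADOOP_CONF_DIR.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def read_hadoop_conf(xml_text: str) -> dict[str, str]:
    """Collect name/value pairs from every property under the configuration root."""
    root = ET.fromstring(xml_text)
    if root.tag != "configuration":
        raise ValueError(f"expected <configuration> root, got <{root.tag}>")
    props: dict[str, str] = {}
    for prop in root.findall("property"):
        name = prop.findtext("name", default="").strip()
        value = prop.findtext("value", default="").strip()
        if name:
            props[name] = value
    return props


# The question's two properties, wrapped in the configuration element.
CORE_SITE = """\
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://131.113.101.103:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop-username/</value>
  </property>
</configuration>
"""

conf = read_hadoop_conf(CORE_SITE)
uri = urlparse(conf["fs.defaultFS"])
print(uri.scheme, uri.hostname, uri.port)  # → hdfs 131.113.101.103 9000
```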
Started the daemons
$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh --config $HADOOP_CONF_DIR start historyserver
Then I used the jps command to check the processes on each server.
Master server:
NameNode
Jps
ResourceManager
SecondaryNameNode
JobHistoryServer
Node servers:
DataNode
Jps
NodeManager
Then I tried running this command:
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.1.jar pi 10 10000
but it returned these errors:
Number of Maps = 10
Samples per Map = 10000
17/10/25 03:00:16 WARN hdfs.DataStreamer: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/username/QuasiMonteCarlo_1508868015200_1006439027/in/part0 could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1733)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:265)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2496)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:828)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:506)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:845)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:788)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2455)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1481)
at org.apache.hadoop.ipc.Client.call(Client.java:1427)
at org.apache.hadoop.ipc.Client.call(Client.java:1337)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:440)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:398)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:335)
at com.sun.proxy.$Proxy11.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1733)
at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1536)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:658)
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/username/QuasiMonteCarlo_1508868015200_1006439027/in/part0 could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1733)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:265)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2496)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:828)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:506)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:845)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:788)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2455)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1481)
at org.apache.hadoop.ipc.Client.call(Client.java:1427)
at org.apache.hadoop.ipc.Client.call(Client.java:1337)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:440)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:398)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:335)
at com.sun.proxy.$Proxy11.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1733)
at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1536)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:658)
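I searched for a solution but found nothing that helped.
---- Added ----
The result of
bin/hadoop dfsadmin -report
suggests that there are no live datanodes.
However, judging from the jps output, the DataNode process does seem to be running on node1 and node2.
I also checked /home/username/hadoop-2.8.1/logs/hadoop-username-datanode-node1.out
and /home/username/hadoop-2.8.1/logs/hadoop-username-datanode-node2.out.
Their contents are as follows:
○node1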
ulimit -a for user username
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 256944
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 4096
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
○node2
ulimit -a for user username
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 256944
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 4096
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
I also checked sudo netstat -ntlp on the master server. The results, together with the jps results, are as follows:
○jps result
17252 JobHistoryServer
16950 ResourceManager
17418 Jps
16508 NameNode
16701 SecondaryNameNode
12228 NodeManager
12045 DataNode
12493 Jps
○sudo netstat -ntlp result
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 131.113.101.103:50090 0.0.0.0:* LISTEN 16701/java
tcp 0 0 0.0.0.0:19888 0.0.0.0:* LISTEN 17252/java
tcp 0 0 0.0.0.0:10033 0.0.0.0:* LISTEN 17252/java
tcp 0 0 0.0.0.0:50070 0.0.0.0:* LISTEN 16508/java
tcp 0 0 0.0.0.0:10020 0.0.0.0:* LISTEN 17252/java
tcp 0 0 131.113.101.103:9000 0.0.0.0:* LISTEN 16508/java
tcp6 0 0 131.113.101.103:8088 :::* LISTEN 16950/java
tcp6 0 0 131.113.101.103:8030 :::* LISTEN 16950/java
tcp6 0 0 131.113.101.103:8031 :::* LISTEN 16950/java
tcp6 0 0 131.113.101.103:8032 :::* LISTEN 16950/java
tcp6 0 0 131.113.101.103:8033 :::* LISTEN 16950/java
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.1:33742 0.0.0.0:* LISTEN 12045/java
tcp 0 0 0.0.0.0:50010 0.0.0.0:* LISTEN 12045/java
tcp 0 0 0.0.0.0:50075 0.0.0.0:* LISTEN 12045/java
tcp 0 0 0.0.0.0:50020 0.0.0.0:* LISTEN 12045/java
tcp6 0 0 :::8042 :::* LISTEN 12228/java
tcp6 0 0 :::13562 :::* LISTEN 12228/java
tcp6 0 0 :::8040 :::* LISTEN 12228/java
tcp6 0 0 :::42633 :::* LISTEN 12228/java
On node2, the results are as follows:
○jps result
17252 JobHistoryServer
16950 ResourceManager
17418 Jps
16508 NameNode
16701 SecondaryNameNode
12228 NodeManager
12045 DataNode
12493 Jps
○sudo netstat -ntlp result
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 131.113.101.103:50090 0.0.0.0:* LISTEN 16701/java
tcp 0 0 0.0.0.0:19888 0.0.0.0:* LISTEN 17252/java
tcp 0 0 0.0.0.0:10033 0.0.0.0:* LISTEN 17252/java
tcp 0 0 0.0.0.0:50070 0.0.0.0:* LISTEN 16508/java
tcp 0 0 0.0.0.0:10020 0.0.0.0:* LISTEN 17252/java
tcp 0 0 131.113.101.103:9000 0.0.0.0:* LISTEN 16508/java
tcp6 0 0 131.113.101.103:8088 :::* LISTEN 16950/java
tcp6 0 0 131.113.101.103:8030 :::* LISTEN 16950/java
tcp6 0 0 131.113.101.103:8031 :::* LISTEN 16950/java
tcp6 0 0 131.113.101.103:8032 :::* LISTEN 16950/java
tcp6 0 0 131.113.101.103:8033 :::* LISTEN 16950/java
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.1:33742 0.0.0.0:* LISTEN 12045/java
tcp 0 0 0.0.0.0:50010 0.0.0.0:* LISTEN 12045/java
tcp 0 0 0.0.0.0:50075 0.0.0.0:* LISTEN 12045/java
tcp 0 0 0.0.0.0:50020 0.0.0.0:* LISTEN 12045/java
tcp6 0 0 :::8042 :::* LISTEN 12228/java
tcp6 0 0 :::13562 :::* LISTEN 12228/java
tcp6 0 0 :::8040 :::* LISTEN 12228/java
tcp6 0 0 :::42633 :::* LISTEN 12228/java
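To turn netstat -ntlp output like the listings above into a list of ports to open in a firewall, a minimal Python sketch could group LISTEN sockets by the PID/Program name column and skip loopback-only bindings (such as 127.0.0.1:33742 above), which remote hosts cannot reach anyway. The helper names listening_ports and ports_to_open are hypothetical, and the parser assumes the seven-column layout shown here. The result reflects only the ports bound in this particular run; a port such as 42633 is not one of Hadoop's fixed default ports and may differ after a restart.

```python
import ipaddress


def listening_ports(netstat_text: str) -> dict[str, list[tuple[str, int]]]:
    """Group LISTEN sockets from `netstat -ntlp` text by the PID/Program name column."""
    by_pid: dict[str, list[tuple[str, int]]] = {}
    for line in netstat_text.splitlines():
        cols = line.split()
        # Data rows: Proto Recv-Q Send-Q Local-Address Foreign-Address State PID/Program
        if len(cols) < 7 or not cols[0].startswith("tcp") or cols[5] != "LISTEN":
            continue  # skips the header line and anything that is not a listener
        addr, _, port = cols[3].rpartition(":")  # works for "0.0.0.0:50010" and ":::8042"
        by_pid.setdefault(cols[6], []).append((addr, int(port)))
    return by_pid


def ports_to_open(by_pid: dict[str, list[tuple[str, int]]], pid_program: str) -> list[int]:
    """Ports of one process that remote hosts could reach (loopback-only binds excluded)."""
    return sorted(
        port
        for addr, port in by_pid.get(pid_program, [])
        if not ipaddress.ip_address(addr).is_loopback
    )


# DataNode (PID 12045) and NodeManager (PID 12228) listeners copied from the output above.
NETSTAT = """\
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.1:33742 0.0.0.0:* LISTEN 12045/java
tcp 0 0 0.0.0.0:50010 0.0.0.0:* LISTEN 12045/java
tcp 0 0 0.0.0.0:50075 0.0.0.0:* LISTEN 12045/java
tcp 0 0 0.0.0.0:50020 0.0.0.0:* LISTEN 12045/java
tcp6 0 0 :::8042 :::* LISTEN 12228/java
tcp6 0 0 :::13562 :::* LISTEN 12228/java
tcp6 0 0 :::8040 :::* LISTEN 12228/java
tcp6 0 0 :::42633 :::* LISTEN 12228/java
"""

ports = listening_ports(NETSTAT)
print(ports_to_open(ports, "12045/java"))  # → [50010, 50020, 50075]
print(ports_to_open(ports, "12228/java"))  # → [8040, 8042, 13562, 42633]
```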
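Is there anything wrong with my setup?
I find it strange that node2 does not show its local address 131.113.101.102.
The error stack trace shows that the datanodes are not running. Check the datanode startup logs for details. Apart from that, you can check whether your problem is similar to other reported issues. Also try running the command below from the namenode. Although I am running Hadoop on a single machine, it should show you similar information, including the number of live datanodes.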
bin/hadoop dfsadmin -report
It should give you information about the live nodes, for example:
Configured Capacity: 240611487744 (224.09 GB)
Present Capacity: 79048312831 (73.62 GB)
DFS Remaining: 79040917504 (73.61 GB)
DFS Used: 7395327 (7.05 MB)
DFS Used%: 0.01%
Under replicated blocks: 36
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
-------------------------------------------------
Live datanodes (1):
Name: 127.0.0.1:50010 (127.0.0.1)
Hostname: HSNMM-Shailendra.com
Decommission Status : Normal
Configured Capacity: 240611487744 (224.09 GB)
DFS Used: 7395327 (7.05 MB)
Non DFS Used: 161563174913 (150.47 GB)
DFS Remaining: 79040917504 (73.61 GB)
DFS Used%: 0.00%
DFS Remaining%: 32.85%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Tue Oct 24 23:39:47 IST 2017
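The live-DataNode count can also be read programmatically from dfsadmin -report output like the sample above (in Hadoop 2.x, hdfs dfsadmin -report is the non-deprecated form of the same command). A minimal Python sketch, assuming the report layout shown here: a "Live datanodes (N):" header followed by "Name:" lines, with any dead nodes listed after a separate "Dead datanodes" header. live_datanodes is a hypothetical helper.

```python
import re

LIVE_HEADER = re.compile(r"^Live datanodes \((\d+)\):", re.MULTILINE)
DEAD_HEADER = re.compile(r"^Dead datanodes", re.MULTILINE)
NAME_LINE = re.compile(r"^Name: (\S+)", re.MULTILINE)


def live_datanodes(report: str) -> tuple[int, list[str]]:
    """Return (count from the header, host:port of each live DataNode)."""
    header = LIVE_HEADER.search(report)
    if header is None:
        return 0, []  # no live section at all
    section = report[header.end():]
    dead = DEAD_HEADER.search(section)
    if dead is not None:
        section = section[: dead.start()]  # ignore nodes listed as dead
    return int(header.group(1)), NAME_LINE.findall(section)


# Excerpt of the sample report above.
REPORT = """\
Configured Capacity: 240611487744 (224.09 GB)
DFS Used: 7395327 (7.05 MB)
-------------------------------------------------
Live datanodes (1):
Name: 127.0.0.1:50010 (127.0.0.1)
Hostname: HSNMM-Shailendra.com
Decommission Status : Normal
"""

print(live_datanodes(REPORT))  # → (1, ['127.0.0.1:50010'])
```

A count of 0 means the NameNode has no registered DataNodes, even if DataNode processes appear in jps on the slaves.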
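Thank you for your reply. I checked the datanodes, and they do not seem to be live. However, I don't know how to bring the datanodes up.
On startup, a datanode tries to connect to the namenode. Is that reflected in the datanode logs?
Yes. When I run sbin/start-dfs.sh, the message returned is "node1: starting datanode, logging to /home/username/hadoop-2.8.1/logs/hadoop-username-datanode-node1.out", so I assumed the datanode log is the file above. Is that wrong? The same error also occurs when I use the -put command. I thought the datanodes were running but the namenode could not reach them; is that wrong?
I had forgotten to open the ports, so I opened the ports shown in the sudo netstat -ntlp result. The process then got a little further, but a new error appeared. The error message was "No route to host", along with the message "Retrying connect to server: 131.113.101.102:port_num". I opened port_num, but then the message became "Retrying connect to server: 131.113.101.102:another_port_num". It seems port_num is not constant, so which ports should I open to reach the slave nodes?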
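To tell which kind of failure is happening when connecting from master to a node's port, a minimal Python sketch can attempt a plain TCP connection and classify the result. The function name probe and the 3-second timeout are arbitrary choices for illustration. On Linux, a firewall rule that rejects packets with an ICMP host-prohibited message (such as the reject rule firewalld uses by default on CentOS 7) typically surfaces on the client as "No route to host" (EHOSTUNREACH). That differs from "Connection refused", where the host is reachable but nothing is listening on the port.

```python
import errno
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a TCP connection to host:port and classify the outcome."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered with a TCP reset: reachable, but nothing listens there.
        return "refused"
    except socket.timeout:
        # No answer at all: packets are probably being dropped silently.
        return "timeout"
    except OSError as exc:
        if exc.errno == errno.EHOSTUNREACH:
            # Typical symptom of an ICMP host-prohibited reject from a firewall.
            return "no route to host"
        return f"error: {exc}"
```

For example, probe("131.113.101.102", 50010) run on master would be expected to return "no route to host" while node2's firewall rejects the DataNode port, and "open" once the port is allowed and the DataNode is listening on it.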