Java slave VM removed from the slaves list but still accessed by Thread/Tez


So I removed vm4 from the list of slave VMs, and when I run the following command it does not access it:

hdfs dfsadmin -report
The result is:

ubuntu@anmol-vm1-new:~$ hdfs dfsadmin -report
15/12/14 06:56:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 1268169326592 (1.15 TB)
Present Capacity: 1199270457337 (1.09 TB)
DFS Remaining: 1199213064192 (1.09 TB)
DFS Used: 57393145 (54.73 MB)
DFS Used%: 0.00%
Under replicated blocks: 27
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 3 (3 total, 0 dead)

Live datanodes:
Name: 10.0.1.191:50010 (anmol-vm2-new)
Hostname: anmol-vm2-new
Decommission Status : Normal
Configured Capacity: 422723108864 (393.69 GB)
DFS Used: 19005440 (18.13 MB)
Non DFS Used: 21501829120 (20.03 GB)
DFS Remaining: 401202274304 (373.65 GB)
DFS Used%: 0.00%
DFS Remaining%: 94.91%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Last contact: Mon Dec 14 06:56:12 UTC 2015


Name: 10.0.1.190:50010 (anmol-vm1-new)
Hostname: anmol-vm1-new
Decommission Status : Normal
Configured Capacity: 422723108864 (393.69 GB)
DFS Used: 19369984 (18.47 MB)
Non DFS Used: 25831350272 (24.06 GB)
DFS Remaining: 396872388608 (369.62 GB)
DFS Used%: 0.00%
DFS Remaining%: 93.88%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Last contact: Mon Dec 14 06:56:13 UTC 2015


Name: 10.0.1.192:50010 (anmol-vm3-new)
Hostname: anmol-vm3-new
Decommission Status : Normal
Configured Capacity: 422723108864 (393.69 GB)
DFS Used: 19017721 (18.14 MB)
Non DFS Used: 21565689863 (20.08 GB)
DFS Remaining: 401138401280 (373.59 GB)
DFS Used%: 0.00%
DFS Remaining%: 94.89%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Last contact: Mon Dec 14 06:56:11 UTC 2015
However, at some point YARN tries to access it. Here is the log I get:

yarn logs -applicationId application_1450050523156_0009

Do you know why it tries to access vm4 even though vm4 is not in the slaves list, and how to fix it?

Update: I did the following, but I still get an error because it tries to access vm4:

1) Added two files, `exclude` and `mapred.exclude`, to yarnpp's `conf` directory, each containing vm4's private IP address

2) Added this to `mapred-site.xml`:

<property>
    <name>mapred.hosts.exclude</name>
    <value>/home/hadoop/yarnpp/conf/mapred.exclude</value>
    <description>Names a file that contains the list of hosts that
      should be excluded by the jobtracker.  If the value is empty, no
      hosts are excluded.</description>
  </property>
<property>
 <name>dfs.hosts.exclude</name>
 <value>/home/hadoop/yarnpp/conf/exclude</value>
 <final>true</final>
</property>
<property>
    <name>yarn.resourcemanager.nodes.exclude-path</name>
    <value>/home/hadoop/yarnpp/conf/exclude</value>
    <description>Path to file with nodes to exclude.</description>
  </property>
3) Added the same block to `yarn-site.xml`:

<property>
    <name>mapred.hosts.exclude</name>
    <value>/home/hadoop/yarnpp/conf/mapred.exclude</value>
    <description>Names a file that contains the list of hosts that
      should be excluded by the jobtracker.  If the value is empty, no
      hosts are excluded.</description>
  </property>
<property>
 <name>dfs.hosts.exclude</name>
 <value>/home/hadoop/yarnpp/conf/exclude</value>
 <final>true</final>
</property>
<property>
    <name>yarn.resourcemanager.nodes.exclude-path</name>
    <value>/home/hadoop/yarnpp/conf/exclude</value>
    <description>Path to file with nodes to exclude.</description>
  </property>
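Editing the exclude files and the `*-site.xml` entries is not enough on its own: the NameNode and ResourceManager keep the old node list in memory until they are told to re-read it. A minimal sketch follows; vm4's IP (`10.0.1.193`) and the `/tmp` path are assumptions for illustration (the question only shows 10.0.1.190-192 for vm1-3, and the real path would be `/home/hadoop/yarnpp/conf/exclude`), and the refresh commands are echoed as a dry run so the sketch is self-contained — on the master you would execute them directly:

```shell
#!/bin/sh
# Write vm4's private IP into the exclude file (IP is hypothetical;
# on the cluster the path would be /home/hadoop/yarnpp/conf/exclude).
EXCLUDE=/tmp/exclude
echo "10.0.1.193" > "$EXCLUDE"

# Shown as a dry run (echo) here; run these on the master node to make
# HDFS and YARN re-read the exclude files:
echo "hdfs dfsadmin -refreshNodes"
echo "yarn rmadmin -refreshNodes"
```

Both refresh commands exist in stock Hadoop 2.x; after running them, the excluded node should show up as decommissioning/decommissioned in `hdfs dfsadmin -report`.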
Here is the new log:

Also, even though vm4 is not in the VM list, it still shows up here:

Now, with all these updates, when I run the `gridmix generate.sh` job I get the following error:

15/12/14 10:14:53 INFO ipc.Client: Retrying connect to server: anmol-vm3-new/10.0.1.192:50833. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

After chatting with Mona, the problem is now resolved.

Running the stop-all.sh command once may not stop all the processes. It is better to run `ps -ef` to make sure every process on every node has actually stopped. Mona had run stop-all.sh, yet `ps -ef | grep -i datanode` was still showing results.


Then, in chat, I asked her to restart all the VMs, which cleans up the dangling processes. The hard reboot fixed the problem.
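The check described above can be scripted: count how many Hadoop daemon lines survive in a `ps -ef` listing. The sample listing below is fabricated for illustration (PIDs and the node's state are assumptions); on a real node you would pipe `ps -ef` directly instead of using the sample string:

```shell
#!/bin/sh
# Sample ps -ef output (hypothetical) standing in for a node where
# stop-all.sh did not kill the DataNode:
ps_output='ubuntu  4242     1  0 06:50 ?     00:00:12 java ... org.apache.hadoop.hdfs.server.datanode.DataNode
ubuntu  5150  5001  0 06:55 pts/0 00:00:00 -bash'

# Count lines mentioning datanode, case-insensitively; a non-zero count
# means a dangling process that needs killing (or a reboot, as here).
count=$(printf '%s\n' "$ps_output" | grep -ci 'datanode')
echo "$count"
```

For the sample above this prints `1`, signalling a leftover DataNode; repeat with `nodemanager`, `resourcemanager`, etc. on each node.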

How did you remove it from the list of slave nodes? Did you restart the ResourceManager after removing the IP from the slaves file?

@ManjunathBallur I have a slaves file in the conf directory, and I removed it from /etc/hosts.

@MonaJalal can you run stop-all.sh on all the nodes and then run `ps -ef | grep -i manager` to make sure all the services have stopped?

I noticed something strange: active nodes shows 4 and decommissioned nodes shows 1. It should be 3 plus 1, right? So I suspect that although you decommissioned it, the NodeManager is still running on vm4.

The network provider performed a hard reboot; a simple stop and start did not fix it.
以确保所有服务都已停止吗?我发现了一件奇怪的事情。活动节点显示为4,停用节点显示为1。应该是3加1对吗?所以,我怀疑,虽然您解除了它,但节点管理器仍在vm4上运行。网络提供商执行了硬重启。简单的停止和启动并不能解决问题