Java slave VM removed from the slaves list but still accessed by Thread/Tez


So I removed vm4 from the list of slave VMs, and when I run the following command it does not access it:

hdfs dfsadmin -report
The result is:

ubuntu@anmol-vm1-new:~$ hdfs dfsadmin -report
15/12/14 06:56:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 1268169326592 (1.15 TB)
Present Capacity: 1199270457337 (1.09 TB)
DFS Remaining: 1199213064192 (1.09 TB)
DFS Used: 57393145 (54.73 MB)
DFS Used%: 0.00%
Under replicated blocks: 27
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 3 (3 total, 0 dead)

Live datanodes:
Name: 10.0.1.191:50010 (anmol-vm2-new)
Hostname: anmol-vm2-new
Decommission Status : Normal
Configured Capacity: 422723108864 (393.69 GB)
DFS Used: 19005440 (18.13 MB)
Non DFS Used: 21501829120 (20.03 GB)
DFS Remaining: 401202274304 (373.65 GB)
DFS Used%: 0.00%
DFS Remaining%: 94.91%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Last contact: Mon Dec 14 06:56:12 UTC 2015


Name: 10.0.1.190:50010 (anmol-vm1-new)
Hostname: anmol-vm1-new
Decommission Status : Normal
Configured Capacity: 422723108864 (393.69 GB)
DFS Used: 19369984 (18.47 MB)
Non DFS Used: 25831350272 (24.06 GB)
DFS Remaining: 396872388608 (369.62 GB)
DFS Used%: 0.00%
DFS Remaining%: 93.88%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Last contact: Mon Dec 14 06:56:13 UTC 2015


Name: 10.0.1.192:50010 (anmol-vm3-new)
Hostname: anmol-vm3-new
Decommission Status : Normal
Configured Capacity: 422723108864 (393.69 GB)
DFS Used: 19017721 (18.14 MB)
Non DFS Used: 21565689863 (20.08 GB)
DFS Remaining: 401138401280 (373.59 GB)
DFS Used%: 0.00%
DFS Remaining%: 94.89%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Last contact: Mon Dec 14 06:56:11 UTC 2015
However, at some point YARN tries to access it. Here is the log I get:

yarn logs -applicationId application_1450050523156_0009

Do you know why it tries to access vm4 even though vm4 is not in the slaves list, and how to fix it?

Update: I did the following, but I still get an error because it tries to access vm4:

1) Added two files, `exclude` and `mapred.exclude`, to yarnpp's `conf` directory, each containing vm4's private IP address

2) Added this to `mapred-site.xml`:

<property>
    <name>mapred.hosts.exclude</name>
    <value>/home/hadoop/yarnpp/conf/mapred.exclude</value>
    <description>Names a file that contains the list of hosts that
      should be excluded by the jobtracker.  If the value is empty, no
      hosts are excluded.</description>
  </property>
<property>
 <name>dfs.hosts.exclude</name>
 <value>/home/hadoop/yarnpp/conf/exclude</value>
 <final>true</final>
</property>
<property>
    <name>yarn.resourcemanager.nodes.exclude-path</name>
    <value>/home/hadoop/yarnpp/conf/exclude</value>
    <description>Path to file with nodes to exclude.</description>
  </property>
3) Added the same block to `yarn-site.xml`:

<property>
    <name>mapred.hosts.exclude</name>
    <value>/home/hadoop/yarnpp/conf/mapred.exclude</value>
    <description>Names a file that contains the list of hosts that
      should be excluded by the jobtracker.  If the value is empty, no
      hosts are excluded.</description>
  </property>
<property>
 <name>dfs.hosts.exclude</name>
 <value>/home/hadoop/yarnpp/conf/exclude</value>
 <final>true</final>
</property>
<property>
    <name>yarn.resourcemanager.nodes.exclude-path</name>
    <value>/home/hadoop/yarnpp/conf/exclude</value>
    <description>Path to file with nodes to exclude.</description>
  </property>
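Editing the exclude files and the `*-site.xml` entries is not enough on its own: the NameNode and ResourceManager keep the old node list in memory until they are told to re-read it. A minimal sketch follows; vm4's IP (`10.0.1.193`) and the `/tmp` path are assumptions for illustration (the question only shows 10.0.1.190-192 for vm1-3, and the real path would be `/home/hadoop/yarnpp/conf/exclude`), and the refresh commands are echoed as a dry run so the sketch is self-contained — on the master you would execute them directly:

```shell
#!/bin/sh
# Write vm4's private IP into the exclude file (IP is hypothetical;
# on the cluster the path would be /home/hadoop/yarnpp/conf/exclude).
EXCLUDE=/tmp/exclude
echo "10.0.1.193" > "$EXCLUDE"

# Shown as a dry run (echo) here; run these on the master node to make
# HDFS and YARN re-read the exclude files:
echo "hdfs dfsadmin -refreshNodes"
echo "yarn rmadmin -refreshNodes"
```

Both refresh commands exist in stock Hadoop 2.x; after running them, the excluded node should show up as decommissioning/decommissioned in `hdfs dfsadmin -report`.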
Here is the new log:

Also, even though vm4 is not in the VM list, it still shows up here:

Now, with all these updates, when I run the `gridmix generate.sh` job I get the following error:

15/12/14 10:14:53 INFO ipc.Client: Retrying connect to server: anmol-vm3-new/10.0.1.192:50833. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

After chatting with Mona, the problem is now resolved.

Running the stop-all.sh command once may not stop all the processes. It is better to run `ps -ef` to make sure every process on every node has actually stopped. Mona had run stop-all.sh, yet `ps -ef | grep -i datanode` was still showing results.


Then, in chat, I asked her to restart all the VMs, which cleans up the dangling processes. The hard reboot fixed the problem.
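The check described above can be scripted: count how many Hadoop daemon lines survive in a `ps -ef` listing. The sample listing below is fabricated for illustration (PIDs and the node's state are assumptions); on a real node you would pipe `ps -ef` directly instead of using the sample string:

```shell
#!/bin/sh
# Sample ps -ef output (hypothetical) standing in for a node where
# stop-all.sh did not kill the DataNode:
ps_output='ubuntu  4242     1  0 06:50 ?     00:00:12 java ... org.apache.hadoop.hdfs.server.datanode.DataNode
ubuntu  5150  5001  0 06:55 pts/0 00:00:00 -bash'

# Count lines mentioning datanode, case-insensitively; a non-zero count
# means a dangling process that needs killing (or a reboot, as here).
count=$(printf '%s\n' "$ps_output" | grep -ci 'datanode')
echo "$count"
```

For the sample above this prints `1`, signalling a leftover DataNode; repeat with `nodemanager`, `resourcemanager`, etc. on each node.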

How did you remove it from the list of slave nodes? Did you restart the ResourceManager after removing the IP from the slaves file?

@ManjunathBallur I have a slaves file in the conf directory, and I removed it from /etc/hosts.

@MonaJalal can you run stop-all.sh on all the nodes and then run `ps -ef | grep -i manager` to make sure all the services have stopped?

I noticed something strange: active nodes shows 4 and decommissioned nodes shows 1. It should be 3 plus 1, right? So I suspect that although you decommissioned it, the NodeManager is still running on vm4.

The network provider performed a hard reboot; a simple stop and start did not fix it.
以确保所有服务都已停止吗?我发现了一件奇怪的事情。活动节点显示为4,停用节点显示为1。应该是3加1对吗?所以,我怀疑,虽然您解除了它,但节点管理器仍在vm4上运行。网络提供商执行了硬重启。简单的停止和启动并不能解决问题