Hadoop application fails while the MapReduce job succeeds

Tags: hadoop, mapreduce, yarn, resourcemanager

I am fairly new to Hadoop 2 (Hadoop 2.2.0), and I don't understand why the M/R job's application is marked as FAILED on the Resource Manager:

application_1399458460502_0015  pig Max temperature MAPREDUCE   default Wed, 04 Jun 2014 17:16:52 GMT   Wed, 04 Jun 2014 17:17:30 GMT   FAILED  FAILED   History
when I know that the M/R job completed successfully, and even the Job History Server claims it succeeded:

2014.06.04 13:16:52 EDT 2014.06.04 13:17:19 EDT job_1399458460502_0015  Max temperature pig default SUCCEEDED   2   2   1   1
I don't understand why the application is marked as failed. The only error I can see in the JobHistory server logs is the following:

2014-06-04 13:17:19,628 INFO [Thread-62] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Notify JHEH isAMLastRetry: true
2014-06-04 13:17:19,628 INFO [Thread-62] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: JobHistoryEventHandler notified that forceJobCompletion is true
2014-06-04 13:17:19,628 INFO [Thread-62] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Calling stop for all the services
2014-06-04 13:17:19,629 INFO [Thread-62] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Stopping JobHistoryEventHandler. Size of the outstanding queue size is 0
2014-06-04 13:17:19,736 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copying hdfs://namenodeha/user/pig/.staging/job_1399458460502_0015/job_1399458460502_0015_1.jhist to hdfs://namenodeha/mr-history/tmp/pig/job_1399458460502_0015-1401902212831-pig-Max+temperature-1401902239623-2-1-SUCCEEDED-default.jhist_tmp
2014-06-04 13:17:19,812 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copied to done location: hdfs://namenodeha/mr-history/tmp/pig/job_1399458460502_0015-1401902212831-pig-Max+temperature-1401902239623-2-1-SUCCEEDED-default.jhist_tmp
2014-06-04 13:17:19,824 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copying hdfs://namenodeha/user/pig/.staging/job_1399458460502_0015/job_1399458460502_0015_1_conf.xml to hdfs://namenodeha/mr-history/tmp/pig/job_1399458460502_0015_conf.xml_tmp
2014-06-04 13:17:19,835 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:0 AssignedReds:1 CompletedMaps:2 CompletedReds:1 ContAlloc:3 ContRel:0 HostLocal:2 RackLocal:0
2014-06-04 13:17:19,880 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copied to done location: hdfs://namenodeha/mr-history/tmp/pig/job_1399458460502_0015_conf.xml_tmp
2014-06-04 13:17:19,914 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Moved tmp to done: hdfs://namenodeha/mr-history/tmp/pig/job_1399458460502_0015.summary_tmp to hdfs://namenodeha/mr-history/tmp/pig/job_1399458460502_0015.summary
2014-06-04 13:17:19,925 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Moved tmp to done: hdfs://namenodeha/mr-history/tmp/pig/job_1399458460502_0015_conf.xml_tmp to hdfs://namenodeha/mr-history/tmp/pig/job_1399458460502_0015_conf.xml
2014-06-04 13:17:19,937 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Moved tmp to done: hdfs://namenodeha/mr-history/tmp/pig/job_1399458460502_0015-1401902212831-pig-Max+temperature-1401902239623-2-1-SUCCEEDED-default.jhist_tmp to hdfs://namenodeha/mr-history/tmp/pig/job_1399458460502_0015-1401902212831-pig-Max+temperature-1401902239623-2-1-SUCCEEDED-default.jhist
2014-06-04 13:17:19,938 INFO [Thread-62] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Stopped JobHistoryEventHandler. super.stop()
2014-06-04 13:17:19,940 INFO [Thread-62] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Setting job diagnostics to 
2014-06-04 13:17:20,060 ERROR [Thread-62] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Exception while unregistering 
java.lang.NullPointerException
    at org.apache.hadoop.mapreduce.v2.util.MRWebAppUtil.getApplicationWebURLOnJHSWithoutScheme(MRWebAppUtil.java:133)
    at org.apache.hadoop.mapreduce.v2.util.MRWebAppUtil.getApplicationWebURLOnJHSWithScheme(MRWebAppUtil.java:148)
    at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.doUnregistration(RMCommunicator.java:207)
    at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.unregister(RMCommunicator.java:177)
    at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.serviceStop(RMCommunicator.java:250)
    at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.serviceStop(RMContainerAllocator.java:255)
    at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
    at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.serviceStop(MRAppMaster.java:817)
    at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
    at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
    at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
    at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:159)
    at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132)
    at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.shutDownJob(MRAppMaster.java:548)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler$1.run(MRAppMaster.java:599)
2014-06-04 13:17:20,061 INFO [Thread-62] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Final Stats: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:0 AssignedReds:1 CompletedMaps:2 CompletedReds:1 ContAlloc:3 ContRel:0 HostLocal:2 RackLocal:0
2014-06-04 13:17:20,062 INFO [Thread-62] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Skipping cleaning up the staging dir. assuming AM will be retried.
2014-06-04 13:17:20,062 INFO [Thread-62] org.apache.hadoop.ipc.Server: Stopping server on 43851
2014-06-04 13:17:20,064 INFO [IPC Server listener on 43851] org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 43851
2014-06-04 13:17:20,065 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2014-06-04 13:17:20,065 INFO [TaskHeartbeatHandler PingChecker] org.apache.hadoop.mapreduce.v2.app.TaskHeartbeatHandler: TaskHeartbeatHandler thread interrupted
2014-06-04 13:17:25,066 INFO [Thread-62] org.apache.hadoop.ipc.Server: Stopping server on 44771
2014-06-04 13:17:25,066 INFO [IPC Server listener on 44771] org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 44771
2014-06-04 13:17:25,067 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2014-06-04 13:17:25,072 INFO [Thread-62] org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:0
2014-06-04 13:17:25,172 INFO [Thread-62] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Exiting MR AppMaster..GoodBye!
2014-06-04 13:17:25,173 INFO [Thread-1] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster received a signal. Signaling RMCommunicator and JobHistoryEventHandler.
2014-06-04 13:17:25,173 INFO [Thread-1] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: RMCommunicator notified that iSignalled is: true
2014-06-04 13:17:25,173 INFO [Thread-1] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Notify RMCommunicator isAMLastRetry: false
2014-06-04 13:17:25,173 INFO [Thread-1] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: RMCommunicator notified that shouldUnregistered is: false
2014-06-04 13:17:25,173 INFO [Thread-1] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Notify JHEH isAMLastRetry: false
2014-06-04 13:17:25,174 INFO [Thread-1] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: JobHistoryEventHandler notified that forceJobCompletion is false
Or, from the Resource Manager:

Application application_1399458460502_0015 failed 2 times due to AM Container for appattempt_1399458460502_0015_000002 exited with exitCode: 1 due to: Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
at org.apache.hadoop.util.Shell.run(Shell.java:379)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
.Failing this attempt.. Failing the application
None of these errors gives me any clue. My configuration is the following:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!--internal property-->
    <property>
        <name>bigdata.conf.identification</name>
        <value>cluster-DEV1</value>
        <final>true</final>
    </property>

    <!--hadoop properties-->
    <!-- Put site-specific property overrides in this file. -->


    <!--hbase-site-->
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>bd-prg-dev1-nn1,bd-prg-dev1-nn2,bd-prg-dev1-rm1</value>
    </property>
    <property>
        <name>zookeeper.session.timeout</name>
        <value>60000</value>
    </property>
    <property>
        <name>hbase.zookeeper.property.clientPort</name>
        <value>2181</value>
    </property>
    <property>
        <name>zookeeper.znode.parent</name>
        <value>/hbase-unsecure</value>
    </property>

    <!--core-site-->
    <property>
        <name>hadoop.security.authentication</name>
        <value>simple</value>
    </property>
    <property>
        <name>ipc.client.connect.max.retries</name>
        <value>50</value>
    </property>
    <property>
        <name>ipc.client.connection.maxidletime</name>
        <value>30000</value>
    </property>

    <property>
        <name>ipc.client.idlethreshold</name>
        <value>8000</value>
    </property>
    <property>
        <name>io.compression.codecs</name>
        <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec</value>
    </property>
    <property>
        <name>io.serializations</name>
        <value>org.apache.hadoop.io.serializer.WritableSerialization</value>
    </property>
    <property>
        <name>hadoop.security.authorization</name>
        <value>false</value>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://namenodeha</value>
    </property>
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>bd-prg-dev1-nn1:2181,bd-prg-dev1-nn2:2181,bd-prg-dev1-rm1:2181</value>
    </property>

    <!-- hdfs-site-->
    <property>
        <name>dfs.namenode.http-address</name>
        <value>bd-prg-dev1-nn1:50070</value>
    </property>
    <property>
        <name>dfs.datanode.ipc.address</name>
        <value>0.0.0.0:8010</value>
    </property>
    <property>
        <name>dfs.journalnode.http-address</name>
        <value>0.0.0.0:8480</value>
    </property>
    <property>
        <name>dfs.namenode.accesstime.precision</name>
        <value>0</value>
    </property>
    <property>
        <name>dfs.namenode.stale.datanode.interval</name>
        <value>30000</value>
    </property>
    <property>
        <name>dfs.datanode.address</name>
        <value>0.0.0.0:50010</value>
    </property>
    <property>
        <name>dfs.datanode.http.address</name>
        <value>0.0.0.0:50075</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>bd-prg-dev1-nn2:50090</value>
    </property>
    <property>
        <name>dfs.nameservices</name>
        <value>namenodeha</value>
        <description>Logical name for this new nameservice</description>
    </property>
    <property>
        <name>dfs.ha.namenodes.namenodeha</name>
        <value>nn1,nn2</value>
        <description>Unique identifiers for each NameNode in the nameservice</description>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.namenodeha.nn1</name>
        <value>bd-prg-dev1-nn1:8020</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.namenodeha.nn2</name>
        <value>bd-prg-dev1-nn2:8020</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.namenodeha.nn1</name>
        <value>bd-prg-dev1-nn1:50070</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.namenodeha.nn2</name>
        <value>bd-prg-dev1-nn2:50070</value>
    </property>
    <property>
        <name>dfs.client.failover.proxy.provider.namenodeha</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

    <!--yarn-site-->
    <property>
        <name>yarn.nodemanager.address</name>
        <value>0.0.0.0:45454</value>
    </property>
    <property>
        <name>yarn.nodemanager.container-monitor.interval-ms</name>
        <value>3000</value>
    </property>
    <property>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>/app-logs</value>
    </property>
    <property>
        <name>yarn.log.server.url</name>
        <value>bd-prg-dev1-rm1:19888/jobhistory/logs</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>bd-prg-dev1-rm1:8141</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>bd-prg-dev1-rm1:8025</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.log-aggregation.compression-type</name>
        <value>gz</value>
    </property>
    <property>
        <name>yarn.nodemanager.health-checker.script.path</name>
        <value>/etc/hadoop/conf/health_check</value>
    </property>
    <property>
        <name>yarn.nodemanager.container-executor.class</name>
        <value>org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor</value>
    </property>
    <property>
        <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
        <value>logs</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>bd-prg-dev1-rm1:8088</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>bd-prg-dev1-rm1:8050</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>bd-prg-dev1-rm1:8030</value>
    </property>

    <!--mapred-site-->
    <property>
        <name>mapreduce.jobhistory.intermediate-done-dir</name>
        <value>/mr-history/tmp</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.done-dir</name>
        <value>/mr-history/done</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>bd-prg-dev1-rm1:19888</value>
    </property>
    <property>
        <name>mapreduce.jobtracker.system.dir</name>
        <value>/mapred/system</value>
    </property>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

Does anyone have an idea where the problem might be?

I am not sure which configuration property it was, but when I fetched the configuration from the cluster and created a Configuration object from it, it worked fine.
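
As a minimal sketch of what I mean (the configuration directory below is just an example; it assumes you copied the cluster's *-site.xml files there):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

// Minimal sketch: build the job Configuration from the XML files fetched
// from the cluster, instead of relying on whatever happens to be on the
// local classpath. The directory is an assumption; point it at wherever
// you copied the cluster configuration to.
public class ClusterConfLoader {
    public static Configuration load() {
        Configuration conf = new Configuration();
        String confDir = "/etc/hadoop/conf";
        conf.addResource(new Path(confDir, "core-site.xml"));
        conf.addResource(new Path(confDir, "hdfs-site.xml"));
        conf.addResource(new Path(confDir, "yarn-site.xml"));
        conf.addResource(new Path(confDir, "mapred-site.xml"));
        return conf;
    }
}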

This kind of problem seems to happen when something in the configuration cannot be found (as the original answer suggests).
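
Judging from the stack trace above (MRWebAppUtil builds the job-history URL while the AM unregisters), a plausible candidate is the JobHistory server address. This is my assumption, not a confirmed diagnosis, but a quick sanity check is to print the relevant keys from the Configuration the job actually sees; a null value means that entry never made it into the job's configuration:

import org.apache.hadoop.conf.Configuration;

// Hedged sanity check: dump the properties that the AM's unregistration
// path and the client rely on. The key list is my guess, based on the
// stack trace and the cluster configuration posted above.
public class ConfSanityCheck {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        String[] keys = {
            "mapreduce.jobhistory.webapp.address",
            "yarn.resourcemanager.address",
            "fs.defaultFS"
        };
        for (String key : keys) {
            System.out.println(key + " = " + conf.get(key));
        }
    }
}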

In my case, I found that when I execute a Pig action I need to reference site.xml, and the relevant part of my workflow now looks like this:

<action name="read-into-table">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <job-xml>site.xml</job-xml>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <script>myFile.pig</script>
        </pig>
        <ok to="end" />
        <error to="fail" />
</action>

The line that had been missing was:

<job-xml>site.xml</job-xml>

Surprisingly, I had also left out the job-xml for my Hive actions, yet they ran without any problem.