Hadoop Oozie workflow Hive action stuck in RUNNING


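I am running the Hortonworks distribution with Hadoop 2.4.0, Oozie 4.0.0, and Hive 0.13.0.

I have multiple Oozie coordinator jobs that can launch workflows at the same time. Each coordinator watches a different directory and launches its workflow when a `_SUCCESS` file appears in that directory.

The workflow runs a Hive action that reads from an external directory and copies the contents: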

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

DROP TABLE IF EXISTS ${INPUT_TABLE};

CREATE external TABLE IF NOT EXISTS ${INPUT_TABLE} (
       id bigint,
       data string,
       creationdate timestamp,
       datelastupdated timestamp)
LOCATION '${INPUT_LOCATION}';

-- Read from external table and insert into a partitioned Hive table
FROM ${INPUT_TABLE} ent
INSERT OVERWRITE TABLE mytable PARTITION(data)
SELECT ent.id, ent.data, ent.creationdate, ent.datelastupdated;
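When I run only one coordinator, which launches a single workflow, the workflow and its Hive action complete without any problems.

When multiple workflows are launched at the same time, the Hive action stays in the RUNNING state for a long time.

If I look at the job syslog, I see: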

2015-02-18 17:18:26,048 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1423085109915_0223_m_000000 Task Transitioned from SCHEDULED to RUNNING
2015-02-18 17:18:26,586 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for application_1423085109915_0223: ask=3 release= 0 newContainers=0 finishedContainers=0 resourcelimit=<memory:32768, vCores:-3> knownNMs=1
2015-02-18 17:18:27,677 INFO [Socket Reader #1 for port 38704] SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for job_1423085109915_0223 (auth:SIMPLE)
2015-02-18 17:18:27,696 INFO [IPC Server handler 0 on 38704] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID : jvm_1423085109915_0223_m_000002 asked for a task
2015-02-18 17:18:27,697 INFO [IPC Server handler 0 on 38704] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID: jvm_1423085109915_0223_m_000002 given task: attempt_1423085109915_0223_m_000000_0
2015-02-18 17:18:34,951 INFO [IPC Server handler 2 on 38704] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1423085109915_0223_m_000000_0 is : 1.0
2015-02-18 17:19:05,060 INFO [IPC Server handler 11 on 38704] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1423085109915_0223_m_000000_0 is : 1.0
2015-02-18 17:19:35,161 INFO [IPC Server handler 28 on 38704] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1423085109915_0223_m_000000_0 is : 1.0
2015-02-18 17:20:05,262 INFO [IPC Server handler 2 on 38704] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1423085109915_0223_m_000000_0 is : 1.0
2015-02-18 17:20:35,358 INFO [IPC Server handler 11 on 38704] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1423085109915_0223_m_000000_0 is : 1.0
2015-02-18 17:21:02,452 INFO [IPC Server handler 23 on 38704] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1423085109915_0223_m_000000_0 is : 1.0
2015-02-18 17:21:32,545 INFO [IPC Server handler 1 on 38704] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1423085109915_0223_m_000000_0 is : 1.0
2015-02-18 17:22:02,668 INFO [IPC Server handler 12 on 38704] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1423085109915_0223_m_000000_0 is : 1.0 
It just prints "Progress of TaskAttempt" over and over again.

Our yarn-site.xml is configured to use the following:

    <property>
      <name>yarn.resourcemanager.scheduler.class</name>
      <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
    </property>

Should I be using a different scheduler?

At this point, I am not sure whether the problem is in Oozie or in Hive.

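It turned out to be the same issue as the heartbeat problem listed here:

After changing the scheduler to FairScheduler (as described there), I was able to run multiple workflows.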
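
For reference, the scheduler change might look roughly like the fragment below in yarn-site.xml. This is a minimal sketch, assuming a stock Hadoop 2.x install where the Fair Scheduler class ships with the ResourceManager. The exact configuration used here is not shown in the original post, any queue setup (for example a fair-scheduler.xml allocation file) is site-specific and omitted, and the ResourceManager typically needs a restart before the change takes effect.

```xml
<!-- Assumed replacement for the CapacityScheduler entry shown earlier -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
```

A likely reason this helps: each Oozie Hive action first starts a launcher job that holds a container while it waits for the Hive MapReduce job it submits. When several launchers start together on a small cluster (the log above shows `knownNMs=1` and `newContainers=0`), they can occupy all available resources and leave none for the jobs they are waiting on.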