Hadoop Oozie作业在准备状态下的开始操作时卡住
我有一个Oozie作业,我从java客户机开始,它在开始操作时卡住了,它说它正在运行,但开始节点处于准备状态 这是为什么?如何解决问题 Oozie工作流只包含一个java操作。集群上的Hadoop版本是2.4.0,而集群上的Oozie版本是4.0.0 这是workflow.xml文件Hadoop Oozie作业在准备状态下的开始操作时卡住,hadoop,oozie,Hadoop,Oozie,我有一个Oozie作业,我从java客户机开始,它在开始操作时卡住了,它说它正在运行,但开始节点处于准备状态 这是为什么?如何解决问题 Oozie工作流只包含一个java操作。集群上的Hadoop版本是2.4.0,而集群上的Oozie版本是4.0.0 这是workflow.xml文件 <workflow-app xmlns='uri:oozie:workflow:0.2' name='java-filecopy-wf'> <start to='java1'/> &
<workflow-app xmlns='uri:oozie:workflow:0.2' name='java-filecopy-wf'>
<start to='java1'/>
<action name='java1'>
<java>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>default</value>
</property>
</configuration>
<main-class>testingoozieclient.Client</main-class>
<capture-output/>
</java>
<ok to="end" />
<error to="fail" />
</action>
<kill name="fail">
<message>Java failed, error message[${wf:errorMessage(wf:lastErrorNode())}]
</message>
</kill>
<end name='end' />
因为我试过几次,现在有5,6个工作流都以状态运行,但是当我通过web界面查看时,我可以看到它们都被困在处于准备状态的开始节点上
在一些提交的工作流被终止后,我可以启动另一个工作流。这一次,工作流从开始到java操作,但以类似的方式被困在java操作中——它保持在PREP状态 下面是日志的样子
2015-06-22 17:54:37,366 INFO ActionStartXCommand:539 - USER[hadoop] GROUP[-] TOKEN[] APP[java-filecopy-wf] JOB[0000030-150619153616589-oozie-oozi-W] ACTION[0000030-150619153616589-oozie-oozi-W@:start:] Start action [0000030-150619153616589-oozie-oozi-W@:start:] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2015-06-22 17:54:37,367 WARN ActionStartXCommand:542 - USER[hadoop] GROUP[-] TOKEN[] APP[java-filecopy-wf] JOB[0000030-150619153616589-oozie-oozi-W] ACTION[0000030-150619153616589-oozie-oozi-W@:start:] [***0000030-150619153616589-oozie-oozi-W@:start:***]Action status=DONE
2015-06-22 17:54:37,367 WARN ActionStartXCommand:542 - USER[hadoop] GROUP[-] TOKEN[] APP[java-filecopy-wf] JOB[0000030-150619153616589-oozie-oozi-W] ACTION[0000030-150619153616589-oozie-oozi-W@:start:] [***0000030-150619153616589-oozie-oozi-W@:start:***]Action updated in DB!
2015-06-22 17:54:37,426 INFO ActionEndXCommand:539 - USER[hadoop] GROUP[-] TOKEN[] APP[java-filecopy-wf] JOB[0000030-150619153616589-oozie-oozi-W] ACTION[0000030-150619153616589-oozie-oozi-W@:start:] end executor for wf action 0000030-150619153616589-oozie-oozi-W with wf job 0000030-150619153616589-oozie-oozi-W
2015-06-22 17:54:37,676 INFO ActionStartXCommand:539 - USER[hadoop] GROUP[-] TOKEN[] APP[java-filecopy-wf] JOB[0000030-150619153616589-oozie-oozi-W] ACTION[0000030-150619153616589-oozie-oozi-W@java1] Start action [0000030-150619153616589-oozie-oozi-W@java1] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2015-06-22 17:54:38,316 INFO JavaActionExecutor:539 - USER[hadoop] GROUP[-] TOKEN[] APP[java-filecopy-wf] JOB[0000030-150619153616589-oozie-oozi-W] ACTION[0000030-150619153616589-oozie-oozi-W@java1] addShareLib: using FileSystem hdfs://master:8020
2015-06-22 17:54:38,501 WARN JavaActionExecutor:542 - USER[hadoop] GROUP[-] TOKEN[] APP[java-filecopy-wf] JOB[0000030-150619153616589-oozie-oozi-W] ACTION[0000030-150619153616589-oozie-oozi-W@java1] credentials is null for the action
2015-06-22 17:54:38,640 INFO JavaActionExecutor:539 - USER[hadoop] GROUP[-] TOKEN[] APP[java-filecopy-wf] JOB[0000030-150619153616589-oozie-oozi-W] ACTION[0000030-150619153616589-oozie-oozi-W@java1] addShareLib: using FileSystem hdfs://master:8020
今天早上我发现作业处于挂起状态,开始节点正常,但Java节点处于开始重试状态,出现以下错误-JA006:从master02.novalocal/192.168.111.52调用master02.novalocal:8032连接失败异常:Java.net.ConnectException:连接被拒绝;有关更多详细信息,请参阅: 我可能应该强调,Oozie与资源管理器在同一台机器上工作,因此奇怪的是,它试图在同一台机器上启动工作流,但表示连接失败 以下是Oozie的作业日志:
2015-06-22 17:54:37,366 INFO ActionStartXCommand:539 - USER[hadoop] GROUP[-] TOKEN[] APP[java-filecopy-wf] JOB[0000030-150619153616589-oozie-oozi-W] ACTION[0000030-150619153616589-oozie-oozi-W@:start:] Start action [0000030-150619153616589-oozie-oozi-W@:start:] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2015-06-22 17:54:37,367 WARN ActionStartXCommand:542 - USER[hadoop] GROUP[-] TOKEN[] APP[java-filecopy-wf] JOB[0000030-150619153616589-oozie-oozi-W] ACTION[0000030-150619153616589-oozie-oozi-W@:start:] [***0000030-150619153616589-oozie-oozi-W@:start:***]Action status=DONE
2015-06-22 17:54:37,367 WARN ActionStartXCommand:542 - USER[hadoop] GROUP[-] TOKEN[] APP[java-filecopy-wf] JOB[0000030-150619153616589-oozie-oozi-W] ACTION[0000030-150619153616589-oozie-oozi-W@:start:] [***0000030-150619153616589-oozie-oozi-W@:start:***]Action updated in DB!
2015-06-22 17:54:37,426 INFO ActionEndXCommand:539 - USER[hadoop] GROUP[-] TOKEN[] APP[java-filecopy-wf] JOB[0000030-150619153616589-oozie-oozi-W] ACTION[0000030-150619153616589-oozie-oozi-W@:start:] end executor for wf action 0000030-150619153616589-oozie-oozi-W with wf job 0000030-150619153616589-oozie-oozi-W
2015-06-22 17:54:37,676 INFO ActionStartXCommand:539 - USER[hadoop] GROUP[-] TOKEN[] APP[java-filecopy-wf] JOB[0000030-150619153616589-oozie-oozi-W] ACTION[0000030-150619153616589-oozie-oozi-W@java1] Start action [0000030-150619153616589-oozie-oozi-W@java1] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2015-06-22 17:54:38,316 INFO JavaActionExecutor:539 - USER[hadoop] GROUP[-] TOKEN[] APP[java-filecopy-wf] JOB[0000030-150619153616589-oozie-oozi-W] ACTION[0000030-150619153616589-oozie-oozi-W@java1] addShareLib: using FileSystem hdfs://master01.novalocal:8020
2015-06-22 17:54:38,501 WARN JavaActionExecutor:542 - USER[hadoop] GROUP[-] TOKEN[] APP[java-filecopy-wf] JOB[0000030-150619153616589-oozie-oozi-W] ACTION[0000030-150619153616589-oozie-oozi-W@java1] credentials is null for the action
2015-06-22 17:54:38,640 INFO JavaActionExecutor:539 - USER[hadoop] GROUP[-] TOKEN[] APP[java-filecopy-wf] JOB[0000030-150619153616589-oozie-oozi-W] ACTION[0000030-150619153616589-oozie-oozi-W@java1] addShareLib: using FileSystem hdfs://master01.novalocal:8020
2015-06-22 20:05:33,340 WARN ActionStartXCommand:542 - USER[hadoop] GROUP[-] TOKEN[] APP[java-filecopy-wf] JOB[0000030-150619153616589-oozie-oozi-W] ACTION[0000030-150619153616589-oozie-oozi-W@java1] Error starting action [java1]. ErrorType [TRANSIENT], ErrorCode [ JA006], Message [ JA006: Call From master02.novalocal/192.168.111.52 to master02.novalocal:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused]
org.apache.oozie.action.ActionExecutorException: JA006: Call From master02.novalocal/192.168.111.52 to master02.novalocal:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:412)
at org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:392)
at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:837)
at org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:988)
at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:215)
at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:60)
at org.apache.oozie.command.XCommand.call(XCommand.java:280)
at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:326)
at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:255)
at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.net.ConnectException: Call From master02.novalocal/192.168.111.52 to master02.novalocal:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.GeneratedConstructorAccessor98.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
at org.apache.hadoop.ipc.Client.call(Client.java:1414)
at org.apache.hadoop.ipc.Client.call(Client.java:1363)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy42.getDelegationToken(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getDelegationToken(ApplicationClientProtocolPBClientImpl.java:282)
at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
at com.sun.proxy.$Proxy43.getDelegationToken(Unknown Source)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getRMDelegationToken(YarnClientImpl.java:452)
at org.apache.hadoop.mapred.ResourceMgrDelegate.getDelegationToken(ResourceMgrDelegate.java:166)
at org.apache.hadoop.mapred.YARNRunner.getDelegationToken(YARNRunner.java:220)
at org.apache.hadoop.mapreduce.Cluster.getDelegationToken(Cluster.java:400)
at org.apache.hadoop.mapred.JobClient$16.run(JobClient.java:1203)
at org.apache.hadoop.mapred.JobClient$16.run(JobClient.java:1200)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
at org.apache.hadoop.mapred.JobClient.getDelegationToken(JobClient.java:1199)
at org.apache.oozie.service.HadoopAccessorService.createJobClient(HadoopAccessorService.java:377)
at org.apache.oozie.action.hadoop.JavaActionExecutor.createJobClient(JavaActionExecutor.java:1031)
at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:786)
... 10 more
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:735)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:604)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:699)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1462)
at org.apache.hadoop.ipc.Client.call(Client.java:1381)
... 33 more
我打赌你的map reduce群集一定没有插槽了。查看配置了多少地图插槽
还要尝试确定端口8032上的服务是否已启动。您可以使用命令sudonetstat-netulp | grep8032。如果没有返回任何输出,则服务将关闭。您还可以使用nmap或telnet检查连接 请检查job.properties中的端口 这通常是namenode和jobtracker端口的问题。
确保job.properties文件中的jobtracker端口正确。oozie作业陷入准备状态(最终进入启动手动状态)的主要原因是Hadoop服务端口配置错误
nameNode=hdfs://localhost:9000
jobTracker=10.71.71.15:8032
如果您正在运行Thread,那么jobtracker的默认端口与资源管理器的端口相同
另外,尝试修复其他端口问题,如jobhistoryserver的端口
(如oozie错误消息中所述)。命令:
netstat -ntpl | grep 8032
查看端口号是否打开。
如果不是,则需要使用端口号启动服务。在我的情况下,问题是由于纱线停止。我启动了它,工作流也取得了进展。你能添加oozie中说明的错误日志吗?我会的,但最奇怪的是我没有收到任何错误。当我在web界面中查看作业日志时,它是完全空的吗?我应该在哪里查找错误?转到作业日志并获取作业日志!或者在historyserver中尝试!!我认为oozie成功地将工作提交给hadoop!!请检查历史服务器中的纱线日志。我已经编辑了我的问题,请检查。谢谢你的回答,但是如果我没有运行任何MR作业,这可能吗?我正在启动一个简单的java类,它会打印一些文本,只是为了检查我的客户端是否工作。Oozie在Map Reduce集群上运行它的作业,所以首先你应该确保你的Map Reduce集群已经启动并运行,有足够的Map插槽(至少两个用于运行一个作业)。尝试排序服务是否在端口8032上运行。您可以使用命令sudonetstat-netulp | grep8032。如果没有返回任何输出,则服务将关闭。您还可以使用nmap或telnet检查连接。感谢您的回答,它似乎使用了非标准(8050)端口。我猜问题在于太多的工作开始了(正如你所建议的),这导致了窒息。现在我得到一个ClassNotFound异常,但工作流似乎正在运行,因此我接受您的答案,基于答案本身和最后一条评论。:)
netstat -ntpl | grep 8032