Parallel processing: How to handle error cleanup separately for each parallel job in Oozie


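I need to run a set of parallel jobs in Oozie, and I can launch them with Oozie's fork node. The problem is that when one job fails, the remaining jobs fail as well, because every action's error transition goes to the kill control node. I have searched online extensively but could not find out how to handle errors and run cleanup separately for each job.

Any help would be appreciated.

My workflow.xml is as follows: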

<workflow-app name="WorkFlowForSshAction" xmlns="uri:oozie:workflow:0.1">
<start to="copyfroms3tohdfs"/>
<action name="copyfroms3tohdfs">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>${CMNodeLogin}</host>
<command>${s3tohdfsscript}</command>
<capture-output/>
</ssh>
<ok to="createhivetables"/>
<error to="killAction"/>
</action>


<action name="createhivetables">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>${CMNodeLogin}</host>
<command>${createhivetablesscript}</command>
<capture-output/>
</ssh>
<ok to="gold__pos_denorm_trn_itm_offr"/>
<error to="killAction"/>
</action>
<action name="gold__pos_denorm_trn_itm_offr">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>${CMNodeLogin}</host>
<command>${denormalizationscript}</command>
<capture-output/>
</ssh>
<ok to="forknode"/>
<error to="killAction"/>
</action>
<fork name="forknode">
    <path start="gold__dypt_pos_trn_offr"/>
    <path start="gold__hr_pos_trn_offr"/>
    <path start="approach3"/>
    <path start="aproach11"/>
    <path start="aproach12"/>
    <path start="aproach13"/>
    <path start="aproach14"/>
    <path start="aproach15"/>
    <path start="aproach16"/>
    <path start="aproach17"/>
</fork>
<action name="gold__dypt_pos_trn_offr">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>${CMNodeLogin}</host>
<command>${daypartscript}</command>
<capture-output/>
</ssh>
<ok to="joinnode"/>
<error to="killAction"/>
</action>
<action name="gold__hr_pos_trn_offr">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>${CMNodeLogin}</host>
<command>${hourscript}</command>
<capture-output/>
</ssh>
<ok to="joinnode"/>
<error to="killAction"/>
</action>
<action name="approach3">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>${CMNodeLogin}</host>
<command>${approach3script}</command>
<capture-output/>
</ssh>
<ok to="joinnode"/>
<error to="killAction"/>
</action>
<action name="aproach11">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>${CMNodeLogin}</host>
<command>${approach11script}</command>
<capture-output/>
</ssh>
<ok to="joinnode"/>
<error to="killAction"/>
</action>
<action name="aproach12">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>${CMNodeLogin}</host>
<command>${approach12script}</command>
<capture-output/>
</ssh>
<ok to="joinnode"/>
<error to="killAction"/>
</action>
<action name="aproach13">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>${CMNodeLogin}</host>
<command>${approach13script}</command>
<capture-output/>
</ssh>
<ok to="joinnode"/>
<error to="killAction"/>
</action>
<action name="aproach14">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>${CMNodeLogin}</host>
<command>${approach14script}</command>
<capture-output/>
</ssh>
<ok to="joinnode"/>
<error to="killAction"/>
</action>
<action name="aproach15">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>${CMNodeLogin}</host>
<command>${approach15script}</command>
<capture-output/>
</ssh>
<ok to="joinnode"/>
<error to="killAction"/>
</action>
<action name="aproach16">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>${CMNodeLogin}</host>
<command>${approach16script}</command>
<capture-output/>
</ssh>
<ok to="joinnode"/>
<error to="killAction"/>
</action>
<action name="aproach17">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>${CMNodeLogin}</host>
<command>${approach17script}</command>
<capture-output/>
</ssh>
<ok to="joinnode"/>
<error to="killAction"/>
</action>
<join name="joinnode" to="end"/>
<kill name="killAction">
<message>"Killed job due to error"</message>
</kill>
<end name="end"/>
</workflow-app>

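Create a new node (typically a java action) that performs the cleanup for you, and route every action's "error to" transition to this new node. You can identify the node that actually caused the error with the EL function ${wf:lastErrorNode()}. Pass it as an argument to the cleanup node so that your Java code can run whatever cleanup logic you need (using the Java HDFS API).

The new node would look something like this: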

<action name="myCleanUpAction">
    <java>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <main-class>com.foo.CleanUpMain</main-class>
        <arg>${wf:lastErrorNode()}</arg>
        <arg>any useful argument1</arg>
        <arg>any useful argument2</arg>
    </java>
    <ok to="fail"/>
    <error to="fail"/>
</action>
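
The main class behind that node could be sketched roughly as below. This is only an illustration under several assumptions that are not in the original answer: the first argument is the failed node name from ${wf:lastErrorNode()}, the second argument is a hypothetical base output directory, and each action writes its partial output to `<base>/<nodeName>`. To keep the sketch runnable without a cluster it deletes from the local filesystem with `java.nio.file`; on a real cluster you would swap that part for the Hadoop `FileSystem` API (for example `FileSystem.delete(path, true)`). The package declaration (`com.foo`) is omitted for brevity.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

/**
 * Hypothetical cleanup entry point for the java action.
 * args[0] = name of the failed node (value of ${wf:lastErrorNode()})
 * args[1] = base output directory (assumed layout: base/nodeName)
 */
public class CleanUpMain {

    /** Maps a failed node name to the directory assumed to hold its partial output. */
    public static Path outputDirFor(String baseDir, String failedNode) {
        if (failedNode == null || failedNode.isEmpty()) {
            throw new IllegalArgumentException("failed node name is required");
        }
        // Reject names that could point outside the base directory.
        if (failedNode.contains("/") || failedNode.contains("..")) {
            throw new IllegalArgumentException("unexpected node name: " + failedNode);
        }
        return Paths.get(baseDir).resolve(failedNode);
    }

    /** Recursively deletes dir if it exists; returns false when there was nothing to delete. */
    public static boolean deleteRecursively(Path dir) {
        if (!Files.exists(dir)) {
            return false;
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order visits children before their parent directory.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return true;
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("usage: CleanUpMain <failedNode> <baseOutputDir>");
            System.exit(1);
        }
        Path target = outputDirFor(args[1], args[0]);
        boolean removed = deleteRecursively(target);
        System.out.println((removed ? "Removed " : "Nothing to remove at ") + target);
    }
}
```

With this argument order, the first `<arg>` in the action stays `${wf:lastErrorNode()}` and the second would carry the base directory. Because the cleanup is keyed on the failed node's name, only that job's output is touched and the output of the other forked jobs is left alone.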
