Hadoop: How to pass parameters to a MapReduce job in Oozie


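I have packaged my MapReduce job as a jar file (mymapreduce.jar). At run time it takes several parameters, for example: hadoop jar mymapreduce.jar StartClass -i input -p parameter1 -u parameter2. How do I write this as an action in an Oozie workflow file?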

Put the parameters you want to use in the Oozie workflow into the job.properties file, like this:

    nameNode=hdfs://localhost:9000
    # or a remote NameNode URL, e.g. hdfs://abc.xyz.yahoo.com:8020
    jobTracker=localhost:9001        
    queueName=default
    examplesRoot=map-reduce
    oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}
    inputDir=/user/input-data
    outputDir=/user/map-reduce

You can then add the configuration to workflow.xml and reference the variables defined in job.properties, like this:

<workflow-app name='wordcount-wf' xmlns="uri:oozie:workflow:0.2">
    <start to='wordcount'/>
    <action name='wordcount'>
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <property>
                    <name>mapred.mapper.class</name>
                    <value>org.myorg.WordCount.Map</value>
                </property>
                <property>
                    <name>mapred.reducer.class</name>
                    <value>org.myorg.WordCount.Reduce</value>
                </property>
                <property>
                    <name>mapred.input.dir</name>
                    <value>${inputDir}</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>${outputDir}</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to='end'/>
        <error to='kill'/>
    </action>
    <kill name='kill'>
        <message>${wf:errorCode("wordcount")}</message>
    </kill>
    <end name='end'/>
</workflow-app>

Hope this helps.

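You can invoke the MapReduce job with a java action. Specify the MapReduce driver class as the main-class, and pass the required parameters as arg elements. The argument-parsing logic has to be defined in the driver class.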

<workflow-app name="mapreduce-wf" xmlns="uri:oozie:workflow:0.4">   
    <start to="mapreduce_node"/>
    <action name="mapreduce_node">
       <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <main-class>com.test.MyMapreduceDriver</main-class>
            <arg>-i</arg>
            <arg>input</arg>
            <arg>-p</arg>
            <arg>parameter1</arg>
            <arg>-u</arg>
            <arg>parameter2</arg>
        </java>
        <ok to="end"/>
        <error to="kill"/>
    </action>
    <kill name="kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
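
Since the java action hands the arg values to the driver's main method in order, the driver has to parse them itself. A minimal sketch of such parsing, assuming every option is a "-flag value" pair as in the question's command line. The class name DriverArgsSketch and the method parseArgs are hypothetical, and the actual job setup and submission are left out:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; the class and method names are illustrative only.
class DriverArgsSketch {

    // Parses "-flag value" pairs into a map keyed by the flag name without the dash.
    static Map<String, String> parseArgs(String[] args) {
        Map<String, String> options = new HashMap<>();
        for (int i = 0; i < args.length; i += 2) {
            String flag = args[i];
            if (!flag.startsWith("-") || flag.length() < 2) {
                throw new IllegalArgumentException("Expected a flag such as -i, got: " + flag);
            }
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("Missing value for " + flag);
            }
            options.put(flag.substring(1), args[i + 1]);
        }
        return options;
    }

    public static void main(String[] args) {
        Map<String, String> options = parseArgs(args);
        // A real driver would configure and submit the MapReduce job here.
        System.out.println("input=" + options.get("i")
                + " p=" + options.get("p")
                + " u=" + options.get("u"));
    }
}
```

Called with -i input -p parameter1 -u parameter2, parseArgs returns a map with the keys i, p and u. A flag without a value makes it throw an IllegalArgumentException, so a malformed arg list fails the action early instead of running the job with missing settings.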

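Another option is to run it as a map-reduce action. Because no driver class is specified in that case, you pass the required parameters as configuration properties alongside the other MapReduce properties. You can then read these parameters in the mapper and reducer classes through the Configuration object.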

<workflow-app name='mapreduce-wf' xmlns="uri:oozie:workflow:0.2">
<start to='mapreduce'/>
<action name='mapreduce'>
    <map-reduce>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <prepare>
        </prepare>
        <configuration>
            <property>
                <name>p</name>
                <value>parameter1</value>
            </property>
             <property>
                <name>u</name>
                <value>parameter2</value>
            </property>
            <property>
                <name>mapred.job.queue.name</name>
                <value>${queueName}</value>
            </property>
            <property>
                <name>mapred.mapper.class</name>
                <value>org.myorg.WordCount.Map</value>
            </property>
            <property>
                <name>mapred.reducer.class</name>
                <value>org.myorg.WordCount.Reduce</value>
            </property>
            <property>
                <name>mapred.input.dir</name>
                <value>${inputDir}</value>
            </property>
            <property>
                <name>mapred.output.dir</name>
                <value>${outputDir}</value>
            </property>
        </configuration>
    </map-reduce>
    <ok to='end'/>
    <error to='kill'/>
</action>
<kill name='kill'>
    <message>${wf:errorCode("mapreduce")}</message>
</kill>
<end name='end'/>
</workflow-app>

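Thanks. I believe that specifying the MapReduce job as a Java program (option 1) gives us more flexibility, because we can add more logic to the driver, whereas with the second option, where we only specify the mapper and reducer, we cannot. Is that right?

Yes. With the java action you can include additional logic in the driver class.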