How to submit a MapReduce job with the YARN API in Java


I want to submit my MR job using the Java API. I tried it like this, but I don't know what to set on amContainer; below is the code I have written:

package org.apache.hadoop.examples;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.protocolrecords.GetNewApplicationResponse;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.util.Records;
import org.mortbay.util.ajax.JSON;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class YarnJob {
    private static Logger logger = LoggerFactory.getLogger(YarnJob.class);

    public static void main(String[] args) throws Throwable {

        Configuration conf = new Configuration();
        YarnClient client = YarnClient.createYarnClient();
        client.init(conf);
        client.start();

        System.out.println(JSON.toString(client.getAllQueues()));
        System.out.println(JSON.toString(client.getConfig()));
        //System.out.println(JSON.toString(client.getApplications()));
        System.out.println(JSON.toString(client.getYarnClusterMetrics()));

        YarnClientApplication app = client.createApplication();
        GetNewApplicationResponse appResponse = app.getNewApplicationResponse();

        ApplicationId appId = appResponse.getApplicationId();

        // Create launch context for app master
        ApplicationSubmissionContext appContext = Records.newRecord(ApplicationSubmissionContext.class);
        // set the application id
        appContext.setApplicationId(appId);
        // set the application name
        appContext.setApplicationName("test");
        // Set the queue to which this application is to be submitted in the RM
        appContext.setQueue("default");

        // Set up the container launch context for the application master
        ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
        //amContainer.setLocalResources();
        //amContainer.setCommands();
        //amContainer.setEnvironment();

        appContext.setAMContainerSpec(amContainer);
        appContext.setResource(Resource.newInstance(1024, 1));

        appContext.setApplicationType("MAPREDUCE");

        // Submit the application to the applications manager
        client.submitApplication(appContext);
        //client.stop();
    }
}
I can run the MapReduce job correctly from the command line:

hadoop jar wordcount.jar org.apache.hadoop.examples.WordCount /user/admin/input /user/admin/output/

But how do I submit this wordcount job through the YARN Java API?

You do not submit a job with the YARN client; you submit it with the MapReduce API.
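A minimal driver using the standard MapReduce API could look like the sketch below. It reuses the mapper and reducer from the stock WordCount example; the class name SubmitWordCount and the input/output paths are illustrative, and it assumes a running Hadoop cluster whose configuration is on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SubmitWordCount {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(SubmitWordCount.class);
        // Reuse the mapper/combiner/reducer from the stock examples jar
        job.setMapperClass(org.apache.hadoop.examples.WordCount.TokenizerMapper.class);
        job.setCombinerClass(org.apache.hadoop.examples.WordCount.IntSumReducer.class);
        job.setReducerClass(org.apache.hadoop.examples.WordCount.IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("/user/admin/input"));
        FileOutputFormat.setOutputPath(job, new Path("/user/admin/output"));
        // Submit asynchronously; call job.waitForCompletion(true) to block instead
        job.submit();
    }
}
```

The MapReduce framework itself builds the application master container spec and negotiates with YARN, which is exactly the part left empty in the question's amContainer.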

However, if you need more control over the job, such as getting the completion status, the status of the map phase, the status of the reduce phase, and so on, you can use

job.submit();
instead of

job.waitForCompletion(true)
You can get the progress with job.mapProgress() and job.reduceProgress(). There are many more methods on the Job object that you can explore.

As for your question about

hadoop jar wordcount.jar org.apache.hadoop.examples.WordCount /user/admin/input /user/admin/output/
What happens here is that you are running the driver program shipped in wordcount.jar. You use "hadoop jar wordcount.jar" rather than "java -jar wordcount.jar"; you could also use "yarn jar wordcount.jar". Compared to the plain java -jar command, Hadoop/YARN sets up the additional classpath entries that are needed. This executes the main() of the driver, which is in the org.apache.hadoop.examples.WordCount class specified in the command.
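The classpath difference can be made concrete with a few commands (a sketch; it assumes a Hadoop installation on the PATH, and the paths are the ones from the question):

```shell
# hadoop/yarn "jar" prepend the full Hadoop classpath before invoking main():
hadoop classpath    # prints the jars and directories added for you

# Equivalent ways to launch the driver:
hadoop jar wordcount.jar org.apache.hadoop.examples.WordCount /user/admin/input /user/admin/output/
yarn jar wordcount.jar org.apache.hadoop.examples.WordCount /user/admin/input /user/admin/output/

# Plain java would need the classpath assembled by hand:
java -cp "wordcount.jar:$(hadoop classpath)" \
    org.apache.hadoop.examples.WordCount /user/admin/input /user/admin/output/
```
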

You can look at the source code here.

I think the only reason you would want to submit a job through YARN is to integrate it with some kind of service that starts a MapReduce v2 job on some event.

For this, you can make your driver's main() look like this:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobStatus;
import org.apache.hadoop.util.StringUtils;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyMapReduceDriver extends Configured implements Tool {

    public static void main(String[] args) throws Exception {

        Configuration conf = new Configuration();

        /******/

        int errCode = ToolRunner.run(conf, new MyMapReduceDriver(), args);

        System.exit(errCode);
    }

    @Override
    public int run(String[] args) throws Exception {

        // Loop forever, launching a job on each iteration (e.g. per event)
        while (true) {

            try {
                runMapReduceJob();
            } catch (IOException | InterruptedException | ClassNotFoundException e) {
                e.printStackTrace();
            }
        }
    }

    private void runMapReduceJob()
            throws IOException, InterruptedException, ClassNotFoundException {

        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");

        /******/

        job.submit();

        // Poll the job and print progress until it leaves the PREP/RUNNING states
        while (job.getJobState() == JobStatus.State.RUNNING
                || job.getJobState() == JobStatus.State.PREP) {
            Thread.sleep(1000);

            System.out.println(" Map: " + StringUtils.formatPercent(job.mapProgress(), 0)
                    + " Reducer: " + StringUtils.formatPercent(job.reduceProgress(), 0));
        }
    }
}
Hope this helps.