How can a Hadoop job have multiple mappers and reducers?

hadoop mapreduce

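I have the code below, in which I set one mapper and one reducer. I want to add one more mapper and one more reducer to do further work. The problem is that the output file of the first MapReduce job has to become the input of the next MapReduce job. Is this possible? If so, how do I do it?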

public int run(String[] args) throws Exception {
    JobConf conf = new JobConf(getConf(), DecisionTreec45.class);
    conf.setJobName("c4.5");

    // the keys are words (strings)
    conf.setOutputKeyClass(Text.class);
    // the values are counts (ints)
    conf.setOutputValueClass(IntWritable.class);
    conf.setMapperClass(MyMapper.class);
    conf.setReducerClass(MyReducer.class);

    // set your input file path below
    FileInputFormat.setInputPaths(conf, "/home/hduser/Id3_hds/playtennis.txt");
    FileOutputFormat.setOutputPath(conf, new Path("/home/hduser/Id3_hds/1/output" + current_index));
    JobClient.runJob(conf);
    return 0;
}

Yes, this is possible. Look for a tutorial on job chaining to see how it is done.

Make sure you use fs.delete(intermediateoutputPath) to delete the intermediate output data that each MR stage creates in HDFS.
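The clean-up advice above can be tried without a cluster. The following is only a local-filesystem sketch of the pattern: it uses java.nio.file as a stand-in for HDFS, and the class name, the two stage methods and the sample records are all made up for illustration. On HDFS the corresponding call would be FileSystem.delete(path, true), where the second argument requests a recursive delete.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical local stand-in for a two-stage chain with intermediate clean-up.
class IntermediateCleanupSketch {

    // Stand-in for job 1: writes its result as a "part" file in the intermediate directory.
    static void runStageOne(Path intermediateDir) throws IOException {
        Files.createDirectories(intermediateDir);
        Files.write(intermediateDir.resolve("part-r-00000"),
                Arrays.asList("sunny\t5", "rain\t5"));
    }

    // Stand-in for job 2: reads what stage 1 left behind.
    static List<String> runStageTwo(Path intermediateDir) throws IOException {
        return Files.readAllLines(intermediateDir.resolve("part-r-00000"));
    }

    // Deletes a directory tree, children before parents.
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> deepestFirst;
        try (Stream<Path> paths = Files.walk(dir)) {
            deepestFirst = paths.sorted(Comparator.reverseOrder())
                                .collect(Collectors.toList());
        }
        for (Path p : deepestFirst) {
            Files.delete(p);
        }
    }

    // Runs both stages, then removes the intermediate data whether or not stage 2 succeeded.
    static List<String> runChain(Path workDir) throws IOException {
        Path intermediate = workDir.resolve("intermediate_output");
        try {
            runStageOne(intermediate);
            return runStageTwo(intermediate);
        } finally {
            deleteRecursively(intermediate);
        }
    }

    public static void main(String[] args) throws IOException {
        Path workDir = Files.createTempDirectory("chain-sketch");
        System.out.println(runChain(workDir));
        Files.delete(workDir);
    }
}
```

Deleting in a finally block is one design choice; if you would rather keep the intermediate data for debugging after a failure, delete it only when the second stage reports success.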
Here is how it works. You need two jobs, and job2 depends on job1:
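import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// MyMapper1, MyReducer1, MyMapper2 and MyReducer2 are your own classes, defined elsewhere.
public class ChainJobs extends Configured implements Tool {

    private static final String OUTPUT_PATH = "intermediate_output";

    @Override
    public int run(String[] args) throws Exception {
        /*
         * Job 1
         */
        Configuration conf = getConf();
        FileSystem fs = FileSystem.get(conf);
        // Job.getInstance is the Hadoop 2+ factory; on Hadoop 1.x use new Job(conf, "Job1").
        Job job = Job.getInstance(conf, "Job1");
        job.setJarByClass(ChainJobs.class);

        job.setMapperClass(MyMapper1.class);
        job.setReducerClass(MyReducer1.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(OUTPUT_PATH));

        // Blocks until job 1 finishes. Job 2 depends on job 1's output,
        // so stop here if job 1 failed.
        if (!job.waitForCompletion(true)) {
            return 1;
        }

        /*
         * Job 2
         */
        Configuration conf2 = getConf();
        Job job2 = Job.getInstance(conf2, "Job 2");
        job2.setJarByClass(ChainJobs.class);

        job2.setMapperClass(MyMapper2.class);
        job2.setReducerClass(MyReducer2.class);

        job2.setOutputKeyClass(Text.class);
        job2.setOutputValueClass(Text.class);

        job2.setInputFormatClass(TextInputFormat.class);
        job2.setOutputFormatClass(TextOutputFormat.class);

        // Job 2 reads what job 1 wrote.
        FileInputFormat.addInputPath(job2, new Path(OUTPUT_PATH));
        FileOutputFormat.setOutputPath(job2, new Path(args[1]));

        boolean success = job2.waitForCompletion(true);

        // Remove the intermediate data now that job 2 has consumed it.
        fs.delete(new Path(OUTPUT_PATH), true);

        return success ? 0 : 1;
    }

    /**
     * Reads the input directory and output location from the command line
     * and runs both jobs to completion.
     */
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Enter valid number of arguments <Inputdirectory>  <Outputlocation>");
            System.exit(2);
        }
        System.exit(ToolRunner.run(new Configuration(), new ChainJobs(), args));
    }
}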

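Should I put fs.delete(intermediateoutputPath); after JobClient.runJob(conf), and then start a new job?

@alekya reddy Put it at the end of the main method, so that the redundant data is removed once everything has run. It is also fine if you do not delete it, but removing data you no longer need is always good practice.

OK, thanks. That really helps.

Is it possible to have two mappers and one reducer? The execution order should be mapper -> reducer, and once that job has finished the next mapper should run, because I pass the output of the first job as input to the next mapper.

Yes, you can. Set the number of reducers to zero for the stage that should run only a mapper. That stage then works as a map-only job.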
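To make the data flow in that last exchange concrete, here is a plain-Java, in-memory sketch of "mapper -> reducer, then a second mapper". It is not Hadoop code: the class, method names and sample input are invented for illustration, and stage 1 is assumed to be a simple word count. In a real driver, the map-only behaviour of the second stage would come from calling job.setNumReduceTasks(0) on that job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of a map+reduce stage followed by a map-only stage.
class MapOnlySecondStageSketch {

    // Stage 1 (map + reduce): emit (word, 1) per token, then sum the ones per word.
    static Map<String, Integer> stageOne(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Stage 2 (map only): every record is transformed on its own; nothing is grouped or summed.
    static List<String> stageTwo(Map<String, Integer> stageOneOutput) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> record : stageOneOutput.entrySet()) {
            out.add(record.getKey().toUpperCase(Locale.ROOT) + "=" + record.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("sunny hot", "sunny mild", "rain mild");
        // Stage 2 consumes exactly what stage 1 produced.
        System.out.println(stageTwo(stageOne(input)));
        // prints [HOT=1, MILD=2, RAIN=1, SUNNY=2]
    }
}
```

Because the second stage has no reduce step, its output is just the mapper's output, one result per input record.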