What happens if we keep the mapper and combiner in MapReduce but skip the reducer?

My input file is 10 GB in size and is located at

/user/cloudera/inputfiles/records.txt

Here is my driver class code:

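import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCountMain {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // args[0] = input path, args[1] = output path
        Path inputFilePath = new Path(args[0]);
        Path outputFilePath = new Path(args[1]);

        Job job = new Job(conf, "word count");
        job.getConfiguration().set("mapred.job.queue.name", "omega");
        job.setJarByClass(WordCountMain.class);

        FileInputFormat.addInputPath(job, inputFilePath);
        FileOutputFormat.setOutputPath(job, outputFilePath);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        job.setMapperClass(WordCountMapper.class);
        job.setCombinerClass(WordCountCombiner.class);
        job.setNumReduceTasks(0); // zero reduce tasks: map-only job

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}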

I have code for a mapper and a combiner, and I have set the number of reducers to zero.

Here is my mapper code:

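import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {

    public static IntWritable one = new IntWritable(1);

    @Override
    protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Each input line holds "|"-separated words; emit (word, 1) for each
        String line = value.toString();
        StringTokenizer st = new StringTokenizer(line, "|");

        while (st.hasMoreTokens()) {
            String eachWord = st.nextToken();
            context.write(new Text(eachWord), one);
        }
    }
}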

I have written my own combiner.

Here is my combiner code:

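import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum the partial counts emitted for this key
        int count = 0;
        for (IntWritable i : values) {
            count = count + i.get();
        }
        context.write(key, new IntWritable(count));
    }
}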

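My question is: which output will the job store?

The mapper's output or the combiner's output?

Or does the combiner execute only when a reducer phase has been written?

Please help.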

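You cannot be sure how many times the combiner function will run, or whether it will run at all. Also, whether the combiner runs does not depend on whether you specify a reducer for the job. In your case the job will simply produce 160 output files, one per map task, assuming 64 MB input splits (10240 MB / 64 MB = 160).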
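The arithmetic behind that file count can be sketched in a few lines of plain Java. This is only an estimate under stated assumptions: `SplitEstimate` and `estimateMapTasks` are hypothetical names (not Hadoop API), the 64 MB split size is the one implied by the calculation above, and the real number of map tasks depends on the split size configured on the cluster. In a map-only job each map task writes its own output file, so the number of output files equals the number of map tasks.

```java
// Hypothetical helper, not part of Hadoop: estimates how many map tasks
// (and therefore how many output files in a map-only job) a single
// splittable input file produces for a given split size.
class SplitEstimate {

    // Ceiling division: a final partial split still needs its own map task.
    static long estimateMapTasks(long fileSizeMb, long splitSizeMb) {
        if (fileSizeMb < 0 || splitSizeMb <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (fileSizeMb + splitSizeMb - 1) / splitSizeMb;
    }

    public static void main(String[] args) {
        long fileSizeMb = 10L * 1024; // the 10 GB input file from the question

        // Assumed split sizes; the real value comes from the cluster configuration.
        System.out.println(estimateMapTasks(fileSizeMb, 64));  // prints 160
        System.out.println(estimateMapTasks(fileSizeMb, 128)); // prints 80
    }
}
```

A larger split size gives fewer map tasks and fewer output files; a smaller one gives more.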

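If you skip setting the mapper and reducer, Hadoop falls back to its defaults. For example, it will use:

IdentityMapper.class as the default mapper (in the old mapred API; in the new mapreduce API the base Mapper class plays this role)

The default input format is TextInputFormat

The default partitioner is HashPartitioner

By default there is a single reducer, and therefore a single partition

The default reducer is Reducer, which is again a generic type

The default output format is TextOutputFormat, which writes out records one per line by converting keys and values to strings and separating them with a tab character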
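Two of these defaults are easy to illustrate without a cluster. The sketch below is plain Java, not Hadoop code: `DefaultsSketch`, `partitionFor`, and `formatRecord` are hypothetical names, and String keys stand in for Hadoop's Text (whose hashCode differs). `partitionFor` uses the same formula as HashPartitioner, and `formatRecord` mimics the key, tab, value line layout of TextOutputFormat.

```java
// Plain-Java illustration of two Hadoop defaults; no Hadoop classes are used.
class DefaultsSketch {

    // Same formula as HashPartitioner: clear the sign bit of the hash,
    // then take it modulo the number of reduce tasks.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Mimics the TextOutputFormat line layout: key, a tab, then the value.
    static String formatRecord(Object key, Object value) {
        return key + "\t" + value;
    }

    public static void main(String[] args) {
        String[] words = {"Hello", "How"};
        for (String word : words) {
            // With the default of one reducer, every key maps to partition 0.
            System.out.println(word + " -> partition " + partitionFor(word, 1));
            System.out.println(formatRecord(word, 1));
        }
    }
}
```

With a single reducer the modulo is always 0, which is why the default configuration ends up with exactly one partition.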

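I am getting the mapper output only. In the mapper code I emit each word as the map output key and the constant new IntWritable(1) as the map output value. In the combiner code I included the logic for adding up the counts. I get only the mapper output, like this: Hello 1, How 1. Please also look at the console log:

Job_201502260931_72501 completed successfully
Launched map tasks=500
Data-local map tasks=500
Aggregate execution time of reducers (ms)=0
FileSystemCounters
MAPRFS_BYTES_READ=4213018503
MAPRFS_BYTES_WRITTEN=5025563869
FILE_BYTES_WRITTEN=25879174
Map-Reduce Framework
Map input records=28997989
PHYSICAL_MEMORY_BYTES=282704257024
CPU_MILLISECONDS=753350
VIRTUAL_MEMORY_BYTES=1571293040640
Map output records=405971846
SPLIT_RAW_BYTES=66500
GC time elapsed (ms)=1402

Yes. Because you specified job.setNumReduceTasks(0), there will not be any reducers in this scenario. Even though you supplied reduce logic for the combiner, we cannot say whether the combiner will run, so your combiner may not run at all. If you want to know how it is decided whether the combiner runs, read more about how Hadoop spills map output to disk.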
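The difference between the two possible outputs can be reproduced with a small plain-Java simulation. It is a sketch, not Hadoop code: `MapOnlySketch` and the sample line "Hello|How|Hello" are made up for illustration, `map` mirrors the tokenizing in WordCountMapper, and `combine` mirrors the summing in WordCountCombiner. As far as the Hadoop documentation describes it, a job with zero reduce tasks writes map output directly to the output path without sorting it, and the combiner is applied during the sort and spill step, which is consistent with seeing only the first of the two results.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-Java simulation of the question's mapper and combiner logic.
class MapOnlySketch {

    // Mirrors WordCountMapper: split the line on "|" and emit (word, 1).
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, "|");
        while (st.hasMoreTokens()) {
            out.add(new AbstractMap.SimpleEntry<>(st.nextToken(), 1));
        }
        return out;
    }

    // Mirrors WordCountCombiner: sum the counts per key.
    // A TreeMap keeps keys sorted, as they would be after a sort step.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            sums.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapped = map("Hello|How|Hello");

        // What a map-only job stores: one record per emitted pair.
        System.out.println(mapped);          // prints [Hello=1, How=1, Hello=1]

        // What would be stored if the combiner logic had been applied.
        System.out.println(combine(mapped)); // prints {Hello=2, How=1}
    }
}
```

If summed counts are the goal, the same summing class can be registered as the reducer with at least one reduce task, since a combiner alone is never guaranteed to run.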