Hadoop: are all records processed in a single call to "reduce"?

I did an exercise in Hadoop that sorts "IntPair" objects, where an IntPair is a combination of two integers. Here is the input file:

2,9
3,8
2,6
3,2
...

The "IntPair" class looks like this:

static class IntPair implements WritableComparable<IntPair> {
    private int first;
    private int second;
    ...
    // Natural order: by first, then by second, both ascending.
    @Override
    public int compareTo(IntPair o) {
        int cmp = compare(this.first, o.first);
        return (cmp != 0) ? cmp : compare(this.second, o.second);
    }

    // Three-way comparison of two ints: returns -1, 0, or 1.
    public static int compare(int a, int b) {
        return (a == b) ? 0 : ((a > b) ? 1 : -1);
    }
    ...
}

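I partition the mapper output by the first integer, and the group comparator also compares only the first integer. Only the sort comparator uses both integers.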

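static class FirstPartitioner extends Partitioner<IntPair, NullWritable> {
    @Override
    public int getPartition(IntPair key, NullWritable value, int numPartitions) {
        // Mask the sign bit rather than calling Math.abs, which stays
        // negative for Integer.MIN_VALUE.
        return (key.getFirst() * 127 & Integer.MAX_VALUE) % numPartitions;
    }
}

// Sort comparator: first ascending, then second descending.
static class BothComparator extends WritableComparator {
    // Lets WritableComparator create IntPair instances to deserialize keys
    // into, so that the object-level compare() below gets called.
    protected BothComparator() {
        super(IntPair.class, true);
    }

    @Override
    public int compare(WritableComparable w1, WritableComparable w2) {
        IntPair p1 = (IntPair) w1;
        IntPair p2 = (IntPair) w2;
        int cmp = IntPair.compare(p1.getFirst(), p2.getFirst());
        if (cmp != 0) {
            return cmp;
        }
        return -IntPair.compare(p1.getSecond(), p2.getSecond()); // reverse sort
    }
}

// Group comparator: keys with the same first integer share one reduce call.
static class FirstGroupComparator extends WritableComparator {
    protected FirstGroupComparator() {
        super(IntPair.class, true);
    }

    @Override
    public int compare(WritableComparable w1, WritableComparable w2) {
        IntPair p1 = (IntPair) w1;
        IntPair p2 = (IntPair) w2;
        return IntPair.compare(p1.getFirst(), p2.getFirst());
    }
}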
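
To see what the partitioner and the sort comparator do to the sample input, here is a minimal plain-Java sketch that runs without Hadoop. The class `SortOrderSketch` and its `Pair` type are hypothetical stand-ins for IntPair, and the partition formula is the Math.abs variant from the question. It only mimics the ordering and partition arithmetic, not the real shuffle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SortOrderSketch {
    // Hypothetical stand-in for IntPair; no Hadoop types involved.
    public static final class Pair {
        final int first;
        final int second;
        public Pair(int first, int second) { this.first = first; this.second = second; }
        @Override public String toString() { return first + "," + second; }
    }

    // The four sample records from the input file.
    public static final List<Pair> INPUT = Arrays.asList(
        new Pair(2, 9), new Pair(3, 8), new Pair(2, 6), new Pair(3, 2));

    // Same ordering as BothComparator: first ascending, second descending.
    public static int compareBoth(Pair a, Pair b) {
        int cmp = Integer.compare(a.first, b.first);
        return cmp != 0 ? cmp : -Integer.compare(a.second, b.second);
    }

    // Same formula as the question's FirstPartitioner: depends on first only.
    public static int partition(Pair key, int numPartitions) {
        return Math.abs(key.first * 127) % numPartitions;
    }

    // Returns a sorted copy, leaving the input untouched.
    public static List<Pair> sorted(List<Pair> input) {
        List<Pair> copy = new ArrayList<>(input);
        copy.sort(SortOrderSketch::compareBoth);
        return copy;
    }

    public static void main(String[] args) {
        for (Pair p : sorted(INPUT)) {
            System.out.println(p + " -> partition " + partition(p, 2));
        }
        // prints:
        // 2,9 -> partition 0
        // 2,6 -> partition 0
        // 3,8 -> partition 1
        // 3,2 -> partition 1
    }
}
```

With two partitions (an assumed value), every key with the same first integer lands in the same partition, which is what lets a group comparator on the first integer see all of them together.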

At first I thought the reducer groups records by key (IntPair). Since every record has a distinct key, "reduce" would be called once per record, and the result would be:

2,9
2,6
3,8
3,2

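I think the actual output differs from this because of the group comparator, which compares only the first integer. In the reducer, records are therefore grouped by the first integer. In this example that means "reduce" is called once for each pair of records, so without a loop it emits only the first record of each group. Is that right? I also ran another experiment and changed the reducer as follows: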

static class SSReducer extends Reducer<IntPair, NullWritable, IntPair, NullWritable> {
    @Override
    protected void reduce(IntPair key, Iterable<NullWritable> values, Context context)
            throws IOException, InterruptedException {
        for (NullWritable n : values) { // add looping
            context.write(key, NullWritable.get());
        }
    }
}

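It then produces a result with 4 items.

If I change the group comparator to compare both integers, it also produces 4 items.

So the reducer really does use the group comparator to group keys, which means "reduce" is called once for all the records in a group, even if their keys differ.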
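
The effect of the two group comparators can be checked with a small plain-Java sketch. This is only a model of the grouping step under stated assumptions: keys are represented as `int[]{first, second}`, and `GroupingSketch`, `FIRST_ONLY`, and `BOTH_FIELDS` are hypothetical names, not Hadoop API. A new group starts whenever the comparator reports that two adjacent sorted keys differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class GroupingSketch {
    // Keys already in sort-comparator order: first ascending, second descending.
    public static final List<int[]> SORTED = Arrays.asList(
        new int[] {2, 9}, new int[] {2, 6}, new int[] {3, 8}, new int[] {3, 2});

    // Mirrors FirstGroupComparator: only the first integer matters.
    public static final Comparator<int[]> FIRST_ONLY =
        (a, b) -> Integer.compare(a[0], b[0]);

    // Alternative group comparator that looks at both integers.
    public static final Comparator<int[]> BOTH_FIELDS = (a, b) -> {
        int cmp = Integer.compare(a[0], b[0]);
        return cmp != 0 ? cmp : Integer.compare(a[1], b[1]);
    };

    // Splits adjacent keys into groups; each group models one reduce() call.
    public static List<List<int[]>> group(List<int[]> sortedKeys,
                                          Comparator<int[]> groupComparator) {
        List<List<int[]>> groups = new ArrayList<>();
        int[] previous = null;
        for (int[] key : sortedKeys) {
            if (previous == null || groupComparator.compare(previous, key) != 0) {
                groups.add(new ArrayList<>()); // comparator says: new group
            }
            groups.get(groups.size() - 1).add(key);
            previous = key;
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(group(SORTED, FIRST_ONLY).size());  // prints 2
        System.out.println(group(SORTED, BOTH_FIELDS).size()); // prints 4
    }
}
```

Two groups means two "reduce" calls, so a reducer without a loop writes two records. Four groups means four calls, so the same reducer writes four.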

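Your understanding is correct. The composite nature of the key has no effect on how records are grouped on their way into the reducer. What makes the difference is the specific behavior of the comparators and which fields they look at.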

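Yes, all the records in one group go through a single call to "reduce", even if their keys differ. More precisely, the reduce method is called once per group, with the first key of the group as "key" and all the values in the group making up the values argument.

Even though the reduce method receives only one key (the first one) and all the values as an iterable, you can see that while iterating you get the key that corresponds to each value, because Hadoop updates the same key object in place as the iterator advances.

First, the group comparator is called with two keys and the reduce method starts. Then, from inside the iterator, the group comparator is called again with the next two keys.

This means the reducer does not know the contents of its iterable in advance. They are determined while the iterable is being iterated.

So if we do not iterate over the values, we see only the first key of each group. If we do iterate over the values, we get all the keys.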

Without iterating over the values:

2,9
3,8

Iterating over the values:

2,9
2,6
3,8
3,2
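
The lazy iteration and key reuse described above can be modeled in plain Java. This is a simplified sketch, not Hadoop's actual implementation: `LazyGroupSketch`, `Key`, and the `Reducer` interface here are hypothetical, the group check is hard-coded to compare the first integer, and values are replaced by a dummy marker. The point it illustrates is that one mutable key object is overwritten each time the values iterator advances, and that the end of a group is only discovered during iteration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class LazyGroupSketch {
    // Mutable key that is overwritten in place, like a reused Writable.
    public static final class Key {
        int first;
        int second;
        void set(int[] record) { first = record[0]; second = record[1]; }
        @Override public String toString() { return first + "," + second; }
    }

    public interface Reducer {
        void reduce(Key key, Iterable<Object> values, List<String> out);
    }

    public static final int[][] SORTED = {{2, 9}, {2, 6}, {3, 8}, {3, 2}};
    private static final Object MARKER = new Object(); // stands in for NullWritable

    // Reducer without a loop: writes the key once per reduce() call.
    public static final Reducer NO_LOOP = (key, values, out) -> out.add(key.toString());

    // Reducer with a loop: writes the (changing) key once per value.
    public static final Reducer WITH_LOOP = (key, values, out) -> {
        for (Object ignored : values) {
            out.add(key.toString());
        }
    };

    public static List<String> run(int[][] sorted, Reducer reducer) {
        List<String> out = new ArrayList<>();
        Key key = new Key();  // a single instance shared by every reduce() call
        int[] pos = {0};      // index of the next unread record
        while (pos[0] < sorted.length) {
            int groupFirst = sorted[pos[0]][0];
            key.set(sorted[pos[0]]); // reduce() starts with the group's first key
            Iterable<Object> values = () -> new Iterator<Object>() {
                @Override public boolean hasNext() {
                    // Group membership is decided lazily, record by record.
                    return pos[0] < sorted.length && sorted[pos[0]][0] == groupFirst;
                }
                @Override public Object next() {
                    if (!hasNext()) throw new NoSuchElementException();
                    key.set(sorted[pos[0]++]); // advancing a value also updates the key
                    return MARKER;
                }
            };
            reducer.reduce(key, values, out);
            // Skip whatever the reducer left unread in this group.
            while (pos[0] < sorted.length && sorted[pos[0]][0] == groupFirst) {
                pos[0]++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(SORTED, NO_LOOP));   // prints [2,9, 3,8]
        System.out.println(run(SORTED, WITH_LOOP)); // prints [2,9, 2,6, 3,8, 3,2]
    }
}
```

Because the key object is reused, a reducer that needs to keep keys beyond the current iteration step would have to copy them rather than store the reference.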
