Java: Using a list as a value in MapReduce returns the same values

java list hadoop mapreduce

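I have a MapReduce job whose map function emits an IntWritable as the key and a Point object (a class I wrote that implements Writable) as the value. In the reduce function, I then iterate over the Iterable of points with a for-each loop to build a list: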

@Override
public void reduce(IntWritable key, Iterable<Point> points, Context context) throws IOException, InterruptedException {

    List<Point> pointList = new ArrayList<>();
    for (Point point : points) {
        pointList.add(point);
    }
    context.write(key, pointList);
}

The Point class:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class Point implements Writable {

    public Double att1;
    public Double att2;
    public Double att3;
    public Double att4;

    // Hadoop needs a no-argument constructor to instantiate the Writable reflectively.
    public Point() {
    }

    public void set(Double att1, Double att2, Double att3, Double att4) {
        this.att1 = att1;
        this.att2 = att2;
        this.att3 = att3;
        this.att4 = att4;
    }

    // Serializes the four attributes in a fixed order.
    @Override
    public void write(DataOutput dataOutput) throws IOException {
        dataOutput.writeDouble(att1);
        dataOutput.writeDouble(att2);
        dataOutput.writeDouble(att3);
        dataOutput.writeDouble(att4);
    }

    // Overwrites this instance's fields, reading them in the same order as write().
    @Override
    public void readFields(DataInput dataInput) throws IOException {
        this.att1 = dataInput.readDouble();
        this.att2 = dataInput.readDouble();
        this.att3 = dataInput.readDouble();
        this.att4 = dataInput.readDouble();
    }

    @Override
    public String toString() {
        String output = "{" + att1 + ", " + att2 + ", " + att3 + ", " + att4 + "}";
        return output;
    }
}

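The problem is in your reducer. You do not want to store all of the points in memory. They may be large, and Hadoop solves that for you (even if in an awkward way).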

While you loop over the given Iterable, each Point instance is reused, so only one instance is kept around at any given time.

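This means that each time next() is called on the iterator behind points (which the for-each loop does for you), the following two things happen:

1. The Point instance is reused and populated with the next point's data.

2. The same applies to the key instance.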

In your case, the list ends up containing a single Point instance that was inserted multiple times and now holds the data of the last point.
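The effect can be reproduced without a cluster. The following is only a sketch under stated assumptions: `ReuseDemo`, `reusingIterable`, and `collectNaively` are hypothetical names, the `Point` here is a trimmed two-field stand-in for the class in the question, and the iterable merely imitates the reuse described above rather than using Hadoop's own iterator.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ReuseDemo {

    // Trimmed stand-in for the question's Point (two fields, no Writable).
    static class Point {
        double att1, att2;

        void set(double att1, double att2) {
            this.att1 = att1;
            this.att2 = att2;
        }

        @Override
        public String toString() {
            return "{" + att1 + ", " + att2 + "}";
        }
    }

    // Hypothetical iterable imitating the reuse: a single Point that is
    // overwritten with the next row's data on every call to next().
    static Iterable<Point> reusingIterable(double[][] rows) {
        return () -> new Iterator<Point>() {
            private final Point shared = new Point();
            private int index = 0;

            @Override
            public boolean hasNext() {
                return index < rows.length;
            }

            @Override
            public Point next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                shared.set(rows[index][0], rows[index][1]);
                index++;
                return shared; // always the same object
            }
        };
    }

    // Same pattern as the reducer in the question: store the references as-is.
    static List<Point> collectNaively(Iterable<Point> points) {
        List<Point> list = new ArrayList<>();
        for (Point point : points) {
            list.add(point);
        }
        return list;
    }

    public static void main(String[] args) {
        double[][] rows = {{1, 2}, {3, 4}, {5, 6}};
        System.out.println(collectNaively(reusingIterable(rows)));
        // prints [{5.0, 6.0}, {5.0, 6.0}, {5.0, 6.0}]
    }
}
```

All three list entries refer to one object, so they all show whatever was written into it last.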
You should not keep references to Writable instances in your reducer; if you need them later, clone them instead.
You can read more about this problem here.
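One way to apply the cloning advice is to copy each value before storing it. This is a minimal sketch, not the answerer's code: the copy constructor on `Point` is an addition that the question's class does not have, `CopyFix`, `reusingIterable`, and `collectCopies` are hypothetical names, and the reuse is imitated with a plain Java iterable so the example runs without Hadoop. In a real reducer the same idea would mean creating a new Point per value and copying the four attributes into it; Hadoop's `WritableUtils.clone` is another option worth checking.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class CopyFix {

    // Trimmed stand-in for the question's Point, plus a copy constructor (an assumption).
    static class Point {
        double att1, att2;

        Point() {
        }

        // Copies the field values so the new object is independent of the source.
        Point(Point other) {
            this.att1 = other.att1;
            this.att2 = other.att2;
        }

        void set(double att1, double att2) {
            this.att1 = att1;
            this.att2 = att2;
        }

        @Override
        public String toString() {
            return "{" + att1 + ", " + att2 + "}";
        }
    }

    // Hypothetical iterable that hands out one shared, overwritten Point.
    static Iterable<Point> reusingIterable(double[][] rows) {
        return () -> new Iterator<Point>() {
            private final Point shared = new Point();
            private int index = 0;

            @Override
            public boolean hasNext() {
                return index < rows.length;
            }

            @Override
            public Point next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                shared.set(rows[index][0], rows[index][1]);
                index++;
                return shared;
            }
        };
    }

    // Stores a fresh copy of each value instead of the reused reference.
    static List<Point> collectCopies(Iterable<Point> points) {
        List<Point> list = new ArrayList<>();
        for (Point point : points) {
            list.add(new Point(point));
        }
        return list;
    }

    public static void main(String[] args) {
        double[][] rows = {{1, 2}, {3, 4}, {5, 6}};
        System.out.println(collectCopies(reusingIterable(rows)));
        // prints [{1.0, 2.0}, {3.0, 4.0}, {5.0, 6.0}]
    }
}
```

Copying keeps every value for a key in memory at once, so it only suits keys whose value lists are known to be small.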
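Please add the code for your map and reduce, and show how the values are set in map and retrieved in reduce. Also include the Point class that implements Writable.

I just updated the post with the Point and Mapper classes.

All of the code above is all of the code in each class; inside the map, and get the context.write(one, point); outside the while loop.

The problem is that when I apply this, I want to compare each point with the other points in the iterable, so I need to be able to store them and go back over them. Is there a way to do that?

You don't want to store them in memory. As I said, MapReduce is a big-data processing tool, and the values may not fit in memory. How about using the point as the key? Then identical points would be grouped and sorted for you in the reducer.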
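Using the point as the key requires the class to define an ordering and equality. The sketch below illustrates only that part, under assumptions: in Hadoop the key class would have to implement `WritableComparable<Point>`, whereas here a plain `Comparable` and a `TreeMap` stand in for the framework's sort and group step so the example runs on its own. `PointKeyDemo` and `countByPoint` are hypothetical names, and the Point is trimmed to two fields.

```java
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

public class PointKeyDemo {

    // Immutable two-field stand-in for Point that can serve as a sorted key.
    static final class Point implements Comparable<Point> {
        final double att1;
        final double att2;

        Point(double att1, double att2) {
            this.att1 = att1;
            this.att2 = att2;
        }

        // Orders by att1 first, then by att2.
        @Override
        public int compareTo(Point other) {
            int byFirst = Double.compare(att1, other.att1);
            return byFirst != 0 ? byFirst : Double.compare(att2, other.att2);
        }

        // equals and hashCode agree with compareTo, so equal points share a group.
        @Override
        public boolean equals(Object obj) {
            return obj instanceof Point && compareTo((Point) obj) == 0;
        }

        @Override
        public int hashCode() {
            return Objects.hash(att1, att2);
        }

        @Override
        public String toString() {
            return "{" + att1 + ", " + att2 + "}";
        }
    }

    // Imitates grouping by key: counts how often each distinct point occurs,
    // with the keys kept in sorted order.
    static Map<Point, Integer> countByPoint(double[][] rows) {
        Map<Point, Integer> counts = new TreeMap<>();
        for (double[] row : rows) {
            counts.merge(new Point(row[0], row[1]), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        double[][] rows = {{3, 4}, {1, 2}, {3, 4}};
        System.out.println(countByPoint(rows));
        // prints {{1.0, 2.0}=1, {3.0, 4.0}=2}
    }
}
```

This groups identical points only; comparing each point against different points in the same group would still need another approach.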