Java: In Hadoop, when I try to keep the value of each key-value pair in an array, why are all the added elements the same?
I am trying to store the values of the key-value pairs that the Map function receives, so that I can use them later. Given the following input:
Hello hadoop goodbye hadoop
Hello world goodbye world
Hello thinker goodbye thinker
and the following code:
Note: the map itself is the simple word-count example.
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.util.Tool;

public class Inception extends Configured implements Tool {
    public Path workingPath;

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();
        // initialising the arrays that contain the values and the keys
        public ArrayList<LongWritable> keyBuff = new ArrayList<LongWritable>();
        public ArrayList<Text> valueBuff = new ArrayList<Text>();

        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
                System.out.println(word + " / " + one);
            }
        }

        public void innerMap(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            // adding the value to the buffer (this stores a reference, not a copy)
            valueBuff.add(value);
            System.out.println("ArrayList addValue -> " + value);
            // printing every element currently held in the buffer
            for (Text v : valueBuff) {
                System.out.println("ArrayList containedValue -> " + v);
            }
            keyBuff.add(key);
        }

        public void run(Context context) throws IOException, InterruptedException {
            setup(context);
            // going over the key-value pairs and storing them into the arrays
            while (context.nextKeyValue()) {
                innerMap(context.getCurrentKey(), context.getCurrentValue(), context);
            }
            Iterator<Text> itrv = valueBuff.iterator();
            Iterator<LongWritable> itrk = keyBuff.iterator();
            while (itrv.hasNext()) {
                LongWritable nextk = itrk.next();
                Text nextv = itrv.next();
                System.out.println("Value iterator -> " + nextv);
                System.out.println("Key iterator -> " + nextk);
                // iterating over the values and running the map on them.
                map(nextk, nextv, context);
            }
            cleanup(context);
        }
    }

    public int run(String[] args) throws Exception { ... }
    public static void main (..) { ... }
}
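So, as the output shows, every time I add a new value to the ArrayList valueBuff, all the values already in the list are overwritten. Does anyone know why this happens? Why are the values not added to the array correctly?
When Context#nextKeyValue is called, it in turn calls LineRecordReader#nextKeyValue.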
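LineRecordReader uses the same key object and the same value object on every call to nextKeyValue and only changes their contents. Because valueBuff and keyBuff store references to those two objects, every entry in each list points at the same instance, which always holds the most recently read record. If you want to keep the key and value data, you have to create copies of the objects in your own code.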
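The effect described above can be reproduced without Hadoop. The following is a minimal sketch, not Hadoop code: ReusingReader is a hypothetical stand-in for a record reader that recycles one value object, and StringBuilder plays the role of the mutable Text value. All class and method names here are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ReuseDemo {
    /** Hypothetical stand-in for a record reader that recycles one value object. */
    static final class ReusingReader {
        private final List<String> lines;
        private final StringBuilder current = new StringBuilder(); // the single shared value
        private int next = 0;

        ReusingReader(List<String> lines) {
            this.lines = lines;
        }

        /** Overwrites the shared value in place; returns false when the input is exhausted. */
        boolean nextKeyValue() {
            if (next >= lines.size()) {
                return false;
            }
            current.setLength(0);
            current.append(lines.get(next++));
            return true;
        }

        StringBuilder getCurrentValue() {
            return current;
        }
    }

    /** Buffers the reader's value object itself, as the code in the question does. */
    static List<String> bufferByReference(List<String> lines) {
        ReusingReader reader = new ReusingReader(lines);
        List<StringBuilder> buffer = new ArrayList<>();
        while (reader.nextKeyValue()) {
            buffer.add(reader.getCurrentValue()); // same instance added every time
        }
        return snapshot(buffer);
    }

    /** Buffers a fresh copy of the value for every record. */
    static List<String> bufferByCopy(List<String> lines) {
        ReusingReader reader = new ReusingReader(lines);
        List<StringBuilder> buffer = new ArrayList<>();
        while (reader.nextKeyValue()) {
            buffer.add(new StringBuilder(reader.getCurrentValue())); // independent copy
        }
        return snapshot(buffer);
    }

    /** Converts the buffered values to immutable strings for inspection. */
    private static List<String> snapshot(List<StringBuilder> buffer) {
        List<String> out = new ArrayList<>();
        for (StringBuilder sb : buffer) {
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "Hello hadoop goodbye hadoop",
                "Hello world goodbye world",
                "Hello thinker goodbye thinker");
        System.out.println("by reference: " + bufferByReference(input));
        System.out.println("by copy:      " + bufferByCopy(input));
    }
}
```

Buffering by reference yields three entries that all read "Hello thinker goodbye thinker", matching the output in the question, while buffering by copy preserves the three distinct lines. In the mapper from the question, the analogous change would presumably be valueBuff.add(new Text(value)) and keyBuff.add(new LongWritable(key.get())), using the copy constructor of Text and the long constructor of LongWritable.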
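This reuse makes sense as an optimization: if a new key object and a new value object were created for every record, the extra allocation and garbage-collection load could easily cause the system to run into trouble.
The code is not readable at all; you could at least remove the dead code :(
I have updated the code and removed everything except the map and the part I am trying to get working. Sorry, you are right, I should not have posted everything.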
Output produced by the code in the question:
ArrayList addValue -> Hello hadoop goodbye hadoop
ArrayList containedValue -> Hello hadoop goodbye hadoop
ArrayList addValue -> Hello world goodbye world
ArrayList containedValue -> Hello world goodbye world
ArrayList containedValue -> Hello world goodbye world
ArrayList addValue -> Hello thinker goodbye thinker
ArrayList containedValue -> Hello thinker goodbye thinker
ArrayList containedValue -> Hello thinker goodbye thinker
ArrayList containedValue -> Hello thinker goodbye thinker
Value iterator -> Hello thinker goodbye thinker
Key iterator -> 84
Hello / 1
thinker / 1
goodbye / 1
thinker / 1
Value iterator -> Hello thinker goodbye thinker
Key iterator -> 84
Hello / 1
thinker / 1
goodbye / 1
thinker / 1
Value iterator -> Hello thinker goodbye thinker
Key iterator -> 84
Hello / 1
thinker / 1
goodbye / 1
thinker / 1