Java: splitting reduced data in Hadoop into output and new input
I have been searching around for days, trying to find a way to use reduced data for further mapping in Hadoop. I take objects of class A as input data and produce objects of class B as output data. The problem is that mapping generates not only Bs, but also new As.

Here is what I want to achieve:
1.1 input: a list of As
1.2 map result: for each A a list of new As and a list of Bs is generated
1.3 reduce: filtered Bs are saved as output, filtered As are added to the map jobs
2.1 input: a list of As produced by the first map/reduce
2.2 map result: for each A a list of new As and a list of Bs is generated
2.3 ...
3.1 ...
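Stripped of Hadoop specifics, the iteration above is a driver loop: each round turns the current As into Bs (collected as final output) and new As (fed back in as the next round's input), and the loop stops once no As remain. A minimal plain-Java sketch of that control flow, with toy A/B types and a made-up expansion rule (everything here is illustrative, not actual job code):

```java
import java.util.ArrayList;
import java.util.List;

public class IterativeSplit {
    // Toy stand-in for class A: just carries an iteration counter.
    record A(int level) {}
    // Toy stand-in for class B: the finished output records.
    record B(String payload) {}

    static List<B> run(List<A> initialAs) {
        List<A> currentAs = initialAs;          // 1.1: input is a list of As
        List<B> allBs = new ArrayList<>();
        while (!currentAs.isEmpty()) {          // one pass = one map/reduce round
            List<A> nextAs = new ArrayList<>();
            for (A a : currentAs) {
                // 1.2: each A yields new As and Bs (made-up rule: stop at level 2)
                if (a.level() < 2) {
                    nextAs.add(new A(a.level() + 1));
                    nextAs.add(new A(a.level() + 1));
                }
                allBs.add(new B("b@" + a.level()));
            }
            currentAs = nextAs;                 // 1.3: filtered As become the new input
        }
        return allBs;                           // 1.3: Bs are the saved output
    }

    public static void main(String[] args) {
        System.out.println(run(List.of(new A(0))).size()); // 1 + 2 + 4 = 7 Bs
    }
}
```

In real Hadoop terms, each pass of the `while` loop would be one `job.waitForCompletion(true)` call, with the output directory of round N wired up as the input directory of round N+1.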
You should get the basic idea.

I have read a lot about chaining, but I cannot figure out how to combine ChainReducer and ChainMapper, or even whether that is the right approach.

So my question is: how do I split the data during the reduce phase so that one part is saved as output and the other part becomes new input data?

Try using MultipleOutputs. As the Javadoc suggests:
The MultipleOutputs class simplifies writing output data to multiple outputs.

Case one: writing to additional outputs other than the job default output. Each additional output, or named output, may be configured with its own OutputFormat, with its own key class and with its own value class.

Case two: to write data to different files provided by the user.

Usage pattern for job submission:
Job job = new Job();

FileInputFormat.setInputPath(job, inDir);
FileOutputFormat.setOutputPath(job, outDir);

job.setMapperClass(MOMap.class);
job.setReducerClass(MOReduce.class);
...

// Defines additional single text based output 'text' for the job
MultipleOutputs.addNamedOutput(job, "text", TextOutputFormat.class,
    LongWritable.class, Text.class);

// Defines additional sequence-file based output 'sequence' for the job
MultipleOutputs.addNamedOutput(job, "seq",
    SequenceFileOutputFormat.class,
    LongWritable.class, Text.class);
...

job.waitForCompletion(true);
...

Usage in the Reducer:

String generateFileName(K k, V v) {
    return k.toString() + "_" + v.toString();
}

public class MOReduce extends
    Reducer<WritableComparable, Writable, WritableComparable, Writable> {

    private MultipleOutputs mos;

    public void setup(Context context) {
        ...
        mos = new MultipleOutputs(context);
    }

    public void reduce(WritableComparable key, Iterable<Writable> values,
            Context context) throws IOException, InterruptedException {
        ...
        mos.write("text", key, new Text("Hello"));
        mos.write("seq", new LongWritable(1), new Text("Bye"), "seq_a");
        mos.write("seq", new LongWritable(2), new Text("Chau"), "seq_b");
        mos.write(key, new Text("value"), generateFileName(key, new Text("value")));
        ...
    }

    public void cleanup(Context context) throws IOException, InterruptedException {
        mos.close();
        ...
    }
}
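The piece that answers step 1.3 is the split itself: one named output (say `"newA"`) collects the As to be re-processed, while the default output collects the finished Bs, and the next job's input path is then pointed at the files that named output produced. The routing is just a keyed fan-out; a Hadoop-free toy illustration (the bucket names and the A-vs-B rule here are made up):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NamedOutputsSketch {
    // Toy stand-in for MultipleOutputs: routes each record into a named bucket,
    // the way mos.write("newA", ...) vs. the default context.write(...) would
    // split records across output files.
    static Map<String, List<String>> route(List<String> records) {
        Map<String, List<String>> outputs = new HashMap<>();
        for (String r : records) {
            // made-up rule: records starting with "A" are new input, the rest are results
            String name = r.startsWith("A") ? "newA" : "part";
            outputs.computeIfAbsent(name, k -> new ArrayList<>()).add(r);
        }
        return outputs;
    }

    public static void main(String[] args) {
        Map<String, List<String>> out = route(List.of("A1", "B1", "A2", "B2"));
        // "newA" plays the role of the named output that the next job reads back in
        System.out.println(out.get("newA")); // prints [A1, A2]
    }
}
```

In an actual job, the `"newA"` bucket corresponds to output files named after the named output in the job's output directory, so the follow-up job can select just those files as its input while the default `part-r-*` files hold the final Bs.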
Note that these code examples are for Hadoop 0.* rather than 1.0.4; the interface changed slightly in 1.0.4, which I am using. But this is exactly the basic idea I was looking for. Thank you very much.

That's right, it is for 0.20.