Java Hadoop Multiple Outputs
I wrote some Hadoop code that reads a mapped file, splits it into chunks, and writes those chunks out to multiple files, as shown below:
public void map(LongWritable key, Text value,
                OutputCollector<IntWritable, Text> output, Reporter reporter)
        throws IOException {
    String line = value.toString();
    int totalLines = 2000;          // lines per output chunk
    int linesInChunk = 0;
    int fileNum = 1;
    String[] linesinfile = line.split("\n");
    for (int i = 0; i < linesinfile.length; i++) {
        output.collect(new IntWritable(fileNum), new Text(linesinfile[i]));
        linesInChunk++;
        if (linesInChunk == totalLines) {
            // current chunk is full: start a new output file
            fileNum++;
            linesInChunk = 0;
        }
    }
}
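The chunk-numbering logic in the map method can be checked in isolation. Below is a plain-Java sketch (the class and method names `ChunkKey` and `fileNumFor` are hypothetical, introduced only for illustration) of which output file a given line index lands in, assuming file numbers start at 1:

```java
public class ChunkKey {
    // Maps a 0-based line index to its 1-based output file number,
    // mirroring the fileNum/totalLines counting in the map method above.
    static int fileNumFor(int lineIndex, int totalLines) {
        return lineIndex / totalLines + 1;
    }

    public static void main(String[] args) {
        System.out.println(fileNumFor(0, 2000));    // first line goes to file 1
        System.out.println(fileNumFor(1999, 2000)); // last line of the first chunk
        System.out.println(fileNumFor(2000, 2000)); // first line of the second chunk
    }
}
```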
public class MultiFileOutput extends MultipleTextOutputFormat<IntWritable, Text> {
    @Override
    protected String generateFileNameForKeyValue(IntWritable key, Text content,
                                                 String fileName) {
        return key.toString() + "-" + fileName;
    }
}
Apart from that I just set the output key/value classes and so on. What am I doing wrong? My output directory is always empty. Thanks.

This job looks more complicated than it needs to be. If the goal is simply to split a file into multiple files, it can be done in either of two ways. A combined Map and Reduce job is not needed; a map-only job is enough:
- Use o.a.h.mapred.lib.NLineInputFormat to feed N lines at a time from the input into each mapper, then have the mapper write those N lines to a file.
- When uploading the file, set dfs.blocksize to the desired output file size; each mapper will then process a single InputSplit, which it can write out as one file.
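The first approach can be sketched as a driver for a map-only job using the old `mapred` API. This is a minimal sketch under stated assumptions: the class name `SplitJob` is hypothetical, and the line count of 2000 matches the chunk size from the question:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.NLineInputFormat;

public class SplitJob {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(SplitJob.class);

        // NLineInputFormat hands each mapper a split of exactly N input lines.
        conf.setInputFormat(NLineInputFormat.class);
        conf.setInt("mapred.line.input.format.linespermap", 2000);

        // Identity mapper: pass lines through unchanged.
        conf.setMapperClass(IdentityMapper.class);

        // Map-only job: each mapper writes its own part file.
        conf.setNumReduceTasks(0);

        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}
```

With no reducers, each mapper's output goes directly to its own part-NNNNN file, so the file boundaries fall exactly on the N-line splits.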
conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(MultiFileOutput.class);
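For completeness, here is a hedged sketch of how those two configuration lines fit into a full old-API driver. The class names `MultiOutputDriver` and `SplitMapper` are hypothetical (`SplitMapper` stands in for the mapper class shown in the question); everything else uses standard Hadoop classes:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;

public class MultiOutputDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(MultiOutputDriver.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(MultiFileOutput.class); // the MultipleTextOutputFormat subclass above

        // These must match the types the mapper emits (IntWritable key, Text value);
        // a mismatch is a common cause of failed or empty output.
        conf.setOutputKeyClass(IntWritable.class);
        conf.setOutputValueClass(Text.class);

        conf.setMapperClass(SplitMapper.class); // hypothetical name for the question's mapper
        conf.setNumReduceTasks(0);              // map-only job

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}
```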