Java Hadoop Multiple Outputs
I wrote some Hadoop code that reads a mapped file, splits it into chunks, and writes those chunks out to multiple files, as shown below:
public void map(LongWritable key, Text value,
                OutputCollector<IntWritable, Text> output, Reporter reporter)
        throws IOException {
    String line = value.toString();
    int totalLines = 2000;          // lines per output chunk
    int linesInChunk = 0;
    int fileNum = 1;
    String[] linesinfile = line.split("\n");
    for (int i = 0; i < linesinfile.length; i++) {
        output.collect(new IntWritable(fileNum), new Text(linesinfile[i]));
        linesInChunk++;
        if (linesInChunk == totalLines) {
            // current chunk is full: start a new output file
            fileNum++;
            linesInChunk = 0;
        }
    }
}
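The chunk-numbering logic in the map method can be checked in isolation. Below is a plain-Java sketch (the class and method names `ChunkKey` and `fileNumFor` are hypothetical, introduced only for illustration) of which output file a given line index lands in, assuming file numbers start at 1:

```java
public class ChunkKey {
    // Maps a 0-based line index to its 1-based output file number,
    // mirroring the fileNum/totalLines counting in the map method above.
    static int fileNumFor(int lineIndex, int totalLines) {
        return lineIndex / totalLines + 1;
    }

    public static void main(String[] args) {
        System.out.println(fileNumFor(0, 2000));    // first line goes to file 1
        System.out.println(fileNumFor(1999, 2000)); // last line of the first chunk
        System.out.println(fileNumFor(2000, 2000)); // first line of the second chunk
    }
}
```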
public class MultiFileOutput extends MultipleTextOutputFormat<IntWritable, Text> {
    @Override
    protected String generateFileNameForKeyValue(IntWritable key, Text content,
                                                 String fileName) {
        return key.toString() + "-" + fileName;
    }
}
Apart from that I just set the output key/value classes and so on. What am I doing wrong? My output directory is always empty. Thanks.

This job looks more complicated than it needs to be. If the goal is simply to split a file into multiple files, it can be done in either of two ways. A combined Map and Reduce job is not needed; a map-only job is enough:
- Use o.a.h.mapred.lib.NLineInputFormat to feed N lines at a time from the input into each mapper, then have the mapper write those N lines to a file.
- When uploading the file, set dfs.blocksize to the desired output file size; each mapper will then process a single InputSplit, which it can write out as one file.
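The first approach can be sketched as a driver for a map-only job using the old `mapred` API. This is a minimal sketch under stated assumptions: the class name `SplitJob` is hypothetical, and the line count of 2000 matches the chunk size from the question:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.NLineInputFormat;

public class SplitJob {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(SplitJob.class);

        // NLineInputFormat hands each mapper a split of exactly N input lines.
        conf.setInputFormat(NLineInputFormat.class);
        conf.setInt("mapred.line.input.format.linespermap", 2000);

        // Identity mapper: pass lines through unchanged.
        conf.setMapperClass(IdentityMapper.class);

        // Map-only job: each mapper writes its own part file.
        conf.setNumReduceTasks(0);

        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}
```

With no reducers, each mapper's output goes directly to its own part-NNNNN file, so the file boundaries fall exactly on the N-line splits.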
conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(MultiFileOutput.class);
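For completeness, here is a hedged sketch of how those two configuration lines fit into a full old-API driver. The class names `MultiOutputDriver` and `SplitMapper` are hypothetical (`SplitMapper` stands in for the mapper class shown in the question); everything else uses standard Hadoop classes:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;

public class MultiOutputDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(MultiOutputDriver.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(MultiFileOutput.class); // the MultipleTextOutputFormat subclass above

        // These must match the types the mapper emits (IntWritable key, Text value);
        // a mismatch is a common cause of failed or empty output.
        conf.setOutputKeyClass(IntWritable.class);
        conf.setOutputValueClass(Text.class);

        conf.setMapperClass(SplitMapper.class); // hypothetical name for the question's mapper
        conf.setNumReduceTasks(0);              // map-only job

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}
```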