Java Hadoop multiple outputs

I wrote some Hadoop code that reads a map file, splits it into chunks, and writes the chunks to multiple files, as follows:

public void map(LongWritable key, Text value, OutputCollector<IntWritable, Text> output,
        Reporter reporter) throws IOException {
    String line = value.toString();
    int totalLines = 2000; // intended number of lines per output file
    int lines = 0;         // line counter
    int fileNum = 1;       // emitted as the key, later used in the file name
    String[] linesinfile = line.split("\n");
    while (lines < linesinfile.length) {
        // I do something like this:
        if (lines == totalLines) {
            output.collect(new IntWritable(fileNum), new Text(linesinfile[lines]));
            fileNum++;
            lines = 0;
        }
        lines++;
    }
}

// Prefixes each output file name with the record key.
public class MultiFileOutput extends MultipleTextOutputFormat<IntWritable, Text> {
    @Override
    protected String generateFileNameForKeyValue(IntWritable key, Text content, String fileName) {
        return key.toString() + "-" + fileName;
    }
}
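In the driver I use this class as the output format, in addition to setting the output key/value classes and so on.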

What am I doing wrong? My output directory is always empty.
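One possible explanation for the empty output, offered as a hedged sketch rather than a confirmed diagnosis: with TextInputFormat, each map() call receives a single line as its value, so `line.split("\n")` yields one element and the `lines == totalLines` branch is never reached. The plain-Java replay here uses no Hadoop classes. `MapLoopReplay` and `replay` are hypothetical names, and the elided condition in the question is assumed to be literally `lines == totalLines`.

```java
import java.util.ArrayList;
import java.util.List;

final class MapLoopReplay {
    // Replays the question's loop on one map() value and returns what would
    // have been passed to output.collect(), formatted as "fileNum<TAB>line".
    // Note: like the original loop, this does not terminate when the value
    // holds more than totalLines lines, because the counter is reset to 0.
    static List<String> replay(String value, int totalLines) {
        List<String> collected = new ArrayList<>();
        String[] linesInFile = value.split("\n");
        int lines = 0;
        int fileNum = 1;
        while (lines < linesInFile.length) {
            if (lines == totalLines) {
                collected.add(fileNum + "\t" + linesInFile[lines]);
                fileNum++;
                lines = 0;
            }
            lines++;
        }
        return collected;
    }

    public static void main(String[] args) {
        // A single line, as delivered per call by a line-oriented input format.
        List<String> out = replay("a single input line", 2000);
        System.out.println("records collected: " + out.size());
    }
}
```

Under these assumptions the mapper never emits a record, which would be consistent with an empty output directory. Emitting every line and only switching the file number when the counter reaches the limit would avoid this, though a counter kept in local variables still resets on every map() call.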


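Thanks.

The program looks a bit complicated. If the goal is to split one file into multiple files, it can be done in two ways. There is no need for both a Map and a Reduce job; a Map-only job is enough.

  • Use o.a.h.mapred.lib.NLineInputFormat to read N lines at a time from the input into the mapper, then write those N lines to a file.

  • Set dfs.blocksize to the desired file size when uploading the file; each mapper then processes one InputSplit, which can be written to a file.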
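To make the first option concrete, here is a minimal plain-Java sketch of the grouping idea only: every N consecutive lines form one chunk, and each chunk gets its own file name. It does not use Hadoop; in a real job NLineInputFormat would do the grouping and each mapper would write its own output. `LineChunker`, `chunk`, and `fileNameFor` are hypothetical names, and the naming scheme simply mirrors the question's `key + "-" + fileName`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LineChunker {
    // Groups consecutive lines into chunks of at most n lines each.
    static List<List<String>> chunk(List<String> lines, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<List<String>> chunks = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += n) {
            int end = Math.min(start + n, lines.size());
            chunks.add(new ArrayList<>(lines.subList(start, end)));
        }
        return chunks;
    }

    // Builds "<fileNum>-<fileName>", numbering chunks from 1.
    static String fileNameFor(int chunkIndex, String fileName) {
        return (chunkIndex + 1) + "-" + fileName;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "c", "d", "e");
        List<List<String>> chunks = chunk(input, 2);
        for (int i = 0; i < chunks.size(); i++) {
            System.out.println(fileNameFor(i, "part") + " -> " + chunks.get(i));
        }
    }
}
```

The last chunk may hold fewer than N lines, which matches what one would expect when the input length is not a multiple of N.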


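Thanks. Actually, I had mistakenly assumed that Hadoop only creates as many mappers as there are input files! (I have just started using Hadoop.) I have now set the number of map tasks to 5000.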

Job configuration from the question:

conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(MultiFileOutput.class);