Hadoop: How do I save a MapReduce Reducer's output without key/value pairs?


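I am writing a MapReduce program to process DICOM images. Its purpose is to process each DICOM image, extract its metadata, index that metadata into Solr, and finally, in the reducer phase, save the original image in HDFS. I want the reducer output to be the same file, stored in HDFS.

I have implemented most of this, but the reducer phase does not work: the file stored in HDFS is not identical to the original.

I tested the processed DICOM file with a DICOM image viewer, which reports that the file has been altered, and the processed file is slightly larger. For example, the original DICOM file is 628 KB, but after the reducer saves it to HDFS its size is 630 KB.

I tried the solutions from several links, but none of them gave the expected result.

The code that reads each DICOM file as a single, unsplit record is listed further down (WholeFileRecordReader, followed by the mapper, reducer, and driver).

So I am at a loss about what to do. Some links say this is not possible because MapReduce works on key/value pairs, and others say to use NullWritable. So far I have tried NullWritable and SequenceFileOutputFormat, but neither works.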
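One plausible contributor to the size change described above is record framing: when an output format stores a key and a value per record, the file contains more than the raw image bytes. The following is a minimal plain-Java sketch of that effect, not Hadoop's actual on-disk format (SequenceFile also adds its own header and sync markers, which are not reproduced here). The class name, method names, and the file name "image.dcm" are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class FramingOverheadDemo {

    // Length-prefixed record: 4-byte key length, key bytes,
    // 4-byte value length, value bytes.
    static byte[] framedRecord(byte[] key, byte[] value) {
        ByteBuffer buf = ByteBuffer.allocate(4 + key.length + 4 + value.length);
        buf.putInt(key.length).put(key);
        buf.putInt(value.length).put(value);
        return buf.array();
    }

    // Raw output: only the value bytes, exactly as they were read.
    static byte[] rawRecord(byte[] value) {
        return value.clone();
    }

    public static void main(String[] args) {
        byte[] key = "image.dcm".getBytes(StandardCharsets.UTF_8); // hypothetical file name
        byte[] value = new byte[1024];                              // stand-in for the file contents
        System.out.println("raw bytes:    " + rawRecord(value).length);
        System.out.println("framed bytes: " + framedRecord(key, value).length);
    }
}
```

Any bytes written around the payload change the file that a DICOM viewer sees, so getting a byte-identical copy requires an output path that writes only the payload.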

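Two things:

By calling itr.next() twice, you inadvertently consume two elements per loop iteration in the reducer, which is not going to help.

As you have identified, you are writing both a key and a value when you only want to write one. Use NullWritable for the value instead. Your reducer will then look like this: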

public static class Reduce extends Reducer<Text, BytesWritable, BytesWritable, NullWritable>{
    @Override
    protected void reduce(Text key, Iterable<BytesWritable> value,
                          Reducer<Text, BytesWritable, BytesWritable, NullWritable>.Context context)
            throws IOException, InterruptedException {
        NullWritable nullWritable = NullWritable.get();
        Iterator<BytesWritable> itr = value.iterator();
        while(itr.hasNext())
        {
            BytesWritable wr = itr.next();
            wr.setCapacity(wr.getLength());
            context.write(wr, nullWritable);
        }
    }
}
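To illustrate the first point without a Hadoop dependency, here is a small plain-Java sketch of what a second next() call inside the loop does with an ordinary java.util.Iterator. The class and method names are hypothetical, and strings stand in for the BytesWritable values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class DoubleNextDemo {

    // Mirrors the loop from the question: next() is called twice
    // for every hasNext() check.
    static List<String> buggyCollect(Iterable<String> values) {
        List<String> written = new ArrayList<>();
        Iterator<String> itr = values.iterator();
        while (itr.hasNext()) {
            itr.next();               // first element is fetched and dropped
            written.add(itr.next());  // a different element is written
        }
        return written;
    }

    // One next() per iteration, as in the corrected reducer.
    static List<String> fixedCollect(Iterable<String> values) {
        List<String> written = new ArrayList<>();
        for (String v : values) {
            written.add(v);
        }
        return written;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("a", "b", "c", "d");
        System.out.println(buggyCollect(values)); // [b, d]
        System.out.println(fixedCollect(values)); // [a, b, c, d]
        try {
            buggyCollect(Arrays.asList("a"));
        } catch (NoSuchElementException e) {
            System.out.println("a single value makes the second next() fail");
        }
    }
}
```

With an even number of values every other one is skipped, and with an odd number the last iteration runs past the end of the iterator.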


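Also, is the call to setCapacity() necessary?

Hey Ben, thanks for your help. I forgot to comment out the itr.next() call; I tried it that way too, but it did not work. Anyway, I found a solution: I created a custom RecordWriter and a custom FileOutputFormat, and it works, but I still don't know whether this is the right approach. I will post the answer soon; please take a look when you have time.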
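On the setCapacity() question raised in the answer: as far as I know, BytesWritable.getBytes() returns the backing array, which can be longer than getLength(), so trimming only matters where the whole array is consumed, for example the mapper's new ByteArrayInputStream(value.getBytes()). The sketch below shows the same situation in plain Java with an ordinary byte array. The class and method names are hypothetical, and the 8-byte buffer holding 5 valid bytes is an invented example.

```java
import java.io.ByteArrayInputStream;
import java.util.Arrays;

class BackingArrayDemo {

    // Reading the whole backing array also exposes the unused tail.
    static int readableWhole(byte[] backing) {
        return new ByteArrayInputStream(backing).available();
    }

    // Bounding the stream to the valid length avoids the tail
    // without copying or resizing anything.
    static int readableBounded(byte[] backing, int validLength) {
        return new ByteArrayInputStream(backing, 0, validLength).available();
    }

    // Copying to the exact length is the plain-array analogue of
    // trimming the buffer before handing it out.
    static byte[] trimmed(byte[] backing, int validLength) {
        return Arrays.copyOf(backing, validLength);
    }

    public static void main(String[] args) {
        byte[] backing = new byte[8]; // capacity 8
        int validLength = 5;          // only the first 5 bytes are data
        System.out.println(readableWhole(backing));                // 8
        System.out.println(readableBounded(backing, validLength)); // 5
        System.out.println(trimmed(backing, validLength).length);  // 5
    }
}
```

In the mapper, passing the offset and length to ByteArrayInputStream would be an alternative to calling setCapacity() first.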

public class WholeFileRecordReader extends RecordReader<NullWritable, BytesWritable>{

    private FileSplit fileSplit;
    private Configuration conf;
    private BytesWritable value = new BytesWritable();
    private boolean processed = false;

    @Override
    public void initialize(InputSplit split, TaskAttemptContext context) throws IOException, InterruptedException {     
        this.fileSplit = (FileSplit) split;
        this.conf = context.getConfiguration();     
    }

    @Override
    public boolean nextKeyValue() throws IOException, InterruptedException {
        if (!processed) {
            byte[] contents = new byte[(int) fileSplit.getLength()];
            System.out.println("Inside nextKeyvalue");
            System.out.println(fileSplit.getLength());
            Path file = fileSplit.getPath();
            FileSystem fs = file.getFileSystem(conf);
            FSDataInputStream in = null;
            try {
                in = fs.open(file);
                IOUtils.readFully(in, contents, 0, contents.length);
                value.set(contents, 0, contents.length);
            } finally {
                IOUtils.closeStream(in);
            }
            processed = true;
            return true;
        }
        return false;
    }

    @Override
    public void close() throws IOException {

    }

    @Override
    public NullWritable getCurrentKey() throws IOException, InterruptedException 
    {
        return NullWritable.get();
    }

    @Override
    public BytesWritable getCurrentValue() throws IOException, InterruptedException {
        return value;
    }

    @Override
    public float getProgress() throws IOException, InterruptedException {
        return processed ? 1.0f : 0.0f;
    }

}

public class MapClass{

    public static class Map extends Mapper<NullWritable, BytesWritable, Text, BytesWritable>{   

        @Override
        protected void map(NullWritable key, BytesWritable value,
                Mapper<NullWritable, BytesWritable, Text, BytesWritable>.Context context)
                throws IOException, InterruptedException {
            value.setCapacity(value.getLength());
            InputStream in = new ByteArrayInputStream(value.getBytes());            
            ProcessDicom.metadata(in); // Process dicom image and extract metadata from it
            Text keyOut = getFileName(context);
            context.write(keyOut, value);

        }

        private Text getFileName(Mapper<NullWritable, BytesWritable, Text, BytesWritable>.Context context)
        {
            InputSplit spl = context.getInputSplit();
            Path filePath = ((FileSplit)spl).getPath();
            String fileName = filePath.getName();
            Text text = new Text(fileName);
            return text;
        }

        @Override
        protected void setup(Mapper<NullWritable, BytesWritable, Text, BytesWritable>.Context context)
                throws IOException, InterruptedException {
            super.setup(context);
        }

    }

    public static class Reduce extends Reducer<Text, BytesWritable, BytesWritable, BytesWritable>{

        @Override
        protected void reduce(Text key, Iterable<BytesWritable> value,
                Reducer<Text, BytesWritable, BytesWritable, BytesWritable>.Context context)
                throws IOException, InterruptedException {

            Iterator<BytesWritable> itr = value.iterator();
            while(itr.hasNext())
            {
                BytesWritable wr = itr.next();
                wr.setCapacity(wr.getLength());
                context.write(new BytesWritable(key.copyBytes()), itr.next());
            }
        }
    }
}

public class DicomIndexer{

    public static void main(String[] argss) throws Exception{
        String args[] = {"file:///home/b3ds/storage/dd","hdfs://192.168.38.68:8020/output"};
        run(args);
    }

    public static void run(String[] args) throws Exception {

        //Initialize the Hadoop job and set the jar as well as the name of the Job
        Configuration conf = new Configuration();
        Job job = new Job(conf, "WordCount");
        job.setJarByClass(WordCount.class);
//      job.getConfiguration().set("mapreduce.output.basename", "hi");
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(BytesWritable.class);
        job.setOutputKeyClass(BytesWritable.class);
        job.setOutputValueClass(BytesWritable.class);

        job.setMapperClass(Map.class);
        job.setCombinerClass(Reduce.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(WholeFileInputFormat.class);
        job.setOutputFormatClass(SequenceFileOutputFormat.class);

        WholeFileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.waitForCompletion(true);

    }

}
