Writable classes in MapReduce


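How do I use a HashSet (of docid and offset) as the value of the reduce Writable, so that the map Writable and the reduce Writable are connected? The mapper (LineIndexMapper) works fine, but in the reducer (LineIndexReducer) I get an error saying it cannot take a String as an argument when I type: context.write(key, new IndexRecordWritable("some string")); even though I also have a public String toString() in the reduce Writable.
I suspect the HashSet in the reducer's Writable (IndexRecordWritable.java) is not receiving the values correctly. My code is below.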

IndexMapRecordWritable.java
    
    

    
        import java.io.DataInput;
        import java.io.DataOutput;
        import java.io.IOException;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.io.Writable;
    
        public class IndexMapRecordWritable implements Writable {
    
            private LongWritable offset;
            private Text docid;
    
            public LongWritable getOffsetWritable() {
                return offset;
            }
    
            public Text getDocidWritable() {
                return docid;
            }
    
            public long getOffset() {
                return offset.get();
            }
    
            public String getDocid() {
                return docid.toString();
            }
    
            public IndexMapRecordWritable() {
                this.offset = new LongWritable();
                this.docid = new Text();
            }
          
            public IndexMapRecordWritable(long offset, String docid) {
                this.offset = new LongWritable(offset);
                this.docid = new Text(docid);
            }
            public IndexMapRecordWritable(IndexMapRecordWritable indexMapRecordWritable) {
                this.offset = indexMapRecordWritable.getOffsetWritable();
                this.docid = indexMapRecordWritable.getDocidWritable();
            }
            @Override
            public String toString() {
    
                StringBuilder output = new StringBuilder();
                output.append(docid);
                output.append(offset);
                
                return output.toString();
    
            }
    
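            @Override
            public void write(DataOutput out) throws IOException {
                // Serialize the fields in a fixed order
                offset.write(out);
                docid.write(out);
            }
    
            @Override
            public void readFields(DataInput in) throws IOException {
                // Deserialize in the same order used by write()
                offset.readFields(in);
                docid.readFields(in);
            }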
    
        }
    
    
    
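OK, here is my answer, based on a few assumptions. According to the pre-condition and post-condition comments on the reducer class, the final output is a text file containing the key and the file names, separated by commas.
In that case you don't actually need the IndexRecordWritable class. You can write to the context directly: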
context.write(key, new Text(valueBuilder.substring(0, valueBuilder.length() - 1))); 

with the class declaration line as
public class LineIndexReducer extends Reducer<Text, IndexMapRecordWritable, Text, Text>


Don't forget to set the correct output classes in the driver.
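The valueBuilder used in the context.write call above is not shown in the answer. As a rough sketch of how it could be filled, the plain-Java example below mirrors the reducer loop without depending on Hadoop. The class names JoinSketch and Posting are hypothetical stand-ins (Posting plays the role of IndexMapRecordWritable), and the "@" between docid and offset is taken from the post-condition example, not from the asker's toString().

```java
import java.util.List;

public class JoinSketch {

    // Hypothetical stand-in for IndexMapRecordWritable (plain fields instead of Text/LongWritable).
    public static class Posting {
        private final String docid;
        private final long offset;

        public Posting(String docid, long offset) {
            this.docid = docid;
            this.offset = offset;
        }

        @Override
        public String toString() {
            // Format assumed from the post-condition example: "a.txt@3345"
            return docid + "@" + offset;
        }
    }

    // Mirrors the reducer loop: append each value plus a comma, then drop the trailing comma.
    public static String join(Iterable<Posting> values) {
        StringBuilder valueBuilder = new StringBuilder();
        for (Posting val : values) {
            valueBuilder.append(val).append(",");
        }
        if (valueBuilder.length() == 0) {
            return ""; // no values: nothing to trim
        }
        return valueBuilder.substring(0, valueBuilder.length() - 1);
    }

    public static void main(String[] args) {
        List<Posting> values = List.of(
                new Posting("a.txt", 3345),
                new Posting("b.txt", 344),
                new Posting("c.txt", 785));
        System.out.println(join(values)); // a.txt@3345,b.txt@344,c.txt@785
    }
}
```

In the real reducer, the returned string would be wrapped in a Text before being passed to context.write.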
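That should be enough to satisfy the post-condition in the reducer class. But if you really want to write a (Text, IndexRecordWritable) pair to the context, there are two approaches:
1. With a String as the argument (based on your attempt to pass a String even though the IndexRecordWritable constructor is not designed to accept one), and
2. With a HashSet as the argument (based on the HashSet initialized in the IndexRecordWritable class).
Because the constructor of your IndexRecordWritable class is not designed to accept a String, you cannot pass it one. That is why you get the error saying a String cannot be used as the argument. PS: if you want the constructor to accept a String, IndexRecordWritable needs an additional constructor, like this: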
// Save each index record from maps
    private HashSet<IndexMapRecordWritable> tokens = new HashSet<IndexMapRecordWritable>();
    
    // to save the string
    private String value;

    public IndexRecordWritable() {
    }

    public IndexRecordWritable(
            HashSet<IndexMapRecordWritable> indexMapRecordWritables) {
        /***/
    }

    // to accept a string
    public IndexRecordWritable (String value)   {
        this.value = value;
    }


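However, that does not work if you want to use the HashSet, so approach #1 is out: you cannot pass a String.
That leaves approach #2: pass a HashSet as the argument, since you want to make use of the HashSet. In this case you must build the HashSet in the reducer before passing it to IndexRecordWritable in context.write.
To do this, the reducer must look like this: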
@Override
    protected void reduce(Text key, Iterable<IndexMapRecordWritable> values, Context context) throws IOException, InterruptedException {
        //StringBuilder valueBuilder = new StringBuilder();

        HashSet<IndexMapRecordWritable> set = new HashSet<>();

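        for (IndexMapRecordWritable val : values) {
            // Hadoop reuses the same value object across iterations, so store a copy
            set.add(new IndexMapRecordWritable(val.getOffset(), val.getDocid()));
            //valueBuilder.append(val);
            //valueBuilder.append(",");
        }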

        //write the key and the adjusted value (removing the last comma)
        //context.write(key, new IndexRecordWritable(valueBuilder.substring(0, valueBuilder.length() - 1)));
        context.write(key, new IndexRecordWritable(set));
        //valueBuilder.setLength(0);
    }
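A caveat worth checking with the reducer above: Hadoop reuses the value object it hands to reduce, so collecting the iterated values in a set only works if each value is copied before it is stored. The plain-Java simulation below illustrates the effect without Hadoop. ReuseSketch, Posting, reusing and collect are hypothetical names, and the iterator only imitates the framework's behavior.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

public class ReuseSketch {

    // Hypothetical mutable value, standing in for IndexMapRecordWritable.
    public static class Posting {
        public String docid;
        public long offset;

        public Posting(String docid, long offset) {
            this.docid = docid;
            this.offset = offset;
        }
    }

    // Imitates an iterable that refills ONE shared instance on every step.
    public static Iterable<Posting> reusing(List<Posting> source) {
        Posting shared = new Posting("", 0);
        return () -> new Iterator<Posting>() {
            private int i = 0;

            @Override
            public boolean hasNext() {
                return i < source.size();
            }

            @Override
            public Posting next() {
                Posting p = source.get(i++);
                shared.docid = p.docid;   // overwrite the shared instance in place
                shared.offset = p.offset;
                return shared;
            }
        };
    }

    // Collects the values into a set, optionally copying each one first; returns the set size.
    public static int collect(Iterable<Posting> values, boolean copy) {
        Set<Posting> set = new HashSet<>();
        for (Posting val : values) {
            set.add(copy ? new Posting(val.docid, val.offset) : val);
        }
        return set.size();
    }

    public static void main(String[] args) {
        List<Posting> input = List.of(new Posting("a.txt", 3345), new Posting("b.txt", 344));
        System.out.println(collect(reusing(input), false)); // 1: the same reference was added twice
        System.out.println(collect(reusing(input), true));  // 2: each copy is a distinct object
    }
}
```

Without the copy, every element of the set is the same object, which ends up holding only the last value read.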


Your IndexRecordWritable.java must contain this:
    // Save each index record from maps
    private HashSet<IndexMapRecordWritable> tokens = new HashSet<IndexMapRecordWritable>();

    // to save the string
    //private String value;

    public IndexRecordWritable() {
    }

    public IndexRecordWritable(
            HashSet<IndexMapRecordWritable> indexMapRecordWritables) {
        /***/
        tokens.addAll(indexMapRecordWritables);
    }


Remember, though, that according to the description in the reducer, this is not what is required:
POST-CONDITION: emit the output a single key-value where all the file names are separated by a comma ",".  <"marcello", "a.txt@3345,b.txt@344,c.txt@785">


If you still choose to emit (Text, IndexRecordWritable), remember to process the HashSet inside IndexRecordWritable so that the output comes out in the required format.
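One possible way to handle the HashSet inside IndexRecordWritable is sketched below. To stay runnable without Hadoop on the classpath, it uses hypothetical stand-in classes (Posting for IndexMapRecordWritable, IndexRecord for IndexRecordWritable) built on java.io.DataOutput and DataInput, the same interfaces that Writable.write and Writable.readFields receive. In a real job the classes would implement org.apache.hadoop.io.Writable instead. The equals/hashCode overrides, the size-prefixed layout, and the sorted output are assumptions made for this sketch, not part of the original code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

public class IndexRecordSketch {

    // Hypothetical stand-in for IndexMapRecordWritable.
    public static class Posting {
        private String docid = "";
        private long offset;

        public Posting() {}
        public Posting(String docid, long offset) { this.docid = docid; this.offset = offset; }

        public void write(DataOutput out) throws IOException {
            out.writeUTF(docid);
            out.writeLong(offset);
        }

        public void readFields(DataInput in) throws IOException {
            docid = in.readUTF();   // same order as write()
            offset = in.readLong();
        }

        @Override public String toString() { return docid + "@" + offset; }

        // Value equality so the HashSet can drop duplicate postings (assumption of this sketch).
        @Override public boolean equals(Object o) {
            return o instanceof Posting && ((Posting) o).docid.equals(docid) && ((Posting) o).offset == offset;
        }
        @Override public int hashCode() { return Objects.hash(docid, offset); }
    }

    // Hypothetical stand-in for IndexRecordWritable.
    public static class IndexRecord {
        private final Set<Posting> tokens = new HashSet<>();

        public IndexRecord() {}
        public IndexRecord(Set<Posting> postings) { tokens.addAll(postings); }

        public void write(DataOutput out) throws IOException {
            out.writeInt(tokens.size()); // element count first, then each element
            for (Posting p : tokens) {
                p.write(out);
            }
        }

        public void readFields(DataInput in) throws IOException {
            tokens.clear(); // instances may be reused, so reset state first
            int n = in.readInt();
            for (int i = 0; i < n; i++) {
                Posting p = new Posting();
                p.readFields(in);
                tokens.add(p);
            }
        }

        @Override public String toString() {
            // Sorted only to make the output deterministic; HashSet order is unspecified.
            TreeSet<String> sorted = new TreeSet<>();
            for (Posting p : tokens) {
                sorted.add(p.toString());
            }
            return String.join(",", sorted);
        }
    }

    // Serializes a record to bytes and reads it back into a fresh instance.
    public static IndexRecord roundTrip(IndexRecord record) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            record.write(new DataOutputStream(bytes));
            IndexRecord copy = new IndexRecord();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Set<Posting> set = new HashSet<>();
        set.add(new Posting("a.txt", 3345));
        set.add(new Posting("b.txt", 344));
        set.add(new Posting("a.txt", 3345)); // duplicate, dropped by the set
        System.out.println(roundTrip(new IndexRecord(set))); // a.txt@3345,b.txt@344
    }
}
```

With the default text output format, the value's toString() is what ends up in the output file, so the comma-joined string is where the required format is produced.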
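Where do you have context.write in your code? Please post the error message as a screenshot.
From the looks of it, you have set the output class to Text in your driver class with job.setOutputKeyClass(Text.class). So in your reducer class, the type is basically "extends Reducer".
Ah, right. Could you post your mapper and reducer as well? It looks like the problem really is in IndexRecordWritable.
Can you tell me what the output of your reducer's values is? Give me an example if you can.
Check this: in your context.write you are passing an IndexRecordWritable object constructed with a String. However, your IndexRecordWritable constructor does not accept a String; it expects an Iterable object. Can you tell me what happens inside the IndexRecordWritable constructor? This one: public IndexRecordWritable(Iterable indexMapRecordWritables) { /***/ }. What happens inside /***/?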