How do I serialize a Java object in Hadoop?


An object should implement the Writable interface so it can be serialized when it is shipped around in Hadoop. Take the Lucene ScoreDoc class as an example:

public class ScoreDoc implements java.io.Serializable {

  /** The score of this document for the query. */
  public float score;

  /** Expert: A hit document's number.
   * @see Searcher#doc(int) */
  public int doc;

  /** Only set by {@link TopDocs#merge} */
  public int shardIndex;

  /** Constructs a ScoreDoc. */
  public ScoreDoc(int doc, float score) {
    this(doc, score, -1);
  }

  /** Constructs a ScoreDoc. */
  public ScoreDoc(int doc, float score, int shardIndex) {
    this.doc = doc;
    this.score = score;
    this.shardIndex = shardIndex;
  }

  // A convenience method for debugging.
  @Override
  public String toString() {
    return "doc=" + doc + " score=" + score + " shardIndex=" + shardIndex;
  }
}
How should I serialize it with the Writable interface? And what is the connection between the Writable and java.io.Serializable interfaces?

First of all: you can use plain Java serialization, or you can write your own Writable.

You need to implement your own write and readFields methods. It is very simple, because inside them you can call the DataOutput/DataInput API to read and write int, float, String, and so on.

Your example as a Writable (imports needed):
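Here is a minimal sketch of what that could look like, assuming you are free to modify the class itself (the Hadoop Writable import is the only dependency beyond java.io):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class ScoreDoc implements Writable, java.io.Serializable {

  public float score;
  public int doc;
  public int shardIndex;

  // Hadoop instantiates Writables reflectively, so a no-arg constructor is needed
  public ScoreDoc() {}

  public ScoreDoc(int doc, float score, int shardIndex) {
    this.doc = doc;
    this.score = score;
    this.shardIndex = shardIndex;
  }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeFloat(score);
    out.writeInt(doc);
    out.writeInt(shardIndex);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    // must mirror write() exactly: same fields, same order, same types
    score = in.readFloat();
    doc = in.readInt();
    shardIndex = in.readInt();
  }
}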


Note: the order of the writes and the reads must be identical; otherwise one value will end up in another field, and if the types differ you will get a serialization error when reading.

I don't think tampering with a built-in Lucene class is a good idea. Instead, you can have your own class that holds a field of type ScoreDoc and implements Hadoop's Writable interface. It would be something like this:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;
import org.apache.lucene.search.ScoreDoc;

public class MyScoreDoc implements Writable {

  private ScoreDoc sd;

  // Hadoop instantiates Writables reflectively, so a no-arg constructor is required
  public MyScoreDoc() {}

  public MyScoreDoc(ScoreDoc sd) {
    this.sd = sd;
  }

  public void write(DataOutput out) throws IOException {
      // ScoreDoc's fields are public, so read them directly
      // (no need to parse them back out of toString())
      out.writeFloat(sd.score);
      out.writeInt(sd.doc);
      out.writeInt(sd.shardIndex);
  }

  public void readFields(DataInput in) throws IOException {
      // read back in exactly the order write() wrote
      float score = in.readFloat();
      int doc = in.readInt();
      int shardIndex = in.readInt();

      sd = new ScoreDoc(doc, score, shardIndex);
  }

  //String toString()
}
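For what it's worth, to use such a wrapper between mapper and reducer you would register it as the map output value class. A hypothetical snippet (the job name and variable names are illustrative, not from the original):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

// hypothetical job wiring; assumes MyScoreDoc has a no-arg constructor
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "score-docs");
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(MyScoreDoc.class);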

When Hadoop passes values between the mapper and the reducer, what is the internal difference between the two approaches?

@Denzel: In short, the main difference is that one approach works and the other does not, because Hadoop relies on the Writable interface for wire serialization :) It uses Writable to send data over the network in an optimal way... Why do you need to transfer ScoreDoc instances in Hadoop directly, rather than wrapping them as one of the answers suggests? Could you give more details about your use case?
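To make the wire-format point above concrete, here is a hedged round-trip sketch using Hadoop's in-memory buffers; it assumes the MyScoreDoc wrapper from the answer above:

import org.apache.hadoop.io.DataInputBuffer;
import org.apache.hadoop.io.DataOutputBuffer;

// serialize: one float + two ints = 12 bytes on the wire
DataOutputBuffer out = new DataOutputBuffer();
new MyScoreDoc(new ScoreDoc(1, 0.9f, -1)).write(out);

// deserialize into a fresh instance, much as Hadoop does between tasks
DataInputBuffer in = new DataInputBuffer();
in.reset(out.getData(), out.getLength());
MyScoreDoc copy = new MyScoreDoc();
copy.readFields(in);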