How do I serialize a Java object in Hadoop?

An object should implement the Writable interface so that it can be serialized when it is transferred in Hadoop. Take the Lucene ScoreDoc class as an example:
public class ScoreDoc implements java.io.Serializable {

  /** The score of this document for the query. */
  public float score;

  /** Expert: A hit document's number.
   * @see Searcher#doc(int) */
  public int doc;

  /** Only set by {@link TopDocs#merge} */
  public int shardIndex;

  /** Constructs a ScoreDoc. */
  public ScoreDoc(int doc, float score) {
    this(doc, score, -1);
  }

  /** Constructs a ScoreDoc. */
  public ScoreDoc(int doc, float score, int shardIndex) {
    this.doc = doc;
    this.score = score;
    this.shardIndex = shardIndex;
  }

  // A convenience method for debugging.
  @Override
  public String toString() {
    return "doc=" + doc + " score=" + score + " shardIndex=" + shardIndex;
  }
}
How should I serialize it using the Writable interface? And what is the relationship between the Writable and java.io.Serializable interfaces?

First of all: you can use Java serialization, or you can write your own write and readFields methods. It is quite simple, because inside them you can call the DataOutput/DataInput API to read and write int, float, String, and so on. Here is your example as a Writable (the imports are required).
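Applied to the class above, a minimal sketch of ScoreDoc rewritten as a Writable. The two-method interface is declared locally here only so the snippet compiles without Hadoop on the classpath; in a real job you would instead import org.apache.hadoop.io.Writable:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Local stand-in for org.apache.hadoop.io.Writable so this sketch
// compiles without Hadoop; in a real job, import the Hadoop interface.
interface Writable {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

class ScoreDoc implements Writable {

  public float score;
  public int doc;
  public int shardIndex;

  // Writable implementations need a no-arg constructor,
  // because Hadoop instantiates them reflectively before readFields().
  public ScoreDoc() {}

  public ScoreDoc(int doc, float score, int shardIndex) {
    this.doc = doc;
    this.score = score;
    this.shardIndex = shardIndex;
  }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeInt(doc);
    out.writeFloat(score);
    out.writeInt(shardIndex);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    // must mirror write(): same field order, same types
    doc = in.readInt();
    score = in.readFloat();
    shardIndex = in.readInt();
  }
}
```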
Note: the order of writes and reads must be identical; otherwise one value will end up in another field, and if the types differ you will get a serialization error on read.

I don't think tampering with the built-in Lucene class is a good idea. Instead, you can have your own class that holds a field of type ScoreDoc and implements Hadoop's Writable interface. It would look something like this:
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class MyScoreDoc implements Writable {

  private ScoreDoc sd;

  public void write(DataOutput out) throws IOException {
    // ScoreDoc's fields are public, so we can read them directly;
    // there is no need to parse them back out of toString()
    out.writeInt(sd.doc);
    out.writeFloat(sd.score);
    out.writeInt(sd.shardIndex);
  }

  public void readFields(DataInput in) throws IOException {
    // read in the same order, and with the same types, as write()
    int doc = in.readInt();
    float score = in.readFloat();
    int shardIndex = in.readInt();
    sd = new ScoreDoc(doc, score, shardIndex);
  }

  // a toString() can simply delegate to sd.toString()
}
When Hadoop passes values between the mapper and the reducer, what is the internal difference between the two approaches? @Denzel: in short, the main difference is that one works and the other doesn't (because Hadoop relies on the Writable interface for wire transfer) :) It uses Writable to send the data over the network in a compact form. Why do you need to transfer ScoreDoc instances directly in Hadoop (rather than wrapping them as one of the answers suggests)? Can you give more details about your use case?
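The compactness point can be illustrated with plain java.io streams, independent of Hadoop: DataOutput emits just the raw field bytes with no tags or class metadata (which is also why readFields must mirror write exactly), while java.io serialization ships a stream header and class descriptor alongside the data. A stdlib-only sketch; the class name and sample values are illustrative:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

public class WireSizeDemo {

  // Writable-style payload: just the raw field bytes (4 + 4 + 4 = 12)
  static byte[] rawFields() throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(buf);
    out.writeInt(42);      // doc
    out.writeFloat(0.97f); // score
    out.writeInt(-1);      // shardIndex
    return buf.toByteArray();
  }

  // java.io serialization: stream header and class descriptor travel too
  static byte[] serializedObject() throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    ObjectOutputStream oos = new ObjectOutputStream(buf);
    oos.writeObject(new int[] {42}); // even a tiny object carries overhead
    oos.close();
    return buf.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    System.out.println("raw fields: " + rawFields().length + " bytes");
    System.out.println("serialized: " + serializedObject().length + " bytes");
  }
}
```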