Apache Spark: when the Tuple2's key is the original object in mapToPair
I have a JavaDStream sourceDStream that I use for stream processing. In the mapToPair on this stream, I use the input object as both the key and the value of the Tuple2, as shown in Case 1:
    public Tuple2<SourceObject, SourceObject> call(SourceObject sourceObject) {
        // The incoming object serves as both the key and the value.
        Tuple2<SourceObject, SourceObject> tuple2;
        tuple2 = new Tuple2<>(sourceObject, sourceObject);
        return tuple2;
    }
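However, Spark does not call SourceObject's equals for all objects in this case. If I instead wrap the object in a separate key class, as in Case 2: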
    public class SourceKey {
        private SourceObject sourceObject;

        public void setSourceObject(SourceObject sourceObject) {
            this.sourceObject = sourceObject;
        }

        public boolean equals(Object obj) {
            ...
        }
    }

    public Tuple2<SourceKey, SourceKey> call(SourceObject sourceObject) {
        // Wrap the incoming object in a SourceKey and use the wrapper as key and value.
        Tuple2<SourceKey, SourceKey> tuple2;
        SourceKey sourceKey = new SourceKey();
        sourceKey.setSourceObject(sourceObject);
        tuple2 = new Tuple2<>(sourceKey, sourceKey);
        return tuple2;
    }
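then Spark works as expected and calls SourceKey's equals for all objects in sourceDStream. As a result, reduceByKey is invoked for all objects that have the same key.

For Case 1, where the SourceObject itself is used as both the key and the value of the Tuple2 in mapToPair, why does Spark skip calling SourceObject's equals?

How can I solve this so that Spark calls SourceObject's equals for all objects in sourceDStream and therefore reduces the objects that have the same key?

Thanks,
Michael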
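One thing that may be worth checking, although nothing in the question confirms it is the cause here: Spark's programming guide notes that a custom object used as a key in key-value operations needs an equals() that is accompanied by a matching hashCode(). The sketch below illustrates that contract in plain Java, with a HashMap standing in for Spark's hash-based grouping. The class KeyContractSketch, the fields id and count, and the helper reduceByKey are all hypothetical names made up for illustration; the real SourceObject is not shown in the question.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class KeyContractSketch {

    // Hypothetical stand-in for the question's SourceObject; both fields are assumptions.
    static final class SourceObject implements Serializable {
        final String id;   // assumed to be the part that defines key equality
        final int count;   // assumed payload to be reduced

        SourceObject(String id, int count) {
            this.id = id;
            this.count = count;
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (!(obj instanceof SourceObject)) return false;
            return id.equals(((SourceObject) obj).id);
        }

        // Must agree with equals: equal keys have to produce equal hash codes,
        // otherwise hash-based grouping never gets as far as comparing them.
        @Override
        public int hashCode() {
            return Objects.hash(id);
        }
    }

    // Plain-Java analogue of a reduceByKey that sums counts; this is not Spark's API.
    static Map<SourceObject, Integer> reduceByKey(SourceObject[] input) {
        Map<SourceObject, Integer> reduced = new HashMap<>();
        for (SourceObject s : input) {
            reduced.merge(s, s.count, Integer::sum);
        }
        return reduced;
    }

    public static void main(String[] args) {
        SourceObject[] input = {
            new SourceObject("a", 1),
            new SourceObject("a", 2),
            new SourceObject("b", 5)
        };
        Map<SourceObject, Integer> reduced = reduceByKey(input);
        System.out.println(reduced.size());                         // prints 2
        System.out.println(reduced.get(new SourceObject("a", 0)));  // prints 3
    }
}
```

If hashCode() were removed from this sketch, the two "a" objects would normally fall into different hash buckets and equals would not be consulted for them, which resembles the symptom described for Case 1. Whether that applies to the real SourceObject is an assumption that would need to be verified against its actual code.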