Java RDD join: after joining two different pair RDDs, have the keys and order of the resulting RDD changed?

I have two pair RDDs:

 RDD1 : [(1,a),(2,b),(3,c)]    
 RDD2 : [(1,d),(2,e),(3,f)]
Now I join these RDDs using join:

 RDD3 = RDD1.join(RDD2);
I printed the elements of RDD3 with the following code:

 for (Tuple2<Integer,Tuple2<String,String>> tuple : RDD3.collect())
     System.out.println(tuple._1()+":"+tuple._2()._1()+","+tuple._2()._2());
The output I would like to get is:

1:a,d
1:b,e 
1:c,f
Is there any way to get the desired output above? Or am I misunderstanding how RDDs behave? Please advise.

Edit:

In fact, I am reading the data like this:

JavaDoubleRDD data1 = sc.parallelizeDoubles(Arrays.asList(45.25,22.15,33.24));
JavaDoubleRDD data2 = sc.parallelizeDoubles(Arrays.asList(23.45,19.35,12.45));
and then:

JavaPairRDD<Double,Double> lat1 = data1.cartesian(data1);
JavaRDD<Double> lat2 = lat1.map(new Function<Tuple2<Double,Double>,Double>() {
    @Override
    public Double call(Tuple2<Double,Double> t) {
        return Math.pow(t._1()-t._2(),2);
    }
});
// flag and flag1 are static variables initially equal to 1
JavaPairRDD<Integer,Double> lat3 = lat2.mapToPair(new PairFunction<Double,Integer,Double>() {
    @Override
    public Tuple2<Integer,Double> call(Double d) {
        return new Tuple2<Integer,Double>(flag++,d);
    }
});
System.out.println("Latitude values display");
for (Tuple2<?,?> tuple : lat3.collect()) {
    System.out.println(tuple._1()+":"+tuple._2());
}
JavaPairRDD<Double,Double> long1 = data2.cartesian(data2);
JavaRDD<Double> long2 = long1.map(new Function<Tuple2<Double,Double>,Double>() {
    @Override
    public Double call(Tuple2<Double,Double> t) {
        return Math.pow(t._1()-t._2(),2);
    }
});
JavaPairRDD<Integer,Double> long3 = long2.mapToPair(new PairFunction<Double,Integer,Double>() {
    @Override
    public Tuple2<Integer,Double> call(Double d) {
        return new Tuple2<Integer,Double>(flag1++,d);
    }
});
System.out.println("Longitude values display");
for (Tuple2<?,?> tuple : long3.collect()) {
    System.out.println(tuple._1()+":"+tuple._2());
}
System.out.println("latitude and longitude values join");
JavaPairRDD<Integer,Tuple2<Double,Double>> weightmatrix1 = lat3.join(long3);
System.out.println("Weightmatrix1 Display");
// Concrete element type: Tuple2<?,Tuple2<?,?>> does not accept this list's elements in Java
for (Tuple2<Integer,Tuple2<Double,Double>> tuple : weightmatrix1.collect()) {
    System.out.println(tuple._1()+":"+tuple._2()._1()+","+tuple._2()._2());
}
So what I am doing is computing a weight matrix from the latitude and longitude values.

When I do this:

scala> val rdd1 = sc.parallelize(Array((1,"a"),(2,"b"),(3,"c")))
scala> val rdd2 = sc.parallelize(Array((1,"d"),(2,"e"),(3,"f")))
scala> val rdd3 = rdd1.join(rdd2)
scala> rdd3.collect().foreach(println(_))
I consistently get:

(1,(a,d))
(2,(b,e))
(3,(c,f))

This is what I tried, and it gives the expected result:

val data1 = sc.parallelize(Array((1,"a"),(2,"b"),(3,"c")))
val data2 = sc.parallelize(Array((1,"d"),(2,"e"),(3,"f")))
val data3 = data1.join(data2)
data3.collect().map(tuple => tuple._1 + ":"+tuple._2._1+","+tuple._2._2).foreach(println(_))
Output:

1:a,d
2:b,e
3:c,f

That is Scala. I would expect the same output in Java.
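
For readers who want to see the same idea in Java without setting up a cluster, here is a minimal plain-Java sketch of what an inner join on keys does. It is a model only and does not use Spark: `JoinSketch` and `joinByKey` are hypothetical names, and each key is assumed to appear at most once per side. The point is that rows are matched by key, not by position, and that a stable order only comes from sorting by key (in Spark's Java API that would typically be a `sortByKey()` call on the joined pair RDD).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of an inner join on keys; this is not Spark code.
class JoinSketch {

    // Hypothetical helper: pairs up the values that share a key.
    // Assumes each key occurs at most once on each side.
    static List<String> joinByKey(Map<Integer, String> left, Map<Integer, String> right) {
        // TreeMap iterates in ascending key order, standing in for an explicit sort.
        Map<Integer, String> sortedLeft = new TreeMap<>(left);
        List<String> rows = new ArrayList<>();
        for (Map.Entry<Integer, String> entry : sortedLeft.entrySet()) {
            String other = right.get(entry.getKey());
            if (other != null) { // keep only keys present on both sides
                rows.add(entry.getKey() + ":" + entry.getValue() + "," + other);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Insertion order is deliberately scrambled: it has no effect on the result.
        Map<Integer, String> rdd1 = new HashMap<>();
        rdd1.put(3, "c");
        rdd1.put(1, "a");
        rdd1.put(2, "b");

        Map<Integer, String> rdd2 = new HashMap<>();
        rdd2.put(2, "e");
        rdd2.put(3, "f");
        rdd2.put(1, "d");

        // Prints 1:a,d then 2:b,e then 3:c,f
        for (String row : joinByKey(rdd1, rdd2)) {
            System.out.println(row);
        }
    }
}
```

If the keys on the two sides are not what you expect, the join will pair up the wrong values, so it is worth printing both inputs before joining.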

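There is not enough information. I suspect the problem is in the code you have not shown.

@Sean Owen: I have added my code. I need to construct a matrix, similar to distance data, based on the latitude and longitude values.

I think the problem with using global static variables is quite obvious. Your problem statement is not what your code does.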
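
The last comment does not spell out what goes wrong with the static counters, so the following is my reading rather than something stated in the thread: `flag++` inside `mapToPair` runs inside each task, and a static field is per JVM, so the keys depend on how the elements are partitioned and scheduled rather than on their position. The plain-Java sketch below models that under a simplifying assumption (every partition starts its own counter at 1) and contrasts it with position-based keys. `KeyingSketch` and its methods are hypothetical names and none of this is Spark code. In Spark itself, `zipWithIndex()` on a `JavaRDD` is the usual way to attach position-based indices.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of keying elements across partitions; this is not Spark code.
class KeyingSketch {

    // Models a counter that restarts at 1 in every partition, which is how a
    // static field can behave when each executor JVM has its own copy (assumption).
    static List<Integer> perPartitionCounterKeys(List<List<Double>> partitions) {
        List<Integer> keys = new ArrayList<>();
        for (List<Double> partition : partitions) {
            int flag = 1; // each simulated executor starts from the initial value
            for (int i = 0; i < partition.size(); i++) {
                keys.add(flag++);
            }
        }
        return keys;
    }

    // Models position-based keys: partition offset plus index inside the partition.
    static List<Long> positionKeys(List<List<Double>> partitions) {
        List<Long> keys = new ArrayList<>();
        long offset = 0;
        for (List<Double> partition : partitions) {
            for (int i = 0; i < partition.size(); i++) {
                keys.add(offset + i);
            }
            offset += partition.size();
        }
        return keys;
    }

    public static void main(String[] args) {
        // Three values split over two simulated partitions.
        List<List<Double>> partitions = Arrays.asList(
                Arrays.asList(45.25, 22.15),
                Arrays.asList(33.24));

        System.out.println(perPartitionCounterKeys(partitions)); // [1, 2, 1]: key 1 is duplicated
        System.out.println(positionKeys(partitions));            // [0, 1, 2]: unique and ordered
    }
}
```

With duplicated keys on both sides, a join pairs every matching combination, which would explain rows that look shuffled or repeated.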