
Java: Joining two RDDs in Spark and then eliminating the key


I have two RDDs with the following structure:

rdd1<String, String>: (str01, str12), (str01, str13), (str02, str13), ..  
rdd2<String, Float>: (str01, 0.1), (str02, 0.3), ..  
I want to join these RDDs to get a new RDD in which the keys of rdd1 (str01, str02) are replaced by the corresponding values from rdd2, like this:

rdd3<String, Float>: (str12, 0.1), (str13, 0.1), (str13, 0.3)  
Then I need to reduce this RDD by key, like this:

rdd4<String, Float>: (str12, 0.1), (str13, 0.1+0.3 = 0.4)  
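Stated as a formula (a restatement of the example above, assuming inner-join semantics, i.e. an rdd1 key with no match in rdd2 contributes nothing), treat rdd1 as a multiset of pairs (x, y) and rdd2 as pairs (x, w). Each new key y then collects every rdd2 value w whose key x is paired with y in rdd1:

$$
\mathrm{rdd4}(y) = \sum_{(x,\,y) \in \mathrm{rdd1}} \;\; \sum_{(x,\,w) \in \mathrm{rdd2}} w,
\qquad \text{e.g.}\quad \mathrm{rdd4}(\mathrm{str13}) = 0.1 + 0.3 = 0.4,\;\; \mathrm{rdd4}(\mathrm{str12}) = 0.1 .
$$

The inner sum is the join step (producing rdd3), and the outer grouping by y is the reduce-by-key step.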
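I tried a left outer join and a right outer join, but I still end up with an RDD that is not in this form. Any idea how to solve this?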

This should solve your problem:

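// sc is the SparkContext that spark-shell provides
val map1 = List("str01" -> "str12", "str01" -> "str13", "str02" -> "str13")
val map2 = List("str01" -> 0.1, "str02" -> 0.3)

val rdd1 = sc.parallelize(map1) // RDD[(String, String)]
val rdd2 = sc.parallelize(map2) // RDD[(String, Double)]

// join yields (str01, (str12, 0.1)), (str01, (str13, 0.1)), (str02, (str13, 0.3));
// keeping only x._2 drops the old key and makes the rdd1 value the new key
val joinedrdd = rdd1.join(rdd2).map(x => x._2)
// sum the values that share a new key: (str12, 0.1), (str13, 0.1 + 0.3)
val r = joinedrdd.reduceByKey(_ + _)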
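The resulting RDD r has type RDD[(String, Double)]. Scala infers Double for literals such as 0.1; write 0.1f in map2 if you need Float values as in the question.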
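Since the question is tagged Java, here is a minimal plain-Java sketch of what `rdd1.join(rdd2).map(x => x._2).reduceByKey(_ + _)` computes. It does not use Spark at all: it only mimics the semantics on in-memory lists, with pairs modelled as `Map.Entry`. The class `JoinThenReduce` and its `pair` and `joinThenReduce` methods are hypothetical names chosen for illustration, and `Double` is used to match the Scala answer.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class JoinThenReduce {

    // Hypothetical helper: builds a key/value pair, standing in for a Scala tuple.
    static <K, V> Map.Entry<K, V> pair(K key, V value) {
        return new SimpleEntry<>(key, value);
    }

    // In-memory analogue (no Spark) of rdd1.join(rdd2).map(x => x._2).reduceByKey(_ + _):
    // an inner join on the old key, after which the rdd1 value becomes the new key
    // and the rdd2 values are summed per new key.
    static Map<String, Double> joinThenReduce(List<Map.Entry<String, String>> rdd1,
                                              List<Map.Entry<String, Double>> rdd2) {
        Map<String, Double> result = new TreeMap<>(); // TreeMap only for a stable print order
        for (Map.Entry<String, String> left : rdd1) {
            for (Map.Entry<String, Double> right : rdd2) {
                // inner join: only pairs whose keys match produce output
                if (left.getKey().equals(right.getKey())) {
                    // drop the join key and add this value to the running sum for the new key
                    result.merge(left.getValue(), right.getValue(), Double::sum);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> rdd1 = Arrays.asList(
                pair("str01", "str12"), pair("str01", "str13"), pair("str02", "str13"));
        List<Map.Entry<String, Double>> rdd2 = Arrays.asList(
                pair("str01", 0.1), pair("str02", 0.3));
        System.out.println(joinThenReduce(rdd1, rdd2));
    }
}
```

The nested loop is only there for readability; Spark performs the join by partitioning both RDDs by key rather than comparing every pair. With Spark's own Java API, the same pipeline should look roughly like `rdd1.join(rdd2).mapToPair(t -> t._2()).reduceByKey(Float::sum)` for a `JavaPairRDD<String, String>` and a `JavaPairRDD<String, Float>` (a sketch, not compiled against a particular Spark version).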