Scala: adding up the Int values of an RDD[(String, Array[(String, Int)])]


I have an RDD[(String, Array[(String, Int)])]:

    ["abc",[("asd",1),("asd",3),("cvd",2),("cvd",2),("xyz",1)]]

I want to turn it into:

    ["abc",[("asd",4),("cvd",4),("xyz",1)]]

I tried:

    val y = hashedRdd.map(f => (f._1, f._2.map(_._2).reduce((a, b) => a + b)))

But it returns an RDD[(String, Int)]. I want the result as an RDD[(String, Array[(String, Int)])].

One approach is to groupBy the first entry of each tuple and then reduce over the grouped tuples.
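
A minimal sketch of that groupBy-then-reduce idea, written against a plain Scala Array so it runs without a Spark cluster. The object and method names (`GroupReduceSketch`, `sumByKeyWithReduce`) are hypothetical and only for illustration, and the final sort is an added assumption to make the order deterministic.

```scala
object GroupReduceSketch {
  // Hypothetical helper: collapses duplicate keys by adding their Int values.
  def sumByKeyWithReduce(pairs: Array[(String, Int)]): Array[(String, Int)] =
    pairs
      .groupBy(_._1) // Map[String, Array[(String, Int)]], one group per key
      .map { case (_, group) =>
        // Reduce over the grouped tuples: keep the key, add the counts.
        group.reduce((a, b) => (a._1, a._2 + b._2))
      }
      .toArray
      .sortBy(_._1) // groupBy gives no ordering guarantee, so sort by key
}
```

On the RDD from the question, this could presumably be applied per record with `hashedRdd.mapValues(GroupReduceSketch.sumByKeyWithReduce)`, which keeps the outer String key and yields an RDD[(String, Array[(String, Int)])].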


You can group the array by key and sum the values in each group:

    // Raw RDD (assumes a SparkSession named `spark`, as in spark-shell)
    val hashedRdd = spark.sparkContext.parallelize(Seq(
      ("abc", Array(("asd", 1), ("asd", 3), ("cvd", 2), ("cvd", 2), ("xyz", 1)))
    ))

    // Group each array by the first tuple element and sum the second
    val y = hashedRdd.map(x => {
      (x._1, x._2.groupBy(_._1).mapValues(_.map(_._2).sum))
    })

Output:

    y.foreach(println)
    (abc,Map(xyz -> 1, asd -> 4, cvd -> 4))
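
Note that the output above is a Map per record, whereas the question asks for an Array of pairs. A minimal sketch of one way to get back to Array[(String, Int)], again on a plain Scala Array so it can be run without Spark. The names `ArrayResultSketch` and `sumToArray` are hypothetical, and sorting by key is an added assumption to make the order stable. It uses `map` rather than `mapValues`, since `mapValues` on a Map is deprecated in Scala 2.13.

```scala
object ArrayResultSketch {
  // Hypothetical helper: same grouping and summing as the answer's snippet,
  // then converted from a Map back to an Array of (key, sum) pairs.
  def sumToArray(pairs: Array[(String, Int)]): Array[(String, Int)] =
    pairs
      .groupBy(_._1)
      .map { case (key, group) => (key, group.map(_._2).sum) }
      .toArray
      .sortBy(_._1) // Map iteration order is unspecified, so sort by key
}
```

Inside the answer's code, the equivalent change would presumably be `hashedRdd.map(x => (x._1, ArrayResultSketch.sumToArray(x._2)))`.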

Hope this helps.

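What are you trying to say? (a, b) are tuples (String, Int)? Then a has no ._1, and this expression does not work.

@Rasika Did you remove the .map(_._2)?

Yes, I used the same expression you mentioned. It does not work.

@Rasika Here is an update; I have tested it with your rdd. I had not noticed that your original was not grouped.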