Scala: question about combineByKey
For the following input:
[('B',1),('B',2),('A',3),('A',4),('A',5)]
I want to get this output after processing it with combineByKey:
Expected output => [('A',[(3,9),(4,16),(5,25)]),('B',[(1,1),(2,4)])]
scala> val x = sc.parallelize(Array(('B',1),('B',2),('A',3),('A',4),('A',5)))
x: org.apache.spark.rdd.RDD[(Char, Int)] = ParallelCollectionRDD[46] at parallelize at <console>:24
scala> def createCombiner(element: Int): String = (element.toString + "," + Math.pow(element, 2).toInt)
createCombiner: (element: Int)String
scala> def mergeValue(accumlator: String, element: Int): String = (accumlator + (element.toString + Math.pow(element, 2).toInt))
mergeValue: (accumlator: String, element: Int)String
scala> def mergeComb(accumlator: String, accumlator1: String): String = (accumlator + accumlator1)
mergeComb: (accumlator: String, accumlator1: String)String
scala> val combRDD = x.map(t => (t._1, (t._2))).combineByKey(createCombiner, mergeValue, mergeComb)
combRDD: org.apache.spark.rdd.RDD[(Char, String)] = ShuffledRDD[48] at combineByKey at <console>:31
scala> combRDD.collect
res39: Array[(Char, String)] = Array((A,3,94,165,25), (B,1,12,4))
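The `res39` strings are consistent with every element landing in its own partition, so only `createCombiner` and `mergeComb` ran, and neither of the `String` combiners inserts a separator between entries. The sketch below is a local, Spark-free model of how `combineByKey` applies its three functions, under stated assumptions: each inner `Seq` stands for one partition, and per-partition combiners are merged in partition order (real Spark does not guarantee that order). `CombineSketch` and `localCombineByKey` are hypothetical names for illustration, not Spark API. With the question's combiners it reproduces `3,94,165,25` for key `A` when the values are spread out, and `3,9416525` when they all share one partition.

```scala
// A local, Spark-free model of combineByKey (hypothetical helper, not Spark's API).
// Assumptions: each inner Seq stands for one partition, and per-partition
// combiners are merged in partition order (real Spark does not guarantee that order).
object CombineSketch {
  def localCombineByKey[K, V, C](partitions: Seq[Seq[(K, V)]])(
      createCombiner: V => C,
      mergeValue: (C, V) => C,
      mergeCombiners: (C, C) => C): Map[K, C] = {
    // Map side: inside each partition, the first value of a key goes through
    // createCombiner and every later value of that key through mergeValue.
    val perPartition: Seq[Map[K, C]] = partitions.map { part =>
      part.foldLeft(Map.empty[K, C]) { case (acc, (k, v)) =>
        acc.get(k) match {
          case Some(c) => acc.updated(k, mergeValue(c, v))
          case None    => acc.updated(k, createCombiner(v))
        }
      }
    }
    // Reduce side: combiners for the same key coming from different
    // partitions are joined with mergeCombiners.
    perPartition.foldLeft(Map.empty[K, C]) { (acc, m) =>
      m.foldLeft(acc) { case (a, (k, c)) =>
        a.updated(k, a.get(k).map(mergeCombiners(_, c)).getOrElse(c))
      }
    }
  }

  // The question's String combiners, unchanged: neither mergeValue
  // nor mergeComb puts a separator between entries.
  def createCombiner(element: Int): String = element.toString + "," + Math.pow(element, 2).toInt
  def mergeValue(accumlator: String, element: Int): String = accumlator + (element.toString + Math.pow(element, 2).toInt)
  def mergeComb(accumlator: String, accumlator1: String): String = accumlator + accumlator1

  val data = Seq(('B', 1), ('B', 2), ('A', 3), ('A', 4), ('A', 5))

  // Every element alone in its own partition: only createCombiner and mergeComb run.
  val spread = localCombineByKey(data.map(Seq(_)))(createCombiner, mergeValue, mergeComb)
  // All elements in a single partition: mergeValue handles the later values of each key.
  val single = localCombineByKey(Seq(data))(createCombiner, mergeValue, mergeComb)

  def main(args: Array[String]): Unit = {
    println(spread('A')) // 3,94,165,25
    println(single('A')) // 3,9416525
  }
}
```

The point of the model is that a `String` accumulator bakes the formatting into the combine step, so the final text depends on how the data happened to be partitioned.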
I can't get the expected output. I'm new to Spark and would appreciate some guidance on this.
How about:
scala> val x = sc.parallelize(Array(('B',1),('B',2),('A',3),('A',4),('A',5)))
scala> def createCombiner(element: Int): List[(Int, Int)] = List(element -> element * element)
scala> def mergeValue(accumulator: List[(Int, Int)], element: Int): List[(Int, Int)] = accumulator ++ createCombiner(element)
scala> def mergeComb(accumulator: List[(Int, Int)], accumulator1: List[(Int, Int)]): List[(Int, Int)] = (accumulator ++ accumulator1)
scala> val combRDD = x.combineByKey(createCombiner, mergeValue, mergeComb)
scala> combRDD.collect
// res0: Array[(Char, List[(Int, Int)])] = Array((A,List((3,9), (4,16), (5,25))), (B,List((1,1), (2,4))))
// or
scala> combRDD.mapValues(_.mkString("[", ",", "]")).collect
res1: Array[(Char, String)] = Array((A,[(3,9),(4,16),(5,25)]), (B,[(1,1),(2,4)]))
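A Spark-free check of the answer's list-based combiners, assuming that within one partition a key's values are folded left (`createCombiner` on the first value, `mergeValue` on the rest) and that partial results from different partitions are joined in partition order with `mergeComb`. `ListCombinerSketch` and `combinePartition` are hypothetical names used only for this illustration.

```scala
// A Spark-free check of the answer's List-based combiners.
// Assumption: within one partition, a key's values are folded left
// (createCombiner on the first, mergeValue on the rest), and partial
// results from different partitions are joined with mergeComb.
object ListCombinerSketch {
  // The answer's combiners: each value becomes its own (value, square) pair.
  def createCombiner(element: Int): List[(Int, Int)] = List(element -> element * element)
  def mergeValue(accumulator: List[(Int, Int)], element: Int): List[(Int, Int)] =
    accumulator ++ createCombiner(element)
  def mergeComb(accumulator: List[(Int, Int)], accumulator1: List[(Int, Int)]): List[(Int, Int)] =
    accumulator ++ accumulator1

  // Hypothetical helper: the work done for one key inside a single partition.
  // Expects a non-empty Seq (a key only appears in a partition if it has a value there).
  def combinePartition(values: Seq[Int]): List[(Int, Int)] =
    values.tail.foldLeft(createCombiner(values.head))(mergeValue)

  // Key 'A' from the question, with all three values in one partition...
  val onePartition = combinePartition(Seq(3, 4, 5))
  // ...and the same values split across two partitions, then merged.
  val twoPartitions = mergeComb(combinePartition(Seq(3)), combinePartition(Seq(4, 5)))

  def main(args: Array[String]): Unit = {
    println(onePartition.mkString("[", ",", "]"))  // [(3,9),(4,16),(5,25)]
    println(twoPartitions.mkString("[", ",", "]")) // [(3,9),(4,16),(5,25)]
  }
}
```

Because every value stays a separate `(value, square)` element, the partitioning no longer affects which pairs appear or how they are delimited; turning the list into the bracketed text is deferred to `mkString`, after combining is done.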