
Scala: a query related to combineByKey


For the following input =>
[('B',1),('B',2),('A',3),('A',4),('A',5)]
after processing with combineByKey, I want the following output.

Expected output =>
[('A',[(3,9),(4,16),(5,25)]),('B',[(1,1),(2,4)])]

scala> val x = sc.parallelize(Array(('B',1),('B',2),('A',3),('A',4),('A',5)))
x: org.apache.spark.rdd.RDD[(Char, Int)] = ParallelCollectionRDD[46] at parallelize at <console>:24

scala> def createCombiner(element: Int): String = (element.toString + "," + Math.pow(element, 2).toInt)
createCombiner: (element: Int)String

scala> def mergeValue(accumulator: String, element: Int): String = (accumulator + (element.toString + Math.pow(element, 2).toInt))
mergeValue: (accumulator: String, element: Int)String

scala> def mergeComb(accumulator: String, accumulator1: String): String = (accumulator + accumulator1)
mergeComb: (accumulator: String, accumulator1: String)String

scala> val combRDD = x.map(t => (t._1, t._2)).combineByKey(createCombiner, mergeValue, mergeComb)
combRDD: org.apache.spark.rdd.RDD[(Char, String)] = ShuffledRDD[48] at combineByKey at <console>:31

scala> combRDD.collect
res39: Array[(Char, String)] = Array((A,3,94,165,25), (B,1,12,4))
I am unable to get the expected output. Being new to Spark, I need some pointers on this.
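As a side note, the `3,94,165,25` in `res39` can be reproduced without Spark. With the default partitioning each element may land in its own partition, so `createCombiner` runs once per element and `mergeComb` then concatenates the per-partition strings with no separator. A minimal local sketch of that merge sequence (a simulation, not actual Spark execution):

```scala
// Reproduce the per-key result for 'A' locally, assuming each element
// ended up in its own partition (so createCombiner ran on every element).
object WhyGarbled {
  // Same combiner as in the question: "value,square"
  def createCombiner(element: Int): String =
    element.toString + "," + math.pow(element, 2).toInt

  // Same merge as in the question: plain concatenation, no separator
  def mergeComb(acc: String, acc1: String): String = acc + acc1

  def main(args: Array[String]): Unit = {
    val merged = Seq(3, 4, 5).map(createCombiner).reduce(mergeComb)
    println(merged) // 3,94,165,25
  }
}
```

This shows the root problem: the accumulator is an unstructured `String`, so the pieces run together once combiners from different partitions are merged.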

Regarding your attempt, accumulate into a `List[(Int, Int)]` instead of a `String`:


scala> val x = sc.parallelize(Array(('B',1),('B',2),('A',3),('A',4),('A',5)))
scala> def createCombiner(element: Int): List[(Int, Int)] = List(element -> element * element)
scala> def mergeValue(accumulator: List[(Int, Int)], element: Int): List[(Int, Int)] = accumulator ++ createCombiner(element)
scala> def mergeComb(accumulator: List[(Int, Int)], accumulator1: List[(Int, Int)]): List[(Int, Int)] = accumulator ++ accumulator1
scala> val combRDD = x.combineByKey(createCombiner, mergeValue, mergeComb)
scala> combRDD.collect
// res0: Array[(Char, List[(Int, Int)])] = Array((A,List((3,9), (4,16), (5,25))), (B,List((1,1), (2,4))))
// or
scala> combRDD.mapValues(_.mkString("[", ",", "]")).collect
res1: Array[(Char, String)] = Array((A,[(3,9),(4,16),(5,25)]), (B,[(1,1),(2,4)]))
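The expected result can also be sanity-checked on plain Scala collections, with no cluster involved. A minimal sketch in which `groupBy` stands in for the shuffle (illustrative only; it does not model combineByKey's per-partition combining):

```scala
// Plain-collections check of the expected output, no Spark required.
object CombineCheck {
  def main(args: Array[String]): Unit = {
    val input = Seq(('B', 1), ('B', 2), ('A', 3), ('A', 4), ('A', 5))
    val result = input
      .groupBy(_._1) // key -> all pairs for that key, in traversal order
      .map { case (k, vs) => k -> vs.map { case (_, v) => (v, v * v) }.toList }
      .toList
      .sortBy(_._1) // groupBy returns an unordered Map, so sort for display
    println(result) // List((A,List((3,9), (4,16), (5,25))), (B,List((1,1), (2,4))))
  }
}
```

The key point carried over from the answer above: keep the accumulator as a structured collection of `(value, square)` pairs and only format it as a string at the very end.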