
Scala Spark: aggregateByKey type mismatch error


I am struggling to work out what is wrong here. I am trying to use aggregateByKey to find the maximum marks for each student.

import spark.implicits._  // enables toDF; assumes a SparkSession named spark (pre-imported in spark-shell)

val data = Seq(("R1","M",22),("R1","E",25),("R1","F",29),
               ("R2","M",20),("R2","E",32),("R2","F",52))
             .toDF("Name","Subject","Marks")
def seqOp = (acc:Int,ele:(String,Int)) => if (acc>ele._2) acc else ele._2
def combOp =(acc:Int,acc1:Int) => if(acc>acc1) acc else acc1

val r = data.rdd.map{case(t1,t2,t3)=> (t1,(t2,t3))}.aggregateByKey(0)(seqOp,combOp)

The error I get says that aggregateByKey accepts (Int,(Any,Any)) but the actual value is (Int,(String,Int)).

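Your map function is incorrect: data is a DataFrame, so data.rdd yields an RDD of Row objects, and the input to map is a Row, not a Tuple3.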

Fix the last line as follows:

val r = data.rdd.map { r =>
      val t1 = r.getAs[String](0)
      val t2 = r.getAs[String](1)
      val t3 = r.getAs[Int](2)
      (t1,(t2,t3))
    }.aggregateByKey(0)(seqOp,combOp)
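
As an alternative sketch, the tuple-style pattern from the question can be kept either by matching on Row directly or by converting the DataFrame to a typed Dataset before calling rdd. The object name RowVsTuple, the method run, and the local SparkSession setup are assumptions made for illustration and do not come from the original post.

```scala
import org.apache.spark.sql.{Row, SparkSession}

// Hypothetical wrapper object, used only to make the sketch self-contained.
object RowVsTuple {
  // Same operators as in the question, written as vals.
  val seqOp  = (acc: Int, ele: (String, Int)) => if (acc > ele._2) acc else ele._2
  val combOp = (acc: Int, acc1: Int) => if (acc > acc1) acc else acc1

  def run(spark: SparkSession): (Map[String, Int], Map[String, Int]) = {
    import spark.implicits._

    val data = Seq(("R1", "M", 22), ("R1", "E", 25), ("R1", "F", 29),
                   ("R2", "M", 20), ("R2", "E", 32), ("R2", "F", 52))
      .toDF("Name", "Subject", "Marks")

    // Option 1: data.rdd is an RDD[Row], so destructure with the Row extractor.
    val viaRow = data.rdd
      .map { case Row(name: String, subject: String, marks: Int) => (name, (subject, marks)) }
      .aggregateByKey(0)(seqOp, combOp)

    // Option 2: a typed Dataset of tuples, so .rdd yields Tuple3 values.
    val viaTyped = data.as[(String, String, Int)].rdd
      .map { case (name, subject, marks) => (name, (subject, marks)) }
      .aggregateByKey(0)(seqOp, combOp)

    (viaRow.collect().toMap, viaTyped.collect().toMap)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for a quick experiment.
    val spark = SparkSession.builder().master("local[*]").appName("row-vs-tuple").getOrCreate()
    println(run(spark))
    spark.stop()
  }
}
```

Both variants should map R1 to 29 and R2 to 52. Note that a zero value of 0 only works because all marks here are non-negative.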

I solved it with
rdd.map{case(name,u,marks)=>(name,marks)}.groupByKey().map(x=>(x._1,x._2.max))
The result is
List((R2,52), (R1,29))
I could not find a way to do it with aggregateByKey.
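
For comparison with the groupByKey approach in the comment above, here is a minimal sketch of the same per-student maximum written with aggregateByKey on plain (name, marks) pairs. The object name MaxMarksPerStudent, the method maxMarks, and the use of Int.MinValue as the zero value are assumptions for illustration, not part of the original thread.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical wrapper object, used only to make the sketch self-contained.
object MaxMarksPerStudent {
  def maxMarks(spark: SparkSession): Map[String, Int] = {
    val records = spark.sparkContext.parallelize(Seq(
      ("R1", "M", 22), ("R1", "E", 25), ("R1", "F", 29),
      ("R2", "M", 20), ("R2", "E", 32), ("R2", "F", 52)))

    records
      .map { case (name, _, marks) => (name, marks) }   // drop the subject
      .aggregateByKey(Int.MinValue)(
        (acc, marks) => math.max(acc, marks),           // seqOp: fold one value into the partition-local max
        (a, b) => math.max(a, b))                       // combOp: merge maxima from different partitions
      .collect()
      .toMap
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for a quick experiment.
    val spark = SparkSession.builder().master("local[*]").appName("max-marks").getOrCreate()
    maxMarks(spark).foreach(println)
    spark.stop()
  }
}
```

Unlike groupByKey, this combines values within each partition before the shuffle, so the full list of marks per student is never materialized. Because the accumulator and the values share the type Int here, reduceByKey with the same max function would likely serve equally well.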