
Scala Spark Dataframe - Sum of Array[Double] contents per row


This is my basic DataFrame:

root
 |-- user_id: string (nullable = true)
 |-- review_id: string (nullable = true)
 |-- review_influence: double (nullable = false)
The goal is to get the sum of review_influence for each user_id. So I tried to aggregate the data and sum it up like this:

val review_influence_listDF = review_with_influenceDF
    .groupBy("user_id")
    .agg(collect_list("review_id").as("list_review_id"),
         collect_list("review_influence").as("list_review_influence"))
    .agg(sum($"list_review_influence"))
But I got this error:

org.apache.spark.sql.AnalysisException: cannot resolve 'sum(`list_review_influence`)' due to data type mismatch: function sum requires numeric types, not ArrayType(DoubleType,true);;

What can I do about it?

You can sum the column directly inside the agg function instead of summing the collected array:

review_with_influenceDF
    .groupBy("user_id")
    .agg(collect_list($"review_id").as("list_review_id"), 
         sum($"review_influence").as("sum_review_influence"))