Apache Spark: summing an array column with the aggregate function

Tags: apache-spark, apache-spark-sql

I am trying to sum a field that contains an array:

a = sc.parallelize([("a", [1,1,1]),
                    ("a", [2,2])])

a = a.toDF(["g", "arr_val"])

a.registerTempTable('a')

sql = """
select 
aggregate(arr_val, 0, (acc, x) -> acc + x) as sum
from a
"""

spark.sql(sql).show()
But I get the following error:

An error occurred while calling o24.sql.
: org.apache.spark.sql.AnalysisException: cannot resolve 'aggregate(a.`arr_val`, 0, lambdafunction((CAST(namedlambdavariable() AS BIGINT) + namedlambdavariable()), namedlambdavariable(), namedlambdavariable()), lambdafunction(namedlambdavariable(), namedlambdavariable()))' due to data type mismatch: argument 3 requires int type, however, 'lambdafunction((CAST(namedlambdavariable() AS BIGINT) + namedlambdavariable()), namedlambdavariable(), namedlambdavariable())' is of bigint type.; line 3 pos 0;

How can I make this work?
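Reading the error message: Spark SQL's `aggregate(expr, zero, merge)` requires the merge lambda to return the same type as the zero value. The literal `0` is an `INT`, but `acc + x` over the `BIGINT` array elements is promoted to `BIGINT`, so the two types no longer match. As one sketch (assuming Spark >= 2.4, where `aggregate` was introduced), casting the zero value to `BIGINT` is enough to make the pair agree:

```sql
-- Hypothetical variant: make the zero value BIGINT so it matches
-- the promoted return type of (acc, x) -> acc + x
select aggregate(arr_val, cast(0 as bigint), (acc, x) -> acc + x) as sum
from a
```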

You need to cast the values in the accumulator to floats, so the zero value and the lambda's return type match:

a = sc.parallelize([("a", [1,1,1]),
                    ("a", [2,2])])
a = a.toDF(["g", "arr_val"])

a.registerTempTable('a')

sql = """
select 
aggregate(arr_val, cast(0 as float), (acc, x) -> acc + cast(x as float)) as sum
from a
"""

spark.sql(sql).show()
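For intuition, `aggregate(arr, zero, merge)` is a left fold over the array, semantically the same as Python's `functools.reduce` with an initial value. A minimal plain-Python sketch (not Spark code; `sum_array` is a name chosen here for illustration) of what the corrected query computes per row:

```python
from functools import reduce

def sum_array(arr):
    # Left fold with a float zero value, mirroring
    # aggregate(arr_val, cast(0 as float), (acc, x) -> acc + cast(x as float))
    return reduce(lambda acc, x: acc + float(x), arr, 0.0)

print(sum_array([1, 1, 1]))  # row ("a", [1, 1, 1]) -> 3.0
print(sum_array([2, 2]))     # row ("a", [2, 2])    -> 4.0
```

In the plain-Python model the int/bigint mismatch cannot arise, because Python numbers are promoted dynamically; in Spark SQL the fold is type-checked statically, which is why the explicit casts are needed.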