Spark: create a row containing the sum of each column (a column-totals row)

python scala apache-spark

I have a DataFrame like this:

+-----------+-----------+-----------+
|salesperson|     device|amount_sold|
+-----------+-----------+-----------+
|       john|   notebook|          2|
|       gary|   notebook|          3|
|       john|small_phone|          2|
|       mary|small_phone|          3|
|       john|large_phone|          3|
|       john|     camera|          3|
+-----------+-----------+-----------+

I used the pivot function to reshape it, with an added Total column:

+-----------+------+-----------+--------+-----------+-----+
|salesperson|camera|large_phone|notebook|small_phone|Total|
+-----------+------+-----------+--------+-----------+-----+
|       gary|     0|          0|       3|          0|    3|
|       mary|     0|          0|       0|          3|    3|
|       john|     3|          3|       2|          2|   10|
+-----------+------+-----------+--------+-----------+-----+
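For readers who want to reproduce the pivoted table, here is a minimal sketch of how it might be built. The question does not show its own pivot code, so the local session, the name `sales`, and the way the Total column is computed are assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, sum}

// Local session for illustration only (assumption: not part of the original question)
val spark = SparkSession.builder().master("local[*]").appName("pivot-sketch").getOrCreate()
import spark.implicits._

// Sample data copied from the first table
val sales = Seq(
  ("john", "notebook", 2), ("gary", "notebook", 3), ("john", "small_phone", 2),
  ("mary", "small_phone", 3), ("john", "large_phone", 3), ("john", "camera", 3)
).toDF("salesperson", "device", "amount_sold")

// One column per device; combinations that never occur are filled with 0
val pivoted = sales.groupBy("salesperson").pivot("device").agg(sum("amount_sold")).na.fill(0)

// Row-wise total across the device columns
val deviceCols = pivoted.columns.filterNot(_ == "salesperson").map(col)
val withTotal = pivoted.withColumn("Total", deviceCols.reduce(_ + _))
```

Calling `withTotal.show()` should print a table shaped like the one above, although the row order is not guaranteed.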

But I want a DataFrame that also contains a row (Total) holding the total of each column, like this:

+-----------+------+-----------+--------+-----------+-----+
|salesperson|camera|large_phone|notebook|small_phone|Total|
+-----------+------+-----------+--------+-----------+-----+
|       gary|     0|          0|       3|          0|    3|
|       mary|     0|          0|       0|          3|    3|
|       john|     3|          3|       2|          2|   10|
|      Total|     3|          3|       5|          5|   16|
+-----------+------+-----------+--------+-----------+-----+

Is it possible to do this with Scala/Python (preferably Scala, using Spark), and if possible without using a union?

TIA

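With Spark Scala, you can add the Total column using the following snippet:

// Assumes the Spark session is available as a variable named "spark"
import spark.implicits._
// Adds a per-row Total column. The aggregate function sum takes a single column,
// so the device columns are added together with + instead.
val resultDF = df.withColumn("Total", $"camera" + $"large_phone" + $"notebook" + $"small_phone")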

You can do the following to get this result:

+-----------+------+-----------+--------+-----------+-----+
|salesperson|camera|large_phone|notebook|small_phone|Total|
+-----------+------+-----------+--------+-----------+-----+
|       gary|     0|          0|       3|          0|    3|
|       mary|     0|          0|       0|          3|    3|
|       john|     3|          3|       2|          2|   10|
|      Total|     3|          3|       5|          5|   16|
+-----------+------+-----------+--------+-----------+-----+

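import org.apache.spark.sql.functions.{col, lit, sum}

// df is the pivoted DataFrame; take every column except the salesperson key
val columns = df.columns.filterNot(_ == "salesperson").map(col)

// Aggregate each column with `sum`, label that row "Total", and union it with the original DataFrame.
val withTotalAsRow = df.union(df.select(lit("Total").as("salesperson") +: columns.map(c => sum(c)): _*))

// Only needed if the DataFrame does not already have a Total column (I think it already does here):
// append a column holding the row-wise sum of the value columns.
val withTotalAsColumn = withTotalAsRow.withColumn("Total", columns.reduce(_ plus _))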

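Comments:

Have you tried using the sum function? I guess you already have the second DataFrame.

Please check the last two tables; there seems to be a copy/paste problem. Also, please show your code, how you tried to solve this, or any effort you have made.

Sorry, there was a copy/paste problem in the second table; I have corrected it now. The third table contains a Total row, which is what I want. I would like to know whether there is a way to add a row with the sum of each column (without using union; I think this should be possible).

Thank you, this answer works (I did not need the third statement), but I am looking for something that does not use union, if possible. I would like to know whether in Spark we can add a row without using union (I know a column can be added).

I don't think it is possible to produce a new DataFrame that contains all existing rows plus one newly added row (which is an aggregate of all existing rows).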
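On the no-union question raised in the comments, one possible sketch is to start from the original long-format data and let `cube` compute the subtotals before pivoting, so the Total row and Total column both come out of the aggregation. This is not taken from the answers above. It assumes the long-format DataFrame is still available, that `salesperson` and `device` contain no genuine nulls (cube marks subtotal rows with null), and the names `sales`, `cubed`, `wide`, and `result` are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, sum}

// Local session for illustration only (assumption)
val spark = SparkSession.builder().master("local[*]").appName("cube-sketch").getOrCreate()
import spark.implicits._

// Sample data copied from the first table of the question
val sales = Seq(
  ("john", "notebook", 2), ("gary", "notebook", 3), ("john", "small_phone", 2),
  ("mary", "small_phone", 3), ("john", "large_phone", 3), ("john", "camera", 3)
).toDF("salesperson", "device", "amount_sold")

// cube emits the per-pair sums plus subtotals per salesperson, per device, and overall;
// the subtotal rows carry null in the rolled-up column, which is relabelled "Total"
val cubed = sales
  .cube("salesperson", "device")
  .agg(sum("amount_sold").as("amount_sold"))
  .na.fill("Total", Seq("salesperson", "device"))

// Pivot the devices (now including "Total") into columns; absent pairs become 0
val wide = cubed.groupBy("salesperson").pivot("device").agg(sum("amount_sold")).na.fill(0)

// Move the Total column to the end to match the layout in the question
val deviceNames = wide.columns.filterNot(Set("salesperson", "Total"))
val result = wide.select((Seq("salesperson") ++ deviceNames :+ "Total").map(col): _*)
```

Row order is not guaranteed, so an explicit sort would be needed to force the Total row to the bottom. Whether this is preferable to the union approach depends on the data; the union version works directly on an already pivoted DataFrame, while this sketch needs the unpivoted input.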