PySpark column sums with a transpose


I have a dataframe that looks like this -

+---+---+---+---+
| id| w1| w2| w3|
+---+---+---+---+
|  1|100|150|200|
|  2|200|400|500|
|  3|500|600|150|
+---+---+---+---+
I want the output to look like this -

full   total_amt
 w1       800
 w2       1150
 w3       850
My code is -

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [(1, 100, 150, 200), (2, 200, 400, 500), (3, 500, 600, 150)],
    ("id", "w1", "w2", "w3"))

# this appends a totals row, but keeps the original wide layout
res = df.unionAll(
    df.select([
        F.lit('All').alias('id'),
        F.sum(df.w1).alias('w1'),
        F.sum(df.w2).alias('w2'),
        F.sum(df.w3).alias('w3')
    ]))
res.show()

But the output gives me -

+---+---+----+---+
| id| w1|  w2| w3|
+---+---+----+---+
|  1|100| 150|200|
|  2|200| 400|500|
|  3|500| 600|150|
|All|800|1150|850|
+---+---+----+---+

I think I need some kind of pivot after the aggregation. All of the fields are numeric.
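
In other words, what I'm after is an unpivot (melt): turn the one-row frame of sums into (full, total_amt) pairs. A rough sketch of that shape using create_map plus explode, with the column names hard-coded just for illustration:

# sum each column first, producing a one-row dataframe
sums = df.agg(*[F.sum(c).alias(c) for c in ["w1", "w2", "w3"]])

# melt the single row: explode a map of {column name -> total}
res = sums.select(
    F.explode(
        F.create_map(*[x for c in ["w1", "w2", "w3"]
                       for x in (F.lit(c), F.col(c))])
    ).alias("full", "total_amt"))
res.show()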

A quick solution -

>>> df.createOrReplaceTempView('df')

>>> spark.sql('''
...    select 'w1' as full, sum(w1) as total  from df 
...    union
...    select 'w2' as full, sum(w2) as total  from df 
...    union
...    select 'w3' as full, sum(w3) as total  from df 
... ''').show()
+----+-----+                                                                    
|full|total|
+----+-----+
|  w2| 1150|
|  w3|  850|
|  w1|  800|
+----+-----+
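Since every row carries a distinct label ('w1', 'w2', 'w3'), plain union and union all return the same rows here, but union still runs a deduplication step to enforce distinctness, which is likely why the rows came back out of order. union all skips that work:

spark.sql('''
   select 'w1' as full, sum(w1) as total from df
   union all
   select 'w2' as full, sum(w2) as total from df
   union all
   select 'w3' as full, sum(w3) as total from df
''').show()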
Try this approach -

First aggregate the data, then use the stack function to convert the columns to rows.

import pyspark.sql.functions as psf

#perform aggregation
df_agg = df.agg(psf.sum('w1').alias('w1'), psf.sum('w2').alias('w2'), psf.sum('w3').alias('w3'))

#let's have a look at aggregated dataframe
df_agg.show()
#+---+----+---+
#| w1|  w2| w3|
#+---+----+---+
#|800|1150|850|
#+---+----+---+

#Use stack function to convert column to rows
df_agg.selectExpr("stack(3, 'w1', w1, 'w2', w2, 'w3', w3) as (full, total)").show()
#+----+-----+
#|full|total|
#+----+-----+
#|  w1|  800|
#|  w2| 1150|
#|  w3|  850|
#+----+-----+
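
If the list of columns isn't fixed, the same stack expression can be built from df.columns instead of being typed out by hand. A sketch, assuming every column except id should be summed and unpivoted:

#build the aggregation and the stack() expression programmatically
value_cols = [c for c in df.columns if c != 'id']
df_agg = df.agg(*[psf.sum(c).alias(c) for c in value_cols])

stack_expr = "stack({}, {}) as (full, total)".format(
    len(value_cols),
    ", ".join("'{0}', {0}".format(c) for c in value_cols))

df_agg.selectExpr(stack_expr).show()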