Dividing an aggregated value by a DataFrame column value in PySpark


I have a PySpark DataFrame like the one shown below.

I want to populate some columns based on the lists below.

This is what I did:

import pyspark.sql.functions as F

df.withColumn('cat', 
    F.when(df.device.isin(phone_list), 'phones').otherwise(
    F.when(df.device.isin(pc_list), 'pc').otherwise(
    F.when(df.device.isin(security_list), 'security')))
).groupBy('id').pivot('cat').agg(F.count('cat')).show()

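This gives me the result I expected.

Now I want to change the code slightly: after pivoting on the cat column, I want each count divided by the val value for that id in the DataFrame.

I tried the following, but it does not give the correct result: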

df.withColumn('cat', 
    F.when(df.device.isin(phone_list), 'phones').otherwise(
    F.when(df.device.isin(pc_list), 'pc').otherwise(
    F.when(df.device.isin(security_list), 'security')))
).groupBy('id').pivot('cat').agg(F.count('cat')/ df.val).show()

How can I get the result I want?

Edit

Expected result

agg only accepts aggregate expressions; a plain column such as df.val is not recognized inside it.
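One way to picture this, sketched in plain Python rather than Spark: after groupBy('id').pivot('cat'), each output cell summarizes a whole bucket of rows, so every expression inside agg has to reduce that bucket to a single value. The rows below are hypothetical (they are not taken from the question), and the dictionary-based grouping is only an illustration of the idea, not how Spark executes the query.

```python
from collections import defaultdict

# Hypothetical (id, cat, val) rows; val repeats within each id.
rows = [
    (1, "phones", 2), (1, "phones", 2), (1, "pc", 2), (1, "security", 2),
    (2, "phones", 3), (2, "security", 3),
    (3, "pc", 1), (3, "security", 1), (3, "security", 1),
]

# Each (id, cat) bucket corresponds to one cell of the pivoted output.
buckets = defaultdict(list)
for id_, cat, val in rows:
    buckets[(id_, cat)].append(val)

# A bucket can hold several val entries, so a bare "val" is ambiguous.
# Both operands must be reduced to one number per bucket:
# len(vals) acts like a count, vals[0] acts like taking the first value.
ratios = {key: len(vals) / vals[0] for key, vals in buckets.items()}

for key in sorted(ratios):
    print(key, ratios[key])
```

Buckets with no rows, such as (2, "pc"), simply do not exist here; in a pivoted table they show up as null.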

Because the val column holds the same value for every row within each id group, you can use the built-in first function:

df.withColumn('cat',
              F.when(df.device.isin(phone_list), 'phones').otherwise(
                  F.when(df.device.isin(pc_list), 'pc').otherwise(
                      F.when(df.device.isin(security_list), 'security')))
              ).groupBy('id').pivot('cat').agg(F.count('cat')/ F.first(df.val)).show()

which should give you

+---+----+------------------+------------------+
| id|  pc|            phones|          security|
+---+----+------------------+------------------+
|  3| 1.0|              null|               2.0|
|  1| 0.5|               1.0|               0.5|
|  2|null|0.3333333333333333|0.3333333333333333|
+---+----+------------------+------------------+

Can you give an example of the result you want?
@pault I have updated the question with the expected result:

+---+----+------+--------+
| id|  pc|phones|security|
+---+----+------+--------+
|  1| 0.5|     1|     0.5|
|  3|   1|  null|       2|
|  2|null|  0.33|    0.33|
+---+----+------+--------+
