PySpark error: TypeError: 'Column' object is not callable


What I am trying to do is group by a column and collect the values into lists:

Data:

Code:


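The code seems to work fine for me. The only problem is that you used the dayid column in collect_list, a column the sample data does not contain; everything else looks fine.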

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Create (or reuse) a session; in the pyspark shell `spark` already exists
spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

dataset1 = [{'id' : 12,'dates' : '2012-03-02','quantity' : 1},
  {'id' : 32,'dates' : '2012-02-21','quantity' : 4},
  {'id' : 12,'dates' : '2012-03-02','quantity' : 1},
  {'id' : 32,'dates' : '2012-02-21','quantity' : 4}]

rdd1 = sc.parallelize(dataset1)
df1 = spark.createDataFrame(rdd1)
df1.show()
+----------+---+--------+
|     dates| id|quantity|
+----------+---+--------+
|2012-03-02| 12|       1|
|2012-02-21| 32|       4|
|2012-03-02| 12|       1|
|2012-02-21| 32|       4|
+----------+---+--------+

# Aggregate only columns that exist in df1: "id" here, not "dayid"
new_df = df1.groupby('id').agg(F.collect_list("id"),F.collect_list("quantity"))
new_df.show()
+---+----------------+----------------------+
| id|collect_list(id)|collect_list(quantity)|
+---+----------------+----------------------+
| 32|        [32, 32]|                [4, 4]|
| 12|        [12, 12]|                [1, 1]|
+---+----------------+----------------------+
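To make the expected result easier to check without a Spark session, here is a minimal plain-Python sketch of what a group-by followed by collect_list computes on the same sample rows. It is not PySpark code: the helper group_and_collect is hypothetical and exists only for illustration, and it only mimics the semantics of the aggregation shown above.

```python
from collections import defaultdict

# Same sample rows as dataset1 in the answer.
rows = [
    {"id": 12, "dates": "2012-03-02", "quantity": 1},
    {"id": 32, "dates": "2012-02-21", "quantity": 4},
    {"id": 12, "dates": "2012-03-02", "quantity": 1},
    {"id": 32, "dates": "2012-02-21", "quantity": 4},
]


def group_and_collect(rows, key, columns):
    """Hypothetical helper: for each distinct value of `key`,
    collect the values of every requested column into a list."""
    grouped = defaultdict(lambda: {c: [] for c in columns})
    for row in rows:
        for c in columns:
            # A column missing from the row (such as "dayid") raises
            # KeyError here; Spark likewise rejects unknown column names.
            grouped[row[key]][c].append(row[c])
    return dict(grouped)


result = group_and_collect(rows, "id", ["id", "quantity"])
print(result[32])  # {'id': [32, 32], 'quantity': [4, 4]}
print(result[12])  # {'id': [12, 12], 'quantity': [1, 1]}
```

The lists match the collect_list columns in the table above. Asking for "dayid" instead of "id" fails in this sketch as well, because none of the rows has that key.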

Please post more of your code, format your data better, and use code blocks when pasting code.
new_df = df.groupby('id').agg(F.collect_list("dayid"),F.collect_list("quantity"))