Unable to use pivot in Apache Spark (Python)
I am trying to use pivot in Apache Spark. My data is:
+--------------------+---------+
| timestamp| user|
+--------------------+---------+
|2017-12-19T00:41:...|User_1|
|2017-12-19T00:01:...|User_2|
|2017-12-19T00:01:...|User_1|
|2017-12-19T00:01:...|User_1|
|2017-12-19T00:01:...|User_2|
+--------------------+---------+
I want to pivot on the user column, but I keep getting this error:
'DataFrame' object has no attribute 'pivot'
Traceback (most recent call last):
File "/usr/hdp/current/spark2-client/python/pyspark/sql/dataframe.py", line 1020, in __getattr__
"'%s' object has no attribute '%s'" % (self.__class__.__name__, name))
AttributeError: 'DataFrame' object has no attribute 'pivot'
no matter how I invoke it, e.g. df.groupBy('A').pivot('B') or df.pivot('B').
My actual query is:
# The pivot operation should give timestamp vs. users data
pivot_pf = (tf.groupBy(window(tf["timestamp"], "2 minutes"), 'user')
              .count()
              .select('window.start', 'user', 'count')
              .pivot("user")
              .sum("count"))
Any help is much appreciated. Thanks.

Pivot works fine, as shown below, but it returns grouped data. If we apply some aggregation to the grouped data, it results in a DataFrame:
val d1 = Array(("a", "10"), ("b", "20"), ("c", "30"),("a","56"),("c","29"))
val rdd1= sc.parallelize(d1)
val df1 = rdd1.toDF("key","val")
df1.groupBy("key").pivot("val")
Expected output:
+--------------------+----+-----+----+
| window| one|three| two|
+--------------------+----+-----+----+
|[2012-01-01 00:00...| 1| null| 1|
|[2012-01-01 00:04...|null| null| 1|
|[2012-01-01 00:02...| 1| 1|null|
+--------------------+----+-----+----+
df.groupBy('A').pivot('B') should be followed by some aggregation. – ShankarKoirala
@ShankarKoirala Thanks, I have updated the question with the real query; I actually use sum after the pivot.
@ShankarKoirala Another update with the data format; please take a look.
Don't group by user if you pivot on that field. – Rumoku
@Rumoku If I don't group by user, I only get a timestamp-vs-count DataFrame and the user information is lost, right? So how can the pivot be based on the users? I use sum("count") in the query as given in the question.