Spark, Hive SQL - implementing a window function?
Tags: apache-spark, hive, apache-spark-sql, window-functions

I am trying to implement the following solution:
I have the following DataFrame:
+------------+----------------------+-------------------+
|increment_id|base_subtotal_incl_tax| eventdate|
+------------+----------------------+-------------------+
| 1086| 14470.0000|2016-06-14 09:54:12|
| 1086| 14470.0000|2016-06-14 09:54:12|
| 1086| 14470.0000|2015-07-14 09:54:12|
| 1086| 14470.0000|2015-07-14 09:54:12|
| 1086| 14470.0000|2015-07-14 09:54:12|
| 1086| 14470.0000|2015-07-14 09:54:12|
| 1086| 1570.0000|2015-07-14 09:54:12|
| 5555| 14470.0000|2014-07-14 09:54:12|
| 5555| 14470.0000|2014-07-14 09:54:12|
| 5555| 14470.0000|2014-07-14 09:54:12|
| 5555| 14470.0000|2014-07-14 09:54:12|
+------------+----------------------+-------------------+
I am trying to run a window function as follows:
WindowSpec window = Window.partitionBy(df.col("increment_id")).orderBy(df.col("eventdate").desc());
df.select(df.col("*"),rank().over(window).alias("rank")) //error for this line
.filter("rank <= 2")
.show();
But I get:
+------------+----------------------+-------------------+----+
|increment_id|base_subtotal_incl_tax| eventdate|rank|
+------------+----------------------+-------------------+----+
| 5555| 14470.0000|2014-07-14 09:54:12| 1|
| 5555| 14470.0000|2014-07-14 09:54:12| 1|
| 5555| 14470.0000|2014-07-14 09:54:12| 1|
| 5555| 14470.0000|2014-07-14 09:54:12| 1|
| 1086| 14470.0000|2016-06-14 09:54:12| 1|
| 1086| 14470.0000|2016-06-14 09:54:12| 1|
+------------+----------------------+-------------------+----+
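What am I missing?

All values are equal, so the ranks are equal: rank() gives rows that tie on eventdate within a partition the same rank. Try row_number instead: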
df.select(df.col("*"),row_number().over(window).alias("rank"))
.filter("rank <= 2")
.show();
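
To see why the two functions number this data differently, here is a minimal plain-Java sketch (no Spark required) that mimics the two numbering rules on the eventdate values of each partition, already sorted in descending order. The class and method names (RankVsRowNumber, rank, rowNumber) are made up for illustration and are not part of the Spark API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RankVsRowNumber {

    // Mimics rank(): tied values share a rank, and the next distinct
    // value gets its 1-based position (so gaps appear after ties).
    static List<Integer> rank(List<String> sortedDesc) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < sortedDesc.size(); i++) {
            if (i > 0 && sortedDesc.get(i).equals(sortedDesc.get(i - 1))) {
                out.add(out.get(i - 1)); // tie: reuse the previous rank
            } else {
                out.add(i + 1);          // new value: rank is the position
            }
        }
        return out;
    }

    // Mimics row_number(): every row gets a distinct, consecutive number.
    static List<Integer> rowNumber(List<String> sortedDesc) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < sortedDesc.size(); i++) {
            out.add(i + 1);
        }
        return out;
    }

    public static void main(String[] args) {
        // eventdate values of partition 1086, newest first
        List<String> p1086 = Arrays.asList(
            "2016-06-14 09:54:12", "2016-06-14 09:54:12",
            "2015-07-14 09:54:12", "2015-07-14 09:54:12",
            "2015-07-14 09:54:12", "2015-07-14 09:54:12",
            "2015-07-14 09:54:12");
        // eventdate values of partition 5555 (all identical)
        List<String> p5555 = Arrays.asList(
            "2014-07-14 09:54:12", "2014-07-14 09:54:12",
            "2014-07-14 09:54:12", "2014-07-14 09:54:12");

        System.out.println(rank(p1086));      // [1, 1, 3, 3, 3, 3, 3]
        System.out.println(rank(p5555));      // [1, 1, 1, 1]
        System.out.println(rowNumber(p1086)); // [1, 2, 3, 4, 5, 6, 7]
        System.out.println(rowNumber(p5555)); // [1, 2, 3, 4]
    }
}
```

With rank, the filter "rank <= 2" keeps every row tied at rank 1, which is why all four 5555 rows survive. With row_number, exactly two rows per partition pass the filter. Note that when rows tie on the ordering column, which of them receives numbers 1 and 2 is arbitrary unless a tie-breaking column is added to the orderBy.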

Thanks! This works perfectly! So the original version should also work with the real data, since the timestamps there will differ. :)