Apache Spark: how to use lag and range in a PySpark window function
My data looks like this:
+------------------+--------------------+----------------+
| out| timestamp|Sequence|
+------------------+--------------------+----------------+
|0.5202757120132446|2019-11-07 00:00:...| 1|
| null|2019-11-07 00:00:...| 2|
| null|2019-11-07 00:00:...| 3|
| null|2019-11-07 00:00:...| 4|
|0.5220348834991455|2019-11-07 00:00:...| 5|
| 0.724998414516449|2019-11-07 00:00:...| 6|
| null|2019-11-07 00:00:...| 7|
| null|2019-11-07 00:00:...| 8|
|0.7322611212730408|2019-11-07 00:00:...| 9|
| null|2019-11-07 00:00:...| 10|
| null|2019-11-07 00:00:...| 11|
+------------------+--------------------+----------------+
Now I want to replace each null with the most recent preceding non-null value. I am using a window function to do this, but I get the following error:
'Window Frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW must match the required frame ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING;'
My code:
from pyspark.sql import Window
import pyspark.sql.functions as F

window1 = Window.partitionBy('timestamp').orderBy('Sequence').rangeBetween(Window.unboundedPreceding, 0)
df = df.withColumn('out', F.when(F.col('out').isNull(), F.lag('out').over(window1)).otherwise(F.col('out')))
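Note that even with a legal frame, `lag(1)` only reaches one row back in the original column, so in a run of consecutive nulls only the first would be filled. A plain-Python sketch of that behavior (my own simulation for illustration, not Spark code; the helper name `fill_with_lag1` is hypothetical):

```python
from typing import List, Optional

def fill_with_lag1(values: List[Optional[float]]) -> List[Optional[float]]:
    """When a value is null, take the value one row back in the
    original column, as F.when(col.isNull(), F.lag(col).over(w)) would."""
    result = []
    for i, v in enumerate(values):
        if v is None and i > 0:
            # previous *original* value, which may itself be null
            result.append(values[i - 1])
        else:
            result.append(v)
    return result

print(fill_with_lag1([0.52, None, None, 0.725]))
# [0.52, 0.52, None, 0.725] -- the second null stays null
```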
`lag` is evaluated over an implicit fixed frame (ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING), so attaching your own rangeBetween frame to the window raises exactly this error. Use `last` with `ignorenulls=True` over a running frame instead:

import pyspark.sql.functions as f
from pyspark.sql import Window

w = Window.partitionBy('timestamp').orderBy('sequence').rowsBetween(Window.unboundedPreceding, Window.currentRow)
df.withColumn("newout", f.last('out', ignorenulls=True).over(w)).show()
+------------------+--------------------+--------+------------------+
| out| timestamp|sequence| newout|
+------------------+--------------------+--------+------------------+
|0.5202757120132446|2019-11-07 00:00:...| 1|0.5202757120132446|
| null|2019-11-07 00:00:...| 2|0.5202757120132446|
| null|2019-11-07 00:00:...| 3|0.5202757120132446|
| null|2019-11-07 00:00:...| 4|0.5202757120132446|
|0.5220348834991455|2019-11-07 00:00:...| 5|0.5220348834991455|
| 0.724998414516449|2019-11-07 00:00:...| 6| 0.724998414516449|
| null|2019-11-07 00:00:...| 7| 0.724998414516449|
| null|2019-11-07 00:00:...| 8| 0.724998414516449|
|0.7322611212730408|2019-11-07 00:00:...| 9|0.7322611212730408|
| null|2019-11-07 00:00:...| 10|0.7322611212730408|
| null|2019-11-07 00:00:...| 11|0.7322611212730408|
+------------------+--------------------+--------+------------------+
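The forward fill that `f.last('out', ignorenulls=True)` computes over the running frame can be sketched in plain Python (a minimal simulation for illustration, not Spark code; the helper name `forward_fill` is my own):

```python
from typing import List, Optional

def forward_fill(values: List[Optional[float]]) -> List[Optional[float]]:
    """Replace each None with the most recent preceding non-None value,
    mimicking last(col, ignorenulls=True) over ROWS BETWEEN
    UNBOUNDED PRECEDING AND CURRENT ROW."""
    filled = []
    last_seen = None  # last non-null value observed so far in the frame
    for v in values:
        if v is not None:
            last_seen = v
        filled.append(last_seen)
    return filled

out = [0.52, None, None, None, 0.522, 0.725, None]
print(forward_fill(out))
# [0.52, 0.52, 0.52, 0.52, 0.522, 0.725, 0.725]
```

Leading nulls stay null, just as in Spark: there is no preceding non-null value for `last` to return.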