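Python: How to set a flag for the last occurrence of a value in a PySpark DataFrame column

Requirement: when the last occurrence of loyal has the value 1, set the flag to 1; otherwise set it to 0.

Input:

Desired output:

Note: the flag only checks whether the last value is 1 and sets the flag accordingly.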


What I tried:

from pyspark.sql import functions as F
from pyspark.sql.window import Window

# one window per consumer, ordered by transaction sequence
w = Window.partitionBy("consumer_id").orderBy("row_num")
df = spark.sql("""select * from inter_table""")
df = df.withColumn("Flag", F.when(F.last(F.col("loyal") == 1).over(w), 1).otherwise(0))
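Why this attempt flags every 1: when a window has an orderBy, Spark's default frame runs from the start of the partition up to the current row, so F.last returns the current row's own value (assuming row_num is unique within a consumer). The plain-Python sketch below (no Spark needed) emulates both frames on consumer_id 11's loyal values from the table in this thread; the function names are hypothetical. Even a full-partition frame only yields the partition's final value (0 here), not the last 1, which is why the answers below compare each row with the next one instead.

```python
def last_running_frame(values):
    """Emulates F.last over an ordered window with the default frame
    (start of partition .. current row): each row sees only rows up to itself."""
    return [values[: i + 1][-1] for i in range(len(values))]


def last_full_frame(values):
    """Emulates F.last over
    rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing):
    every row sees the whole partition, so all rows get its final value."""
    return [values[-1] for _ in values]


# loyal values of consumer_id 11, row_num 1..5 (from the table in this thread)
consumer_11 = [1, 1, 1, 1, 0]
print(last_running_frame(consumer_11))  # [1, 1, 1, 1, 0]
print(last_full_frame(consumer_11))     # [0, 0, 0, 0, 0]
```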
There are two cases here:
1. a 1 followed by a 0 (see row_num 4 for consumer_id 11)
2. a 1 with no following row (see row_num 3 for consumer_id 12)

Try this:

from pyspark.sql import functions as F
from pyspark.sql.window import Window

# per-consumer window ordered by row_num, so F.lead returns the next transaction's loyal value
w = Window.partitionBy("consumer_id").orderBy("row_num")
df.withColumn("flag", F.when((F.col("loyal") == 1)
                             & (F.lead("loyal").over(w) == 0), F.lit(1))
                       .otherwise(F.lit(0))).show()

#+-----------+----------+----------+-------+-----+---------+-------+---+----+
#|consumer_id|product_id|    TRX_ID|pattern|loyal| trx_date|row_num| mx|flag|
#+-----------+----------+----------+-------+-----+---------+-------+---+----+
#|         11|         1|1152397078|  VVVVM|    1| 3/5/2020|      1|  5|   0|
#|         11|         1|1152944770|  VVVVV|    1| 3/6/2020|      2|  5|   0|
#|         11|         1|1153856408|  VVVVV|    1|3/15/2020|      3|  5|   0|
#|         11|         2|1155884040|  MVVVV|    1| 4/2/2020|      4|  5|   1|
#|         11|         2|1156854300|  MMVVV|    0|4/17/2020|      5|  5|   0|
#+-----------+----------+----------+-------+-----+---------+-------+---+----+
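A plain-Python sketch of the rule above (no Spark needed), assuming each consumer's loyal values are already sorted by row_num; lead() and flag_v1() are hypothetical helpers that emulate F.lead and the when/otherwise expression. It reproduces the flag for consumer_id 11, but consumer_id 12, whose last 1 has no following row, gets no flag at all: in Spark, comparing a NULL lead with 0 is never true, so the row falls through to otherwise(0).

```python
def lead(values):
    """Emulates F.lead over one partition: the next value, None on the last row."""
    return values[1:] + [None]


def flag_v1(values):
    # loyal == 1 and next loyal == 0; a None lead never equals 0
    return [1 if v == 1 and nxt == 0 else 0
            for v, nxt in zip(values, lead(values))]


consumer_11 = [1, 1, 1, 1, 0]  # row_num 1..5, from the table in this thread
consumer_12 = [1, 1, 1, 1]     # row_num 1..4, from the table in this thread
print(flag_v1(consumer_11))  # [0, 0, 0, 1, 0]
print(flag_v1(consumer_12))  # [0, 0, 0, 0]  <- last row missed
```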
Update:

from pyspark.sql import functions as F
from pyspark.sql.window import Window

w = Window.partitionBy("consumer_id").orderBy("row_num")
lead = F.lead("loyal").over(w)  # next row's loyal value; NULL on each consumer's last row
df.withColumn("Flag", F.when((F.col("loyal") == 1)
                             & ((lead == 0) | lead.isNull()), F.lit(1))
                       .otherwise(F.lit(0))).show()
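The same plain-Python emulation with the updated condition (hypothetical helper names, no Spark needed) now also accepts a missing next row, so both consumers get a flag at the end of their run of 1s. The expected lists match the Flag column of the output table in this thread.

```python
def lead(values):
    """Emulates F.lead over one partition: the next value, None on the last row."""
    return values[1:] + [None]


def flag_v2(values):
    # loyal == 1 and (next loyal == 0 or there is no next row)
    return [1 if v == 1 and (nxt == 0 or nxt is None) else 0
            for v, nxt in zip(values, lead(values))]


print(flag_v2([1, 1, 1, 1, 0]))  # consumer_id 11 -> [0, 0, 0, 1, 0]
print(flag_v2([1, 1, 1, 1]))     # consumer_id 12 -> [0, 0, 0, 1]
```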

Adding to Murtaza's answer:

We can add one more condition that checks for a null lead, which covers the second scenario (a 1 on a consumer's last row):

from pyspark.sql import functions as f
from pyspark.sql.window import Window

window = Window.partitionBy('consumer_id').orderBy('row_num')
next_loyal = f.lead(f.col('loyal')).over(window)  # NULL on each consumer's last row
df.withColumn('Flag', f.when((f.col('loyal') == 1)
                             & ((next_loyal == 0) | next_loyal.isNull()), f.lit('1'))
                        .otherwise(f.lit('0'))).show()

+-----------+----------+----------+-------+-----+---------+-------+---+----+
|consumer_id|product_id|    TRX_ID|pattern|loyal| trx_date|row_num| mx|Flag|
+-----------+----------+----------+-------+-----+---------+-------+---+----+
|         11|         1|1152397078|  VVVVM|    1| 3/5/2020|      1|  5|   0|
|         11|         1|1152944770|  VVVVV|    1| 3/6/2020|      2|  5|   0|
|         11|         1|1153856408|  VVVVV|    1|3/15/2020|      3|  5|   0|
|         11|         2|1155884040|  MVVVV|    1| 4/2/2020|      4|  5|   1|
|         11|         2|1156854300|  MMVVV|    0|4/17/2020|      5|  5|   0|
|         12|         1|1156854300|  VVVVM|    1| 3/6/2020|      1|  4|   0|
|         12|         1|1156854300|  VVVVV|    1| 3/7/2020|      2|  4|   0|
|         12|         2|1156854300|  MVVVV|    1|3/16/2020|      3|  4|   0|
|         12|         1|1156854300|  MVVVV|    1| 4/3/2020|      4|  4|   1|
+-----------+----------+----------+-------+-----+---------+-------+---+----+
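One caveat, sketched in plain Python under the same assumptions (values sorted by row_num, hypothetical helper names): both answers flag every 1 that ends a run of 1s, not strictly the last 1 per consumer. On the thread's data each consumer has a single run, so the two readings agree; on a hypothetical partition such as 1, 0, 1, 0 they differ. If only the final 1 should be flagged, one option in Spark (an untested suggestion) is to compare row_num with F.max(F.when(F.col('loyal') == 1, F.col('row_num'))).over(Window.partitionBy('consumer_id')).

```python
def lead(values):
    """Emulates F.lead over one partition: the next value, None on the last row."""
    return values[1:] + [None]


def flag_end_of_run(values):
    # the answers' rule: loyal == 1 and the next value is 0 or missing
    return [1 if v == 1 and (nxt == 0 or nxt is None) else 0
            for v, nxt in zip(values, lead(values))]


def flag_last_one(values):
    # only the highest position holding a 1 gets the flag
    # (mirrors comparing row_num with the per-consumer max row_num where loyal == 1)
    ones = [i for i, v in enumerate(values) if v == 1]
    last = ones[-1] if ones else None
    return [1 if i == last else 0 for i in range(len(values))]


hypothetical = [1, 0, 1, 0]  # hypothetical partition, not from the thread
print(flag_end_of_run(hypothetical))  # [1, 0, 1, 0]
print(flag_last_one(hypothetical))    # [0, 0, 1, 0]
```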