Window function inside a PySpark UDF


I am unable to call a UDF that uses a window function.

from pyspark.sql.window import Window
from pyspark.sql import functions as F
mst=spark.createDataFrame([(1,"v1" ), (2,"v1"), (3,"v1" ),(21,"v2" ), (22,"v2"), (31,"v3" )], ["mst_id","mst_val"])
ref=spark.createDataFrame([(91,"v1" ), (92,"v2"), (93,"v3"  )], ["ref_id","ref_val"])

I defined a simple function:


def fnc1 (val):
    w=Window().partitionBy("mst_val").orderBy(F.col("mst_id").asc())
    mtch=mst.withColumn("rank",F.row_number().over(w)).filter((F.col("rank") == 1) & (F.col("mst_val") == F.lit(val))).rdd.collect()
    return (mtch[0]['mst_id'] if len(mtch) else -1)

fnc1("v3") 

This returns 31.

I then defined a simple UDF:

from pyspark.sql.functions import udf, col
from pyspark.sql.types import  *
udf1 = udf(lambda r: fnc1(r),IntegerType())
Calling the UDF fails:

ref.withColumn("abc",udf1(col("ref_val")))
This gives the error: py4j.Py4JException: Method __getnewargs__([]) does not exist


Can anyone help me? Thanks.

Maybe take a look at this post: