How do I use a regular expression inside a pandas_udf function in PySpark?


My code is as follows:

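import re
from typing import Iterator

import pandas as pd
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import BooleanType

@pandas_udf(BooleanType())
def is_one(iterator: Iterator[pd.Series]) -> Iterator[pd.Series]:
    for s in iterator:
        res = re.search("1", s)  # the TypeError below is raised here
        yield res != None

df = spark.createDataFrame(pd.DataFrame(["1", "2", "3"], columns=["v"]))
df.select(is_one(df.v)).show()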
I get an error:

TypeError: expected string or bytes-like object

It looks like my function isn't iterating over strings. Why is that, and how can I use a regex function inside a pandas_udf?


I also tried the Series-to-Series form of pandas_udf, but got the same error.
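The failure can be reproduced without Spark. The sketch below assumes a plain pandas Series stands in for one batch of column v, which is what a pandas_udf receives (in the iterator variant, each item the iterator yields is such a Series). Handing that Series to re.search triggers the same TypeError, while running the regex on each element does not:

```python
import re

import pandas as pd

# Assumption: this plain Series stands in for one batch of column "v"
# that Spark would hand to the pandas_udf.
batch = pd.Series(["1", "2", "3"])
print(type(batch).__name__)  # the UDF sees a Series, not a str

# Passing the whole batch to re.search reproduces the question's error.
try:
    re.search("1", batch)
    error = None
except TypeError as exc:
    error = exc
print("re.search on a Series raised:", repr(error))

# Running the regex on each element works, because each element is a str.
matches = batch.apply(lambda s: re.search("1", s) is not None)
print(matches.tolist())  # [True, False, False]
```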

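Inside a pandas_udf, your function receives whole pandas Series (batches of rows), not individual strings, so re.search fails when it is handed a Series. You can use Series.apply to run the regex search on each element instead: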

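import re

import pandas as pd
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import BooleanType

@pandas_udf(BooleanType())
def is_one(ser: pd.Series) -> pd.Series:
    # ser is a whole batch of the column; apply runs the regex on each string
    return ser.apply(lambda s: re.search("1", s) is not None)
    # a neater, vectorized way (str.contains treats the pattern as a regex by default):
    # return ser.str.contains("1")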

df = spark.createDataFrame(pd.DataFrame(["1", "2", "3"], columns=["v"]))

df.select(is_one(df.v)).show()
+---------+
|is_one(v)|
+---------+
|     true|
|    false|
|    false|
+---------+
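Since the question started from the Iterator[pd.Series] -> Iterator[pd.Series] form, here is a sketch of the same per-element fix in that shape. The function name is_one_batches and the sample batches are hypothetical, and the function is left undecorated so it can be checked with plain pandas; to run it in Spark you would add @pandas_udf(BooleanType()) above it, as in the question. The iterator form is a natural place for one-time setup, such as compiling the pattern once and reusing it for every batch:

```python
import re
from typing import Iterator

import pandas as pd


def is_one_batches(batches: Iterator[pd.Series]) -> Iterator[pd.Series]:
    """Iterator-of-Series form: each item from `batches` is a whole pd.Series."""
    pattern = re.compile("1")  # compiled once, reused for every batch
    for ser in batches:
        # Run the regex on each string inside the batch, not on the batch itself.
        yield ser.apply(lambda s: pattern.search(s) is not None)


# Spark-free check: feed two hypothetical batches directly to the function.
batches = iter([pd.Series(["1", "2"]), pd.Series(["3", "10"])])
results = [out.tolist() for out in is_one_batches(batches)]
print(results)  # [[True, False], [False, True]]
```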