Python 3.x: unexpected error with pandas_udf in PySpark 3.0.0

Tags: python-3.x, pyspark, apache-spark-sql

I followed the example, but it fails with an error. My code is as follows:

import pandas as pd

from pyspark import SparkConf, SparkContext
from pyspark.sql import SparkSession
from pyspark.sql.pandas.functions import pandas_udf


class SparkBase(object):
    def __init__(self, master="local[*]", app_name="SparkBase"):
        _conf = SparkConf().setMaster(master).setAppName(app_name)
        _conf.set("spark.sql.execution.arrow.pyspark.enabled", True)
        _conf.set("spark.sql.execution.arrow.pyspark.fallback.enabled", True)
        self.sc = SparkContext().getOrCreate(conf=_conf)
        self.spark = SparkSession.builder.config(conf=_conf).enableHiveSupport().getOrCreate()


@pandas_udf("col1 string, col2 long")
def func(s1: pd.Series, s2: pd.Series, s3: pd.DataFrame) -> pd.DataFrame:
    s3["col2"] = s1 + s2.str.len()
    return s3


if __name__ == "__main__":
    spark_base = SparkBase()
    df = spark_base.spark.createDataFrame([[1, "a string", ("a nested string",)]],
                                          "long_c long, str_c string, struct_c struct<col1: string>")
    df.show()
The error output:

Traceback (most recent call last):
  File "F:/otherproj/localpyspark/pyspark3/sparkbase.py", line 24, in <module>
    def func(s1: pd.Series, s2: pd.Series, s3: pd.DataFrame) -> pd.DataFrame:
  File "D:\ProgramData\Anaconda3\lib\site-packages\pyspark\sql\pandas\functions.py", line 426, in _create_pandas_udf
    return _create_udf(f, returnType, evalType)
  File "D:\ProgramData\Anaconda3\lib\site-packages\pyspark\sql\udf.py", line 43, in _create_udf
    return udf_obj._wrapped()
  File "D:\ProgramData\Anaconda3\lib\site-packages\pyspark\sql\udf.py", line 204, in _wrapped
    wrapper.returnType = self.returnType
  File "D:\ProgramData\Anaconda3\lib\site-packages\pyspark\sql\udf.py", line 94, in returnType
    self._returnType_placeholder = _parse_datatype_string(self._returnType)
  File "D:\ProgramData\Anaconda3\lib\site-packages\pyspark\sql\types.py", line 822, in _parse_datatype_string
    raise e
  File "D:\ProgramData\Anaconda3\lib\site-packages\pyspark\sql\types.py", line 812, in _parse_datatype_string
    return from_ddl_schema(s)
  File "D:\ProgramData\Anaconda3\lib\site-packages\pyspark\sql\types.py", line 804, in from_ddl_schema
    sc._jvm.org.apache.spark.sql.types.StructType.fromDDL(type_str).json())
AttributeError: 'NoneType' object has no attribute '_jvm'
If I comment out the func function, it runs successfully. Where is the problem? Is it a bug in Spark 3.0.0?
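For what it's worth, the traceback points at the return-type string "col1 string, col2 long" being parsed when the decorator runs: _parse_datatype_string reaches the JVM through the active SparkContext, and at module import time no SparkContext exists yet (SparkBase() is only created inside the __main__ block), so sc is None and the "_jvm" AttributeError is raised. A minimal sketch of a workaround under that assumption is to create the SparkSession before defining the decorated function; the app name and the final select are illustrative additions, not part of the original post:

import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf

if __name__ == "__main__":
    # Build the session first so an active SparkContext exists when the
    # @pandas_udf decorator parses the DDL return-type string below.
    spark = (SparkSession.builder
             .master("local[*]")
             .appName("pandas_udf_demo")  # illustrative app name
             .config("spark.sql.execution.arrow.pyspark.enabled", "true")
             .getOrCreate())

    @pandas_udf("col1 string, col2 long")
    def func(s1: pd.Series, s2: pd.Series, s3: pd.DataFrame) -> pd.DataFrame:
        # Same body as the original: add a long column derived from s1 and s2.
        s3["col2"] = s1 + s2.str.len()
        return s3

    df = spark.createDataFrame(
        [[1, "a string", ("a nested string",)]],
        "long_c long, str_c string, struct_c struct<col1: string>")
    # Apply the UDF so the example actually exercises it.
    df.select(func("long_c", "str_c", "struct_c")).show()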