Python pyarrow.lib.ArrowNotImplementedError
I have a PySpark DataFrame with 4 columns, as shown below:

A      B      C      D
-----  -----  -----  ----------------
1      2      3      My Name is cat
2      4      5      I like to code

I need to add another column, "prediction", to the DataFrame based on column "D". The model predicts on a DataFrame: its input is a single column and its output is a NumPy array. So I wrote the UDF shown below to do this.

pyarrow version is 0.17.1
pandas version is 1.0.4
pyspark 2.3.4
However, I get the following error and I am not sure what is causing it:
pyarrow.lib.ArrowNotImplementedError: NumPyConverter doesn't implement conversion
Any help resolving this would be appreciated.
import pandas as pd
from pyspark.sql.functions import pandas_udf, PandasUDFType
from pyspark.sql.types import ArrayType, BooleanType

@pandas_udf(ArrayType(BooleanType()), PandasUDFType.SCALAR)
def t_func(pdf):
    # classifier.predict returns a 2-D array, e.g. array([[0.706, 0.293], [0.986, 0.0713]])
    predict = classifier.predict(pdf)[:, 1]
    # predict is the second column as a 1-D numpy.ndarray, e.g. array([0.293, 0.0713])
    predictions = predict > decision_threshhold
    # predictions is a 1-D boolean numpy.ndarray, e.g. array([False, False])
    return pd.Series(predictions)

X = X.withColumn('prediction', t_func('D'))
Expected result:

A      B      C      D                 prediction
-----  -----  -----  ----------------  ----------
1      2      3      My Name is cat    False
2      4      5      I like to code    False
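
One possible cause, offered as a sketch rather than a confirmed diagnosis: the UDF is declared with ArrayType(BooleanType()), but it returns a single boolean per row, not a list of booleans per row. Assuming that mismatch is what triggers the conversion error, declaring the return type as BooleanType() would match the data. The sketch below keeps the per-batch logic in a plain pandas function so it can be checked without a Spark session. StubClassifier, flag_positive, build_udf and DECISION_THRESHOLD are hypothetical names introduced for illustration; the real classifier and threshold are not shown in the question.

```python
import numpy as np
import pandas as pd

DECISION_THRESHOLD = 0.5  # hypothetical value; the question does not state it


class StubClassifier:
    """Hypothetical stand-in for the real model, which is not shown."""

    def predict(self, texts):
        # Return an (n_rows, 2) array of scores, the shape described in
        # the question's comments. The scoring rule here is made up.
        pos = np.array([0.9 if "cat" in t.lower() else 0.1 for t in texts])
        return np.column_stack([1.0 - pos, pos])


def flag_positive(texts, classifier, threshold):
    """Return a pandas Series with exactly one boolean per input row."""
    scores = classifier.predict(texts)[:, 1]  # 1-D float array
    flags = scores > threshold                # 1-D boolean array
    return pd.Series(flags)


def build_udf(classifier, threshold):
    """Wrap flag_positive as a scalar pandas UDF (needs PySpark and pyarrow)."""
    # Imported lazily so the pandas logic above runs without PySpark.
    from pyspark.sql.functions import pandas_udf, PandasUDFType
    from pyspark.sql.types import BooleanType

    # One boolean per row, so the declared type is BooleanType(),
    # not ArrayType(BooleanType()).
    @pandas_udf(BooleanType(), PandasUDFType.SCALAR)
    def t_func(texts):
        return flag_positive(texts, classifier, threshold)

    return t_func
```

With the real model, the column could then be added with X.withColumn('prediction', build_udf(classifier, decision_threshhold)('D')). This has not been run against the exact versions listed in the question, so treat it as a starting point.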