pyspark PandasUDFType.SCALAR error when converting an array column
I want to use PandasUDFType.SCALAR to operate on a column of arrays, as shown below:
from pyspark.sql.functions import pandas_udf, PandasUDFType
from pyspark.sql.types import ArrayType, IntegerType

df = spark.createDataFrame([([1, 2, 3, 2],), ([4, 5, 5, 4],)], ['data'])

@pandas_udf(ArrayType(IntegerType()), PandasUDFType.SCALAR)
def s(x):
    # x is a pandas Series; double every element of each array
    z = x.apply(lambda xx: xx * 2)
    return z

df.select(s(df.data)).show()
But it fails with:
pyarrow.lib.ArrowInvalid: trying to convert NumPy type int32 but got int64
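The error says Spark expected 32-bit integers (its `IntegerType`) but the UDF returned 64-bit ones, since NumPy integer arithmetic defaults to int64 on most 64-bit platforms. Independent of the pyarrow version, one workaround is to cast the result back to int32 explicitly. A minimal sketch of just that cast, using a hypothetical helper name and testable in plain NumPy without Spark:

```python
import numpy as np

def double_as_int32(arr):
    # NumPy arithmetic on integer arrays yields int64 by default on
    # 64-bit platforms; cast back so the result matches Spark's
    # 32-bit IntegerType.
    return (np.asarray(arr) * 2).astype(np.int32)

out = double_as_int32([1, 2, 3, 2])
print(out.dtype, out.tolist())  # int32 [2, 4, 6, 4]
```

Inside the UDF, the same cast could be applied per row, e.g. `x.apply(double_as_int32)`.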
The same code works for me. Which versions of pandas, Spark, pyarrow and NumPy are you using?
('0.25.2', '2.4.4', '0.13.0', '1.16.3'), in the same order. This turned out to be a pyarrow issue; I replaced pyarrow 0.8.0 with pyarrow 0.13.0 and it worked!
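Whatever the versions involved, the transformation the UDF performs can be checked in plain pandas, since a SCALAR pandas UDF just receives the column as a pandas Series. A minimal sketch, assuming each element arrives as a Python list (Spark may hand it over as a NumPy array instead, in which case `xx * 2` already works element-wise):

```python
import pandas as pd

def double_arrays(x: pd.Series) -> pd.Series:
    # Each element of the Series is a list; multiply every entry by 2.
    return x.apply(lambda arr: [v * 2 for v in arr])

col = pd.Series([[1, 2, 3, 2], [4, 5, 5, 4]])
print(double_arrays(col).tolist())  # [[2, 4, 6, 4], [8, 10, 10, 8]]
```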