Python: returning a list of lists from a UDF: expected zero arguments for construction of ClassDict (for numpy.core.multiarray._reconstruct)
Tags: python, apache-spark, pyspark, user-defined-functions, sqldatatypes

I want to output two columns returned as a map, where one column is an nd.array of floating-point numbers. In PySpark, I cannot convert it to the correct return type:
    def get_vectors(feature_map):
        ids, inputs = zip(*[
            (k, v) for d in feature_map for k, v in d.items()
        ])
        # The vectors object will be returned by another method; this is just
        # dummy code to simulate the data. It is an nd.array of floating-point numbers.
        vectors = []
        for item in inputs:
            vectors.append([1.0, 2.0, 3.0])
        vectors = np.array(vectors, float)
        return dict(zip(ids, list(vectors)))

    gen_vectors_udf = f.udf(get_vectors, t.MapType(t.StringType(), t.ArrayType(t.ArrayType(t.FloatType()))))
When I call this UDF, I get the following error:

    expected zero arguments for construction of ClassDict (for numpy.core.multiarray._reconstruct)
Can someone help me understand how to convert an nd.array to a PySpark type?
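For context on the error (this explanation is an inference from the error message, not something stated in the question): Spark ships UDF results back to the JVM through the Pyrolite pickler, which cannot rebuild numpy objects — their pickle payload calls numpy.core.multiarray._reconstruct, hence the "zero arguments for construction of ClassDict" message. Converting the array to built-in Python lists and floats with `.tolist()` avoids numpy types entirely. A minimal sketch of the difference:

```python
import numpy as np

vectors = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], float)

# list(vectors) still yields numpy objects: each row is an ndarray and each
# element an np.float64, which Pyrolite cannot deserialize on the JVM side.
rows = list(vectors)
print(type(rows[0]))  # numpy rows survive list()

# .tolist() recursively converts to built-in lists of built-in floats.
as_lists = vectors.tolist()
print(type(as_lists[0]), type(as_lists[0][0]))
```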
On the other hand, if I convert the nd.array to a list of strings, it seems to work perfectly fine:
    def get_vectors(feature_map):
        ids, inputs = zip(*[
            (k, v) for d in feature_map for k, v in d.items()
        ])
        # The vectors object will be returned by another method; this is just
        # dummy code to simulate the data. It is an nd.array of floating-point numbers.
        vectors = []
        for item in inputs:
            vectors.append([1.0, 2.0, 3.0])
        vectors = np.array(vectors, float)
        output = [str(k) for k in vectors]
        return dict(zip(ids, output))

    gen_vectors_udf = f.udf(get_vectors, t.MapType(t.StringType(), t.StringType()))
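For comparison, here is a sketch of the float version with a `.tolist()` conversion applied — an assumed fix, not confirmed against the asker's full pipeline, reusing the question's dummy data. Note that in this code each map value is a single list of floats, so the matching value type would be `ArrayType(FloatType())` rather than the `ArrayType(ArrayType(FloatType()))` declared in the question:

```python
import numpy as np

def get_vectors(feature_map):
    # feature_map: an iterable of {id: value} dicts, as in the question.
    ids, inputs = zip(*[
        (k, v) for d in feature_map for k, v in d.items()
    ])
    # Dummy data as in the question; the real vectors come from another method.
    vectors = np.array([[1.0, 2.0, 3.0] for _ in inputs], float)
    # .tolist() turns the ndarray into nested lists of native Python floats,
    # which Spark's serializer can map onto ArrayType(FloatType()).
    return dict(zip(ids, vectors.tolist()))

# Registration, mirroring the question's f/t aliases (shown as a comment so
# the sketch runs without a Spark session):
# gen_vectors_udf = f.udf(
#     get_vectors,
#     t.MapType(t.StringType(), t.ArrayType(t.FloatType())),
# )
```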