Python: returning a list of lists from a UDF: expected zero arguments for construction of ClassDict (for numpy.core.multiarray._reconstruct)


I want to output two columns returned as a map, where one of them is an nd.array of floating point numbers. In PySpark, I am unable to convert it to the correct return type.

import numpy as np
from pyspark.sql import functions as f, types as t

def get_vectors(feature_map):
    ids, inputs = zip(*[
        (k, v) for d in feature_map for k, v in d.items()
    ])

    # The vectors object will be returned by another method; this is just
    # dummy code to simulate the data. It is an nd.array of floating point numbers.
    vectors = []
    for item in inputs:
        vectors.append([1.0, 2.0, 3.0])
    vectors = np.array(vectors, float)
    return dict(zip(ids, list(vectors)))

gen_vectors_udf = f.udf(get_vectors, t.MapType(t.StringType(), t.ArrayType(t.ArrayType(t.FloatType()))))
When I call this UDF, I get the following error:

expected zero arguments for construction of ClassDict (for numpy.core.multiarray._reconstruct)
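As background on this error: `list(vectors)` only splits the 2-D array into rows, so the dictionary values are still `numpy.ndarray` objects holding `numpy.float64` scalars, and Spark's pickler cannot map those onto SQL types. This can be checked in plain Python, independent of Spark:

```python
import numpy as np

vectors = np.array([[1.0, 2.0, 3.0]], float)
rows = list(vectors)      # splits into rows, but they are still numpy objects
print(type(rows[0]))      # <class 'numpy.ndarray'>, not a Python list
print(type(rows[0][0]))   # <class 'numpy.float64'>, not a Python float
```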

Can anyone help me understand how to convert the nd.array into a PySpark type?
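One likely fix (a sketch, not tested against Spark itself): convert the numpy rows to plain Python lists with `.tolist()` before returning. Note also that each value is then a flat list of floats, so the matching schema would be `ArrayType(FloatType())` rather than the nested `ArrayType(ArrayType(FloatType()))` used above.

```python
import numpy as np

def get_vectors(feature_map):
    ids, inputs = zip(*[
        (k, v) for d in feature_map for k, v in d.items()
    ])
    # Same dummy data as in the question: one [1.0, 2.0, 3.0] row per input.
    vectors = np.array([[1.0, 2.0, 3.0] for _ in inputs], float)
    # .tolist() converts numpy.float64 scalars into plain Python floats,
    # which Spark's pickler can map onto FloatType.
    return {k: v.tolist() for k, v in zip(ids, vectors)}

# Registration would then look like (pyspark imports omitted here):
# gen_vectors_udf = f.udf(get_vectors, t.MapType(t.StringType(), t.ArrayType(t.FloatType())))

print(get_vectors([{"a": 1}, {"b": 2}]))
# → {'a': [1.0, 2.0, 3.0], 'b': [1.0, 2.0, 3.0]}
```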

On the other hand, if I convert the nd.array into a list of strings, it seems to work perfectly fine:

def get_vectors(feature_map):
    ids, inputs = zip(*[
        (k, v) for d in feature_map for k, v in d.items()
    ])

    # The vectors object will be returned by another method; this is just
    # dummy code to simulate the data. It is an nd.array of floating point numbers.
    vectors = []
    for item in inputs:
        vectors.append([1.0, 2.0, 3.0])
    vectors = np.array(vectors, float)
    output = [str(k) for k in vectors]
    return dict(zip(ids, output))

gen_vectors_udf = f.udf(get_vectors, t.MapType(t.StringType(), t.StringType()))
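The string variant presumably works because `str()` returns a plain Python `str`, which maps directly onto `StringType()` with no numpy objects left for the pickler to choke on. One caveat worth knowing: numpy's string representation is space-separated, so these values are awkward to parse back into numbers later.

```python
import numpy as np

row = np.array([1.0, 2.0, 3.0], float)
s = str(row)
# s is a built-in Python str (unlike numpy.float64), e.g. '[1. 2. 3.]' --
# note the space separators and missing trailing zeros.
print(type(s).__name__, repr(s))
```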