Python: How to pass a variable number of variables into a PySpark select expression

python python-3.x pyspark

I have a simple PySpark function:

features=['x', 'y', 'z']
def f(features):
    df.groupBy('id').agg(collect_list(features[0]), collect_list(features[1]), ....)

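I want it to work so that if someone passes in features=['x', 'y', 'z', 'a'], every item in features gets its own collect_list call inside agg. How can I do that? All of them have to be in the same agg call.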

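from pyspark.sql.functions import collect_list

features = ['x', 'y', 'z']
def f(features):
    # One collect_list column per feature, unpacked into a single agg() call (df is assumed to be in scope)
    return df.groupBy("id").agg(*[collect_list(feature) for feature in features])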

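The list comprehension iterates over the elements of features, and the leading * unpacks the resulting list into agg, so one aggregated column is created for each feature.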
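The unpacking step can be illustrated without Spark at all. The sketch below uses two stand-in functions, agg and collect_list, which are hypothetical placeholders written for this example and are not the PySpark ones; they only show that calling a variadic function with *list is equivalent to writing each argument out by hand.

```python
def agg(*exprs):
    """Stand-in for a variadic function such as GroupedData.agg (not the real one)."""
    # Return the positional arguments so we can inspect what was received.
    return list(exprs)


def collect_list(name):
    """Stand-in that only records which column would be collected."""
    return "collect_list(%s)" % name


features = ["x", "y", "z", "a"]

# Build one expression per feature, then unpack the list into separate arguments.
result = agg(*[collect_list(feature) for feature in features])

# The same call written out by hand, for comparison.
by_hand = agg(
    collect_list("x"), collect_list("y"), collect_list("z"), collect_list("a")
)
```

Both result and by_hand hold the same four expressions, which is why the length of features does not need to be known in advance.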

To give the aggregated columns custom names:

from pyspark.sql import functions as F
df.groupBy("id").agg(*[F.collect_list(feature).alias("%s_list" % feature) for feature in features])

See this for more details.
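Putting the pieces together, a minimal end-to-end sketch might look like the following. The helper names alias_for, collect_features and run_demo, the key parameter, and the sample rows are all assumptions made for illustration; only groupBy, agg, collect_list and alias come from the answer above. The PySpark imports are done lazily inside the functions so that the naming helper can be used without a Spark installation.

```python
def alias_for(feature, suffix="_list"):
    """Output column name for one feature (hypothetical naming helper)."""
    return "%s%s" % (feature, suffix)


def collect_features(df, features, key="id"):
    """Group df by key and collect each feature into its own list column."""
    # Imported lazily so alias_for above works even without Spark installed.
    from pyspark.sql import functions as F

    # One aliased collect_list expression per requested feature.
    exprs = [F.collect_list(c).alias(alias_for(c)) for c in features]
    # Unpack the expressions so they all land in the same agg() call.
    return df.groupBy(key).agg(*exprs)


def run_demo():
    """Assumes a local Spark installation; the sample rows are made up."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    df = spark.createDataFrame(
        [(1, 10, 20), (1, 11, 21), (2, 12, 22)], ["id", "x", "y"]
    )
    collect_features(df, ["x", "y"]).show()
    spark.stop()
```

Calling run_demo() on a machine with Spark available should print one row per id with the columns x_list and y_list. Passing the DataFrame in as an argument, rather than relying on a global df, keeps the function reusable across DataFrames.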

Thank you for your answer! Please try to include details about the code in your answer.