Merging a list of lists in a PySpark RDD


python apache-spark pyspark


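I have several lists of tuples that I want to merge into a single list. I have been able to process the data using lambdas and list comprehensions, and I am close to being able to use reduceByKey, but I am not sure how to merge the lists. The current format is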

[[(0, 14), (0, 24)], [(1, 19), (1, 50)], ...]

and I would like it to be

[(0, 14), (0, 24), (1, 19), (1, 50), ...]

The code that got me this far:

test = test.map(lambda x: (x[1], [e * local[x[1]] for e in x[0]]))
test = test.map(lambda x: [(x[0], y) for y in x[1]])

but I am not sure how to merge the lists.

Thanks to @mrsrinivas for the hint:

test = test.flatMap(lambda xs: [(x[0], x[1]) for x in xs])
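To make the effect of flatMap concrete, here is a minimal sketch that mimics it in plain Python, without a Spark session. The `flat_map` helper and the sample values are assumptions for illustration only; on a real RDD the call would be `test.flatMap(...)` as shown above.

```python
from itertools import chain

# Sample data in the shape shown in the question (values are illustrative).
nested = [[(0, 14), (0, 24)], [(1, 19), (1, 50)]]


def flat_map(func, items):
    """Hypothetical local stand-in for RDD.flatMap.

    Applies func to every item, then concatenates the resulting
    iterables into one flat list.
    """
    return list(chain.from_iterable(func(item) for item in items))


# Same lambda as in the flatMap call: rebuild each tuple, one level flatter.
flat = flat_map(lambda xs: [(x[0], x[1]) for x in xs], nested)
print(flat)
```

Because each inner list already holds the tuples wanted in the result, the lambda only has to return that list (or a copy of it); flatMap takes care of the concatenation.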

You can do it like this:

test = test.flatMap(identity)

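or

test = test.flatMap(lambda xs: xs)

Use flatMap to flatten the lists instead of unpacking them manually.

Hmm. OK, I considered that, but for some reason I didn't think it was the way to go. I'll take a look.

You can even do

test.flatMap(identity)

Check this out.

What is identity?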
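On the question of what identity is: Python has no built-in function with that name, so `flatMap(identity)` only works if such a function has been defined or imported first. A minimal sketch, assuming a hand-written `identity` helper and using plain Python lists as a stand-in for the RDD:

```python
from itertools import chain


def identity(x):
    """Return the argument unchanged (hypothetical helper, not a Python builtin)."""
    return x


# Sample data in the shape shown in the question (values are illustrative).
nested = [[(0, 14), (0, 24)], [(1, 19), (1, 50)]]

# Local analogue of test.flatMap(identity): map, then concatenate the results.
flat_identity = list(chain.from_iterable(map(identity, nested)))

# Local analogue of the lambda version, which does the same thing.
flat_lambda = list(chain.from_iterable(map(lambda xs: xs, nested)))

print(flat_identity == flat_lambda)
```

Both forms hand each inner list back untouched, so the flattening step alone produces the merged list.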