Grouping by key while preserving order in PySpark


Here is the test:

rrr = sc.parallelize([1, 2, 3])
fff = sc.parallelize([5, 6, 7, 8])
test = rrr.cartesian(fff)
Is there a way to preserve the order after calling groupByKey, so that the result is:

[(1, 5),(1, 6),(1, 7),(1, 8),
 (2, 5),(2, 6),(2, 7),(2, 8),
 (3, 5),(3, 6),(3, 7),(3, 8)]
Right now the value lists come back in random order; the output is:

Out[255]: [(1, [8, 5, 6, 7]), (2, [5, 8, 6, 7]), (3, [6, 8, 7, 5])]

The desired output from

test.groupByKey().mapValues(list).take(2)

is:

[(1, [5, 6, 7, 8]), (2, [5, 6, 7, 8]), (3, [5, 6, 7, 8])]

如何实现这一点?

You can add one more mapValues to sort each value list:

result = test.groupByKey().mapValues(list).mapValues(sorted)

which yields:

[(1, [5, 6, 7, 8]), (2, [5, 6, 7, 8]), (3, [5, 6, 7, 8])]
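To check the logic without a Spark cluster, here is a plain-Python sketch of the same pipeline (the variable names mirror the PySpark example; `product` and `defaultdict` stand in for `cartesian` and `groupByKey`, they are not the PySpark API):

```python
from itertools import product
from collections import defaultdict

rrr = [1, 2, 3]
fff = [5, 6, 7, 8]

# cartesian: every (key, value) pair
test = list(product(rrr, fff))

# groupByKey + mapValues(list): collect values per key
grouped = defaultdict(list)
for k, v in test:
    grouped[k].append(v)

# mapValues(sorted): sort each value list; sort keys for stable display
result = sorted((k, sorted(vs)) for k, vs in grouped.items())
print(result)
# [(1, [5, 6, 7, 8]), (2, [5, 6, 7, 8]), (3, [5, 6, 7, 8])]
```

In Spark the per-key lists arrive in nondeterministic order because values are merged across partitions, which is why the explicit `mapValues(sorted)` step is needed there as well.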