ReduceByKey in Python
Is there a function in Python that is exactly equivalent to `reduceByKey` in Spark (PySpark)?

For example:
a = [(1, ['a']),
     (1, ['b']),
     (2, ['c']),
     (2, ['d']),
     (3, ['e'])]
to:

[(1, ['a', 'b']), (2, ['c', 'd']), (3, ['e'])]
Not as far as I know. But it's easy to write one yourself:
from collections import OrderedDict

def reduce_by_key(ls):
    d = OrderedDict()
    for key, sublist in ls:
        d.setdefault(key, []).extend(sublist)
    return list(d.items())
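To show the helper in action (the definition is repeated so the snippet runs on its own), calling it on the question's list groups the values per key in first-seen order:

```python
from collections import OrderedDict

def reduce_by_key(ls):
    # Group sublists under their key, preserving first-seen key order.
    d = OrderedDict()
    for key, sublist in ls:
        d.setdefault(key, []).extend(sublist)
    return list(d.items())

a = [(1, ['a']), (1, ['b']), (2, ['c']), (2, ['d']), (3, ['e'])]
print(reduce_by_key(a))
# [(1, ['a', 'b']), (2, ['c', 'd']), (3, ['e'])]
```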
If you don't need to preserve the order, you can use a regular `dict`.
No. Probably the closest thing you can get is `toolz.reduceby`, although it has different semantics (it is applied in a streaming fashion, so it does not assume associativity or commutativity). It can also reduce over the complete pairs, and it returns a dictionary:
from toolz import reduceby, first, second

list(reduceby(first, lambda x, y: (first(x), second(x) + second(y)), a).values())
## [(1, ['a', 'b']), (2, ['c', 'd']), (3, ['e'])]