ReduceByKey in Python - Python, PySpark - Fatal编程技术网

ReduceByKey in Python

Is there a function in Python that does exactly what reduceByKey does in Spark (PySpark)?

For example, turning:

a = [(1, ['a']),
     (1, ['b']),
     (2, ['c']),
     (2, ['d']),
     (3, ['e'])]

into:

[(1, ['a', 'b']), (2, ['c', 'd']), (3, ['e'])]


As far as I know, there isn't. But it's easy to write one yourself:

from collections import OrderedDict

def reduce_by_key(ls):
    # Group values by key, preserving the order in which keys first appear.
    d = OrderedDict()
    for key, sublist in ls:
        d.setdefault(key, []).extend(sublist)
    return list(d.items())

If you don't need to preserve the order in which keys first appear, you can use a regular dict (and on Python 3.7+ a plain dict preserves insertion order anyway).
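A minimal sketch of that simpler variant, using collections.defaultdict instead of OrderedDict (the function name reduce_by_key_plain is my own, not from the answer above):

```python
from collections import defaultdict

def reduce_by_key_plain(ls):
    # Group values by key; defaultdict creates the empty list on first access.
    d = defaultdict(list)
    for key, sublist in ls:
        d[key].extend(sublist)
    return list(d.items())

a = [(1, ['a']), (1, ['b']), (2, ['c']), (2, ['d']), (3, ['e'])]
print(reduce_by_key_plain(a))
# [(1, ['a', 'b']), (2, ['c', 'd']), (3, ['e'])]
```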

No. Perhaps the closest you can get is reduceby from toolz, although it has different semantics (it is applied in a streaming fashion, so it does not assume associativity or commutativity). It also reduces over the complete pairs rather than just the values, and it returns a dictionary:

from toolz import reduceby, first, second

a = [(1, ['a']), (1, ['b']), (2, ['c']), (2, ['d']), (3, ['e'])]

list(reduceby(first, lambda x, y: (first(x), second(x) + second(y)), a).values())
## [(1, ['a', 'b']), (2, ['c', 'd']), (3, ['e'])]
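For completeness, a standard-library alternative (my addition, not part of either answer) is itertools.groupby, which only merges consecutive runs and therefore needs the input sorted by key first:

```python
from itertools import groupby
from operator import itemgetter

a = [(1, ['a']), (1, ['b']), (2, ['c']), (2, ['d']), (3, ['e'])]

# groupby only groups adjacent items, so sort by key before grouping.
grouped = [
    (key, [x for _, sub in group for x in sub])
    for key, group in groupby(sorted(a, key=itemgetter(0)), key=itemgetter(0))
]
print(grouped)
# [(1, ['a', 'b']), (2, ['c', 'd']), (3, ['e'])]
```

The sort makes this O(n log n) rather than the O(n) of the dict-based approaches, but it avoids building an intermediate mapping.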