Python: concatenating tuples using sum()


I learned that tuples can be concatenated like this:

>>> tuples = (('hello',), ('these', 'are'), ('my', 'tuples!'))
>>> sum(tuples, ())
('hello', 'these', 'are', 'my', 'tuples!')

That looks pretty nice. But why does it work? And is it optimal, or is there something in itertools that would be preferable to this construct?

The addition operator concatenates tuples in Python:

('a', 'b')+('c', 'd')
Out[34]: ('a', 'b', 'c', 'd')

From the docstring of sum:

Return the sum of a 'start' value (default: 0) plus an iterable of numbers

This means sum does not start with the first element of your iterable, but with an initial value that is passed through the start= argument.

By default, sum is used with numbers, so the default start value is 0. Summing an iterable of tuples therefore requires starting with an empty tuple. () is an empty tuple:

type(())
Out[36]: tuple

Hence the concatenation works.
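As a rough illustration of the docstring above, sum behaves much like the loop below. This is only a sketch and not CPython's actual implementation; the name my_sum is made up for this example.

```python
def my_sum(iterable, start=0):
    """Hypothetical pure-Python stand-in for the built-in sum()."""
    total = start
    for item in iterable:
        total = total + item  # for tuples, + builds a new, longer tuple
    return total


tuples = (('hello',), ('these', 'are'), ('my', 'tuples!'))

# Same result as sum(tuples, ())
print(my_sum(tuples, ()))

# Without an explicit start value the first step is 0 + ('hello',),
# which is not defined, so the built-in raises TypeError.
try:
    sum(tuples)
except TypeError as exc:
    print("TypeError:", exc)
```

The only thing that makes the tuple case work is the start value: once it is a tuple, every later step is an ordinary tuple + tuple.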

As for performance, here is a comparison:

%timeit sum(tuples, ())
The slowest run took 9.40 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 285 ns per loop


%timeit tuple(it.chain.from_iterable(tuples))
The slowest run took 5.00 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 625 ns per loop

Now with t2 of size 10000:

%timeit sum(t2, ())
10 loops, best of 3: 188 ms per loop

%timeit tuple(it.chain.from_iterable(t2))
1000 loops, best of 3: 526 µs per loop
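The IPython timings above can be approximated outside IPython with the standard timeit module. The sketch below assumes that t2 holds 10000 two-element tuples, since its contents are not shown here; the helper names with_sum and with_chain are also made up for the example.

```python
import itertools as it
import timeit

# Assumed test data: 10000 two-element tuples.
t2 = tuple((i, i + 1) for i in range(10000))


def with_sum():
    return sum(t2, ())


def with_chain():
    return tuple(it.chain.from_iterable(t2))


# Both approaches must produce the same flat tuple.
assert with_sum() == with_chain()

# Only a few repetitions, because sum() is slow at this size.
print("sum:  ", timeit.timeit(with_sum, number=3))
print("chain:", timeit.timeit(with_chain, number=3))
```

Absolute numbers depend on the machine and the Python version, so only the ratio between the two lines is meaningful.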

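So if your list of tuples is small, don't bother. If it is medium-sized or larger, you should use itertools.

That's clever, and I had to laugh, because the help explicitly forbids strings, yet it works: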

sum(...)
    sum(iterable[, start]) -> value

    Return the sum of an iterable of numbers (NOT strings) plus the value
    of parameter 'start' (which defaults to 0).  When the iterable is
    empty, return start.

You can add tuples to get a new, bigger tuple. And since you gave a tuple as the start value, the addition works.
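To make the "NOT strings" restriction from the help text concrete, here is a small sketch. The variable names are chosen for the example only.

```python
words = ['hello', 'world']

# sum() rejects a str start value outright.
try:
    sum(words, '')
except TypeError as exc:
    print("TypeError:", exc)

# The idiomatic way to concatenate strings.
joined = ''.join(words)
print(joined)  # helloworld

# Tuples that merely contain strings are fine: only tuples are added,
# the strings themselves are never concatenated.
flat = sum((('hello',), ('world',)), ())
print(flat)  # ('hello', 'world')
```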

Just to complement the accepted answer with some more benchmarks:

import functools, operator, itertools
import numpy as np
N = 10000
M = 2

ll = tuple(tuple(x) for x in np.random.random((N, M)).tolist())

%timeit functools.reduce(operator.add, ll)
# 407 ms ± 5.63 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit functools.reduce(lambda x, y: x + y, ll)
# 425 ms ± 7.16 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit sum(ll, ())
# 426 ms ± 14.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit tuple(itertools.chain(*ll))
# 601 µs ± 5.43 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit tuple(itertools.chain.from_iterable(ll))
# 546 µs ± 25.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

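EDIT: the code has been updated to actually use tuples. Also, following the comments, the last two options are now wrapped in a tuple() constructor, and all timings have been updated (for consistency). The itertools.chain* options are still the fastest, but the margin is now smaller.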

It works because addition (on tuples) is overloaded to return the concatenated tuple:

>>> () + ('hello',) + ('these', 'are') + ('my', 'tuples!')
('hello', 'these', 'are', 'my', 'tuples!')

That's basically what sum does: you give it an empty tuple as the initial value, and the tuples are then added to it one by one.

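However, this is generally a bad idea, because adding tuples creates a new tuple each time, so you create several intermediate tuples just to copy them into the concatenated tuple: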

()
('hello',)
('hello', 'these', 'are')
('hello', 'these', 'are', 'my', 'tuples!')
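To get a feeling for what these intermediate tuples cost, the sketch below repeats the additions by hand and counts how many elements end up in freshly built tuples. It is an estimate under simplifying assumptions: CPython may skip the copy when one operand is empty, and the helper copies_needed is made up for this example.

```python
tuples = (('hello',), ('these', 'are'), ('my', 'tuples!'))

result = ()
copied = 0  # elements placed into newly built tuples (estimate)
for tup in tuples:
    result = result + tup   # builds a new tuple at every step
    copied += len(result)   # all of its elements had to be copied
    print(result)

print(copied)  # 1 + 3 + 5 = 9 copies for 5 final elements


def copies_needed(n, m):
    """Estimated copies when adding n tuples of length m one by one."""
    return sum(m * k for k in range(1, n + 1))  # equals m * n * (n + 1) / 2


print(copies_needed(10000, 2))  # 100010000
```

For 10000 two-element tuples that is roughly a hundred million element copies to produce a result of only 20000 elements, which is where the quadratic growth comes from.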

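That's an implementation with quadratic runtime behavior. You can avoid the quadratic runtime behavior by avoiding the intermediate tuples.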

>>> tuples = (('hello',), ('these', 'are'), ('my', 'tuples!'))

Using a nested generator expression:

>>> tuple(tuple_item for tup in tuples for tuple_item in tup)
('hello', 'these', 'are', 'my', 'tuples!')

Or using a generator function:

def flatten(it):
    for seq in it:
        for item in seq:
            yield item


>>> tuple(flatten(tuples))
('hello', 'these', 'are', 'my', 'tuples!')

Or using:

If you're interested in how these perform, I benchmarked them on Python 3.7.2 64-bit, Windows 10 64-bit.

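So while the sum approach is very fast if you concatenate only a few tuples, it becomes really slow if you try to concatenate lots of tuples. For many tuples, the fastest of the tested approaches is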

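Why wouldn't it work? It just adds your tuples together, but it's not very efficient. Consider, for example, tuple(chain(*tuples)).

@PM2Ring. Avoid using chain like that, because it is even less efficient than sum (unless the collection of tuples is very small). Use chain.from_iterable instead.

@ekhumoro Oops! Yes, chain.from_iterable is better. And as Boud's answer shows, it is actually slower than sum for small collections of tuples.

Interesting timings. Which Python version are you using?

@PM2Ring 3.5 64-bit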

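"best of 3" => please refer to the %timeit documentation in IPython.

In this case, sum is not summing strings: no two separate strings in the input are being concatenated here. (For example, you can't use sum to turn hello and world into helloworld.)

I think what Python does here is just silly. sum should be able to sum anything that supports the + operator. Strings do. Explicitly forbidding this one special case for strings in the name of performance and good conventions (while Python has plenty of other anti-patterns that are not forbidden) is bad design.

@ShreevatsaR I'm well aware of that. The help mentions strings, but I went on to say that this is really adding tuples. I just found it amusing, and assumed people could read.

@progo: I don't know why it is forbidden, but I agree, sum should do what plus does. Possibly it is there to catch the common mistake of strings being mistaken for ints.

As for the strings part, it is mainly about efficiency.

Your timings for the last two are not representative. itertools.chain and itertools.chain.from_iterable return iterators. For a fair timing, you need to use tuple(itertools.chain... to iterate over these.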
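The point of the last comment can be checked directly. The following sketch, with variable names chosen for illustration, shows that itertools.chain.from_iterable only returns a lazy iterator and that the flattening happens when it is consumed:

```python
import itertools

tuples = (('hello',), ('these', 'are'), ('my', 'tuples!'))

lazy = itertools.chain.from_iterable(tuples)
print(type(lazy).__name__)  # chain: an iterator, nothing flattened yet

first = tuple(lazy)   # consuming the iterator does the actual work
second = tuple(lazy)  # the iterator is exhausted by now
print(first)
print(second)         # ()

# chain(*tuples) yields the same items, but unpacks every inner tuple
# into the argument list before iteration starts. chain.from_iterable
# reads the outer iterable lazily, one inner tuple at a time.
gen = ((i, i + 1) for i in range(3))
print(tuple(itertools.chain.from_iterable(gen)))  # (0, 1, 1, 2, 2, 3)
```

Timing only the call that creates the iterator therefore measures almost nothing, which is why the tuple() wrapper is needed for a fair comparison with sum.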