
Python: Is there an alternative to zip(*iterable) when the iterable consists of millions of elements?

Tags: python, python-3.x, optimization, iterable-unpacking

I came across code like this:

from random import randint

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

points = [Point(randint(1, 10), randint(1, 10)) for _ in range(10)]
xs = [point.x for point in points]
ys = [point.y for point in points]
I don't think this code is Pythonic, because it repeats itself. If another dimension is added to the Point class, an entirely new loop has to be written, like:

zs = [point.z for point in points]
So I tried to make it more Pythonic by writing something like this:

from random import randint

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

points = [Point(randint(1, 10), randint(1, 10)) for _ in range(10)]
xs, ys = zip(*[(point.x, point.y) for point in points])
This way, adding a new dimension is no problem:

xs, ys, zs = zip(*[(point.x, point.y, point.z) for point in points])
However, when there are millions of points, this turns out to be almost 10 times slower than the other solution, even though it has only one loop. I think it is because the * operator has to unpack millions of arguments into the zip function, which is horrible. So my question is:


Is there a way to change the code above so that it is as fast as before (without using third-party libraries)?

I just tested several ways of zipping point coordinates and looked at their performance as the number of points increases.

Here are the functions I used for the tests:

def hardcode(points):
    # handwritten comprehension for each coordinate
    return [point.x for point in points], [point.y for point in points]

def using_zip(points):
    # using the "problematic" zip function
    return zip(*((point.x, point.y) for point in points))

def loop_and_comprehension(points):
    # comprehensions built in a loop over the coordinate names
    zipped = []
    for coordinate in ('x', 'y'):
        zipped.append([getattr(point, coordinate) for point in points])
    return zipped

def nested_comprehension(points):
    # a comprehension of comprehensions over the coordinate names
    return [
        [getattr(point, coordinate) for point in points]
        for coordinate in ('x', 'y')
    ]
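
As a quick sanity check (my addition, not part of the original answer), all four variants agree on a tiny input, assuming the Point class from the question; note that they differ in the exact container types they return:

pts = [Point(1, 2), Point(3, 4)]
assert hardcode(pts) == ([1, 3], [2, 4])
assert list(using_zip(pts)) == [(1, 3), (2, 4)]
assert loop_and_comprehension(pts) == [[1, 3], [2, 4]]
assert nested_comprehension(pts) == [[1, 3], [2, 4]]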
Using timeit, I timed the execution of each function with various numbers of points. Here are the results:

comparing processing times using 10 points and 10000000 iterations
hardcode................. 14.12024447 [+0%]
using_zip................ 16.84289724 [+19%]
loop_and_comprehension... 30.83631476 [+118%]
nested_comprehension..... 30.45758349 [+116%]

comparing processing times using 100 points and 1000000 iterations
hardcode................. 9.30594717 [+0%]
using_zip................ 13.74953714 [+48%]
loop_and_comprehension... 19.46766583 [+109%]
nested_comprehension..... 19.27818860 [+107%]

comparing processing times using 1000 points and 100000 iterations
hardcode................. 7.90372457 [+0%]
using_zip................ 12.51523594 [+58%]
loop_and_comprehension... 18.25679913 [+131%]
nested_comprehension..... 18.64352790 [+136%]

comparing processing times using 10000 points and 10000 iterations
hardcode................. 8.27348382 [+0%]
using_zip................ 18.23079485 [+120%]
loop_and_comprehension... 18.00183383 [+118%]
nested_comprehension..... 17.96230063 [+117%]

comparing processing times using 100000 points and 1000 iterations
hardcode................. 9.15848662 [+0%]
using_zip................ 22.70730675 [+148%]
loop_and_comprehension... 17.81126971 [+94%]
nested_comprehension..... 17.86892597 [+95%]

comparing processing times using 1000000 points and 100 iterations
hardcode................. 9.75002857 [+0%]
using_zip................ 23.13891725 [+137%]
loop_and_comprehension... 18.08724660 [+86%]
nested_comprehension..... 18.01269820 [+85%]

comparing processing times using 10000000 points and 10 iterations
hardcode................. 9.96045920 [+0%]
using_zip................ 23.11653558 [+132%]
loop_and_comprehension... 17.98296033 [+81%]
nested_comprehension..... 18.17317708 [+82%]

comparing processing times using 100000000 points and 1 iterations
hardcode................. 64.58698246 [+0%]
using_zip................ 92.53437881 [+43%]
loop_and_comprehension... 73.62493845 [+14%]
nested_comprehension..... 62.99444739 [-2%]

We can see that the gap between the "hardcoded" solution and the solutions that build comprehensions with getattr keeps shrinking as the number of points grows.

So for very large numbers of points, it seems better to use a comprehension generated from a list of coordinate names:

[[getattr(point, coordinate) for point in points]
 for coordinate in ('x', 'y')]
However, for small numbers of points, it is the worst solution (at least among those I tested).
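
Following that conclusion, the winning pattern can be wrapped in a tiny helper (my own sketch; the name unzip_attrs is hypothetical, not from the original answer):

def unzip_attrs(objects, *names):
    # one list per requested attribute name (hypothetical helper)
    return [[getattr(obj, name) for obj in objects] for name in names]

xs, ys = unzip_attrs(points, 'x', 'y')
# adding a dimension is just one more name: unzip_attrs(points, 'x', 'y', 'z')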


Here is the code I used to run this benchmark:

import timeit
...
def compare(nb_points, nb_iterations):
    reference = None
    points = [Point(randint(1, 100), randint(1, 100))
              for _ in range(nb_points)]
    print("comparing processing times using {} points and {} iterations"
          .format(nb_points, nb_iterations))
    for func in (hardcode, using_zip, loop_and_comprehension,
                 nested_comprehension):
        duration = timeit.timeit(lambda: func(points), number=nb_iterations)
        print("{:.<25} {:.8f} [{:+.0%}]"
              .format(func.__name__, duration,
                      0 if reference is None else duration / reference - 1))
        if reference is None:
            reference = duration
    print("-" * 80)

compare(10, 10000000)
compare(100, 1000000)
compare(1000, 100000)
compare(10000, 10000)
compare(100000, 1000)
compare(1000000, 100)
compare(10000000, 10)
compare(100000000, 1)
The problem with zip(*iter) is that it iterates over the entire iterable and passes the resulting sequence as positional arguments to zip.

So the following are functionally identical:

Using *:
xs, ys = zip(*[(0, 1), (0, 2), (0, 3)])

Using positionals:
xs, ys = zip((0, 1), (0, 2), (0, 3))

Obviously, this is going to be slow when there are millions of positional arguments.
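
To see the materialization directly (my illustration, not from the answer), any callable invoked with *gen receives a fully built tuple, so the generator is drained before the call body even runs:

def count_args(*args):
    return len(args)

gen = ((i, i * i) for i in range(5))
print(count_args(*gen))  # prints 5: the generator was fully consumed into a tuple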

Using iterators is the only workaround.

I searched the web for python itertools unzip. Unfortunately, the closest thing itertools has is tee. The gist linked above contains this implementation of iunzip, which returns a tuple of iterators built from itertools.tee:

I had to convert it to Python 3:

from random import randint
import itertools
import time
from operator import itemgetter

def iunzip(iterable):
    """iunzip is the same as zip(*iter), but returns iterators instead of
    expanding the iterable. Mostly useful for large sequences."""

    # Peek at the first tuple to learn how many streams to produce.
    _tmp, iterable = itertools.tee(iterable, 2)
    iters = itertools.tee(iterable, len(next(_tmp)))
    # The i-th output iterator yields the i-th element of every tuple.
    return (map(itemgetter(i), it) for i, it in enumerate(iters))

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

points = [Point(randint(1, 10), randint(1, 10)) for _ in range(1000000)]
itime = time.time()
xs = [point.x for point in points]
ys = [point.y for point in points]
otime = time.time() - itime
itime += otime
print(f"original: {otime}")
xs, ys = zip(*[(p.x, p.y) for p in points])
otime = time.time() - itime
itime += otime
print(f"unpacking into zip: {otime}")
xs, ys = iunzip(((p.x, p.y) for p in points))
for _ in zip(xs, ys): pass
otime = time.time() - itime
itime += otime
print(f"iunzip: {otime}")
Output:

original: 0.1282501220703125
unpacking into zip: 1.286362886428833
iunzip: 0.3046858310699463
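Note (my observation, not from the original post): itertools.tee buffers items until every branch iterator has consumed them, so consuming the returned iterators in lockstep, as the zip(xs, ys) loop above does, keeps that buffer small; advancing only one of them far ahead would buffer everything for the others.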

So iterators are definitely better than unpacking into positional arguments. Not to mention that my 4 GB of RAM were exhausted once I got to 10 million points... However, I'm not convinced the iunzip above is the best Python's built-ins can do, since the "original" approach of iterating twice to unzip is still the fastest (about 4 times faster when trying points of various lengths).


It seems like iunzip should be a thing. I'm surprised it isn't a Python builtin or part of itertools...
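
For completeness, a single-pass alternative that avoids * unpacking altogether is a plain loop with one append per coordinate (a sketch of mine, not from the answers above; it trades zip's brevity for predictable memory use):

xs, ys = [], []
for point in points:
    xs.append(point.x)
    ys.append(point.y)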

For starters, you could use a generator instead of building a whole list: zip(*((point.x, point.y, point.z) for point in points)). How much that helps compared with an entirely different approach, I can't say offhand.

@deceze I don't know why, but it is even slower.

@deceze: That won't help at all. Argument unpacking always converts to a tuple no matter what you give it, so it just fills the tuple with a more expensive generator expression instead of a cheaper listcomp followed by a fast shallow copy.

@ShadowRanger I see, that explains it, thanks.

@Tryph That would certainly be faster, but I think it is cheating :) I could write this code in C and it would be 5 times faster. I'm trying to understand why it is slow and how to improve it.
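
A minimal timing sketch of @ShadowRanger's point (my addition; absolute numbers depend on the machine): unpacking a generator expression into zip is not cheaper than unpacking a list comprehension, because the call builds a tuple of all the arguments either way.

import timeit

setup = "pairs = [(i, i * i) for i in range(10**6)]"
print(timeit.timeit("zip(*[p for p in pairs])", setup=setup, number=10))  # listcomp, then fast copy into the args tuple
print(timeit.timeit("zip(*(p for p in pairs))", setup=setup, number=10))  # genexp fills the args tuple item by item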