Unexpected behavior of Python's itertools.groupby


This is the observed behavior:

In [4]: x = itertools.groupby(range(10), lambda x: True)

In [5]: y = next(x)

In [6]: next(x)
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-6-5e4e57af3a97> in <module>()
----> 1 next(x)

StopIteration: 

In [7]: y
Out[7]: (True, <itertools._grouper at 0x10a672e80>)

In [8]: list(y[1])
Out[8]: [9]

The problem is that you are putting all the items into a single group, so after the first next call everything has already been grouped:
import itertools
x = itertools.groupby(range(10), lambda x: True)
key, elements = next(x)

However, elements is a lazy iterator, so you need to pass it into some structure right away in order to print or save it, e.g. a list:
print('Key: "{}" with value "{}"'.format(key, list(elements)))

After that, range(10) is exhausted and the groupby iterator is finished:
Key: "True" with value "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]"
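As a minimal sketch of the same idea with more than one group, each group can be turned into a list as soon as it is produced, before groupby is advanced again. The sample data (words) and the key function are made up for illustration only.

```python
import itertools

# Hypothetical sample data, already ordered by the grouping key (first letter).
words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# Materialize each group immediately; list(group) runs before groupby moves on.
grouped = [
    (key, list(group))
    for key, group in itertools.groupby(words, key=lambda word: word[0])
]

for key, members in grouped:
    print('Key: "{}" with value "{}"'.format(key, members))
```

Because every group is copied into a list while it is still the current group, the stored data stays valid after the groupby iterator has moved on.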

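The documentation states:
itertools.groupby(iterable, key=None)

[...]
The operation of groupby() is similar to the uniq filter in Unix. It generates a break or new group every time the value of the key function changes (which is why it is usually necessary to have sorted the data using the same key function). That behavior differs from SQL's GROUP BY, which aggregates common elements regardless of their input order.
The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list.
So the assumption from the last paragraph is that the resulting list would be the empty list [], because the iterator has already been advanced and has hit StopIteration; in CPython, however, the result is the surprising [9].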
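The uniq-like grouping described in the documentation excerpt above can be illustrated with a short sketch. The input string is an arbitrary example; the point is only that unsorted input yields a separate group for every run of equal keys, while sorted input yields one group per key.

```python
import itertools

data = "AABBBAAC"  # arbitrary example input, deliberately not sorted

# Unsorted: a new group starts every time the key changes, as with uniq.
runs = [(key, len(list(group))) for key, group in itertools.groupby(data)]

# Sorted first: equal keys are adjacent, so each key appears exactly once.
merged = [(key, len(list(group))) for key, group in itertools.groupby(sorted(data))]

print(runs)    # [('A', 2), ('B', 3), ('A', 2), ('C', 1)]
print(merged)  # [('A', 4), ('B', 3), ('C', 1)]
```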

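The surprising [9] appears because the group iterator lags one item behind the underlying iterator: groupby has to look one item ahead to decide whether that item belongs to the current group or to the next one, yet it must still be able to yield the item later as the first item of the new group.
However, the currkey and currvalue attributes of groupby are not reset when the underlying iterator is exhausted, so currvalue still points to the last item from the iterator.
The CPython documentation actually contains this equivalent code, which has exactly the same behavior as the C version: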
class groupby:
    # [k for k, g in groupby('AAAABBBCCDAABBB')] --> A B C D A B
    # [list(g) for k, g in groupby('AAAABBBCCD')] --> AAAA BBB CC D
    def __init__(self, iterable, key=None):
        if key is None:
            key = lambda x: x
        self.keyfunc = key
        self.it = iter(iterable)
        self.tgtkey = self.currkey = self.currvalue = object()
    def __iter__(self):
        return self
    def __next__(self):
        while self.currkey == self.tgtkey:
            self.currvalue = next(self.it)    # Exit on StopIteration
            self.currkey = self.keyfunc(self.currvalue)
        self.tgtkey = self.currkey
        return (self.currkey, self._grouper(self.tgtkey))
    def _grouper(self, tgtkey):
        while self.currkey == tgtkey:
            yield self.currvalue
            try:
                self.currvalue = next(self.it)
            except StopIteration:
                return
            self.currkey = self.keyfunc(self.currvalue)

Notably, __next__ advances to the first item of the next group and stores its key in self.currkey and its value in self.currvalue. But the key is the line
self.currvalue = next(self.it)    # Exit on StopIteration

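When next throws StopIteration, self.currvalue still contains the last item of the previous group. Now, when y[1] is turned into a list, it first yields the value of self.currvalue, and only then calls next() on the underlying iterator (and hits StopIteration again).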
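The sequence of events described above can be replayed with a small self-contained sketch. The class name LaggingGroupBy is hypothetical; it is a copy of the documented pure-Python equivalent, used here instead of the built-in itertools.groupby because the built-in may behave differently depending on the interpreter and its version.

```python
class LaggingGroupBy:
    """Hypothetical copy of the documented pure-Python groupby equivalent."""

    def __init__(self, iterable, key=None):
        self.keyfunc = key if key is not None else (lambda x: x)
        self.it = iter(iterable)
        self.tgtkey = self.currkey = self.currvalue = object()

    def __iter__(self):
        return self

    def __next__(self):
        # Skip the rest of the current group; StopIteration propagates
        # from next(self.it) and leaves currkey/currvalue untouched.
        while self.currkey == self.tgtkey:
            self.currvalue = next(self.it)
            self.currkey = self.keyfunc(self.currvalue)
        self.tgtkey = self.currkey
        return self.currkey, self._grouper(self.tgtkey)

    def _grouper(self, tgtkey):
        # Generator body: nothing here runs until the group is first consumed.
        while self.currkey == tgtkey:
            yield self.currvalue
            try:
                self.currvalue = next(self.it)
            except StopIteration:
                return
            self.currkey = self.keyfunc(self.currvalue)


groups = LaggingGroupBy(range(10), lambda item: True)
key, first_group = next(groups)      # first (and only) group, not consumed yet

try:
    next(groups)                     # drains range(10) looking for a new key
    exhausted = False
except StopIteration:
    exhausted = True

leftover_value = groups.currvalue    # last item read before StopIteration
stale_items = list(first_group)      # the group only starts running now

print(exhausted, leftover_value, stale_items)  # True 9 [9]
```

The group generator starts executing only when list() consumes it, by which point currkey still equals its target key and currvalue holds the last item read, so that single stale item is yielded before the exhausted iterator ends the group.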

Although the Python equivalent in the documentation behaves exactly like the authoritative C implementation in CPython, IronPython, Jython and PyPy give different results.