Python: grouping "continuation" items in a list. Is storing state in an itertools.groupby key function bad?


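I'm new to Python, and I'm trying to write a function that groups list items so that each None marks a continuation of the preceding item, like this: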

>>> g([1, None, 1, 1, None, None, 1])
[[1, None], [1], [1, None, None], [1]]

My real data has more complex items, but I've reduced the problem to its core.

Here is my current solution:

import itertools

# input
x = [1, None, 1, 1, None, None, 1]

# desired output from g(x)
y = [[1, None], [1], [1, None, None], [1]]


def f(x):
    if x is None:
        f.lastx = x
    else:
        if x != f.lastx:
            f.counter += 1
    return f.counter


def g(x):
    f.lastx = None
    f.counter = 0
    z = [list(g) for _, g in itertools.groupby(x, f)]
    return z


assert y == g(x)

This works, but I know it's ugly.

Is there a better (and more Pythonic) way to do this? For example, one without a stateful key function.

You can combine itertools.groupby and itertools.accumulate:

>>> from itertools import accumulate, groupby
>>> dat = [1, None, 1, 1, None, None, 1]
>>> it = iter(dat)
>>> acc = accumulate(x is not None for x in dat)
>>> [[next(it) for _ in g] for _, g in groupby(acc)]
[[1, None], [1], [1, None, None], [1]]

This works because accumulate gives us an int-like value that increases at the start of each new group:

>>> list(accumulate(x is not None for x in dat))
[True, 1, 2, 3, 3, 3, 4]
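Since accumulate already yields one id per item, another way to read the same idea is to zip the ids with the items and group on the id, which avoids the second iterator. This is only a sketch and not part of the original answer; the names group_ids and groups are illustrative.

```python
from itertools import accumulate, groupby
from operator import itemgetter

dat = [1, None, 1, 1, None, None, 1]

# One running id per item; the id goes up by one at every non-None item.
group_ids = accumulate(x is not None for x in dat)

# Pair each item with its id, group on the id, then drop the id again.
groups = [
    [item for _, item in pairs]
    for _, pairs in groupby(zip(group_ids, dat), key=itemgetter(0))
]
print(groups)  # [[1, None], [1], [1, None, None], [1]]
```

The first id is True rather than 1, but groupby compares keys with ==, and True == 1 in Python, so the first two items still land in the same group.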

If you want to be able to handle streams, just use iterators throughout. The peak extra memory use is then only the size of one group.

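from itertools import accumulate, groupby, tee

def cgroup(source):
    # One iterator drives the grouping, the other supplies the items.
    it, it2 = tee(iter(source), 2)
    acc = accumulate(x is not None for x in it)
    for _, g in groupby(acc):
        # Pull exactly as many items as the current group has ids.
        yield [next(it2) for _ in g]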

This still gives

>>> list(cgroup([1, None, 1, 1, None, None, 1]))
[[1, None], [1], [1, None, None], [1]]

but it also works with an infinite source:

>>> from itertools import chain, islice, repeat
>>> stream = chain.from_iterable(repeat([1, 1, None]))
>>> list(islice(cgroup(stream), 10))
[[1], [1, None], [1], [1, None], [1], [1, None], [1], [1, None], [1], [1, None]]
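To make the streaming claim more concrete, here is a small sketch that records how much of the source is actually read. The traced helper and the consumed list are hypothetical names added for illustration; cgroup is repeated from above so the block runs on its own.

```python
from itertools import accumulate, count, groupby, islice, tee

def cgroup(source):
    it, it2 = tee(iter(source), 2)
    acc = accumulate(x is not None for x in it)
    for _, g in groupby(acc):
        yield [next(it2) for _ in g]

consumed = []  # every item pulled from the source ends up here

def traced(iterable):
    """Hypothetical helper: pass items through and record each one."""
    for item in iterable:
        consumed.append(item)
        yield item

# Infinite stream 0, 1, None, 3, 4, None, ... (every third item is None).
stream = traced(None if n % 3 == 2 else n for n in count())

first_two = list(islice(cgroup(stream), 2))
print(first_two)      # [[0], [1, None]]
print(len(consumed))  # a handful of items, not the whole stream
```

groupby has to look one item past the end of a group to know that the group is finished, so the source is read slightly ahead of what has been yielded, but never by more than that.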

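It's not perfect, because it requires a third-party package (iteration_utilities) and a bit of tinkering, but it does produce the desired output: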

>>> from iteration_utilities import split, is_not_None

>>> lst = [1, None, 1, 1, None, None, 1]

>>> list(split(lst, is_not_None, keep_after=True))[1:]
[[1, None], [1], [1, None, None], [1]]

This approach requires discarding the first element (hence the [1:]), because otherwise the result would start with an empty sublist.
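Wow, that is some dense code. :) It took me a while to understand how it works, but I like it, and I can see how this approach is quite flexible.

Actually, I just noticed something... This approach requires two passes over the input data. If the data is streamed (which is my larger problem), this approach doesn't work :(

@BrianG: Then you shouldn't have said you were grouping list items ;-) But handling streams is easy too. Thanks for making me aware of it.

I was thinking that some form of splitting might be more appropriate than grouping. This might make sense if I wanted to go fully functional and use this library more, but for now I don't want to add a dependency just for this one function.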
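For the splitting idea raised in the last comment, a plain generator is one dependency-free option. The following is a minimal sketch rather than anything from the answers above, and the name split_before_values is made up for illustration. It keeps its state in local variables instead of on a key function, and it works on streams because it reads the source only once.

```python
from itertools import chain, islice, repeat

def split_before_values(source):
    """Yield lists that each start at a non-None item and include
    the None items that directly follow it."""
    group = []
    for item in source:
        # A non-None item closes the group collected so far.
        if item is not None and group:
            yield group
            group = []
        group.append(item)
    if group:
        yield group  # flush the last, still open group

print(list(split_before_values([1, None, 1, 1, None, None, 1])))
# [[1, None], [1], [1, None, None], [1]]

# It also works on an infinite source, one group at a time.
stream = chain.from_iterable(repeat([1, 1, None]))
print(list(islice(split_before_values(stream), 4)))
# [[1], [1, None], [1], [1, None]]
```

If the input starts with None items, they form a leading group of their own, which matches what the accumulate-based version does.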