
Python itertools groupby

python


Say I have the following list of tuples:

[('FRG', 'MCO TPA PIE SRQ', 'WAVEY EMJAY J174 SWL CEBEE '),
('                    ', 'FMY RSW APF', 'WETRO DIW AR22 JORAY HILEY4'),
('                    ', 'FMY RSW APF', 'WETRO DIW AR22 JORAY HILEY4'),
('FRG2', 'MCO TPA PIE SRQ', 'WAVEY EMJAY J174 SWL CEBEE '),
('                    ', 'FMY RSW APF', 'WETRO DIW AR22 JORAY HILEY4')]

How can I group these so that I end up with a dict like:

{'FRG': ['MCO TPA PIE SRQ', 'WAVEY EMJAY J174 SWL CEBEE ', 'FMY RSW APF', 'WETRO DIW AR22 JORAY HILEY4', 'FMY RSW APF', 'WETRO DIW AR22 JORAY HILEY4'],
 'FRG2': ...}

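That is, I want to glue the part where tuple[0] is a word together with the following parts (there may be many) where tuple[0] is empty (contains only whitespace).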
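I've been experimenting with groupby and takewhile from itertools, but haven't found a working solution yet. Ideally, a solution would use one of these (for learning purposes, that is).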

Solution using the collections.defaultdict subclass:

import collections

l = [('FRG', 'MCO TPA PIE SRQ', 'WAVEY EMJAY J174 SWL CEBEE '),
('                    ', 'FMY RSW APF', 'WETRO DIW AR22 JORAY HILEY4'),
('                    ', 'FMY RSW APF', 'WETRO DIW AR22 JORAY HILEY4'),
('FRG2', 'MCO TPA PIE SRQ', 'WAVEY EMJAY J174 SWL CEBEE '),
('                    ', 'FMY RSW APF', 'WETRO DIW AR22 JORAY HILEY4')]

d = collections.defaultdict(list)
k = ''
for t in l:
    if t[0].strip():  # if the 1st value of a tuple is not empty
        k = t[0]      # capturing dict key
    if k:             # skip blank rows that appear before the first key
        d[k].append(t[1])
        d[k].append(t[2])

print(dict(d))

Output:

{'FRG2': ['MCO TPA PIE SRQ', 'WAVEY EMJAY J174 SWL CEBEE ', 'FMY RSW APF', 'WETRO DIW AR22 JORAY HILEY4'], 'FRG': ['MCO TPA PIE SRQ', 'WAVEY EMJAY J174 SWL CEBEE ', 'FMY RSW APF', 'WETRO DIW AR22 JORAY HILEY4', 'FMY RSW APF', 'WETRO DIW AR22 JORAY HILEY4']}

I wouldn't recommend it, but to do this with itertools.groupby() you need a key function that remembers the last key used. Something like this:

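from itertools import groupby

def keyfunc(item, keys=[None]):
    # The mutable default argument is shared across calls, so it acts as
    # persistent state holding the keys seen so far (intentional here).
    if item[0] != keys[-1] and not item[0].startswith(" "):
        keys.append(item[0])
    return keys[-1]

d = {k: [y for x in g for y in x[1:]] for k, g in groupby(lst, key=keyfunc)}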
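Because the default list in keyfunc survives between calls, it keeps growing and carries its last key over into the next groupby you run (for example, a new list that starts with a blank row would inherit the old key). A minimal sketch, assuming you want the same grouping with fresh state per use: the hypothetical factory make_keyfunc returns a new closure each time it is called, and the closure treats any non-blank first element as the start of a new group (using strip() rather than startswith(" ")).

```python
from itertools import groupby


def make_keyfunc():
    """Hypothetical helper: build a key function with its own private state."""
    last = None  # last non-blank first element seen by this key function

    def keyfunc(item):
        nonlocal last
        if item[0].strip():  # a non-blank first element starts a new group
            last = item[0]
        return last          # blank rows reuse the remembered key

    return keyfunc


lst = [('FRG', 'MCO TPA PIE SRQ', 'WAVEY EMJAY J174 SWL CEBEE '),
       ('                    ', 'FMY RSW APF', 'WETRO DIW AR22 JORAY HILEY4'),
       ('                    ', 'FMY RSW APF', 'WETRO DIW AR22 JORAY HILEY4'),
       ('FRG2', 'MCO TPA PIE SRQ', 'WAVEY EMJAY J174 SWL CEBEE '),
       ('                    ', 'FMY RSW APF', 'WETRO DIW AR22 JORAY HILEY4')]

# Each groupby call gets a fresh key function, so no state leaks between runs
d = {k: [y for x in g for y in x[1:]] for k, g in groupby(lst, key=make_keyfunc())}
```

The grouping itself is unchanged; only the lifetime of the remembered key differs.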

A simple for loop looks cleaner and doesn't need any imports:

d, key = {}, None
for item in lst:
    if item[0] != key and not item[0].startswith(" "):
        key = item[0]
    d.setdefault(key, []).extend(item[1:])

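The functions groupby and takewhile aren't a good fit for this kind of problem.

groupby groups based on the key function. That means that, for it to work, you need to remember the last encountered non-blank first tuple element, i.e. you have to keep some kind of global state. By keeping such state the function becomes impure, whereas most (if not all) of itertools are pure.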

from itertools import groupby, chain

d = [('FRG',                  'MCO TPA PIE SRQ', 'WAVEY EMJAY J174 SWL CEBEE '), 
     ('                    ', 'FMY RSW APF',     'WETRO DIW AR22 JORAY HILEY4'),
     ('                    ', 'FMY RSW APF',     'WETRO DIW AR22 JORAY HILEY4'),
     ('FRG2',                 'MCO TPA PIE SRQ', 'WAVEY EMJAY J174 SWL CEBEE '), 
     ('                    ', 'FMY RSW APF',     'WETRO DIW AR22 JORAY HILEY4')]

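def keyfunc(item):
    first = item[0]
    if first.strip():
        # A non-blank first element starts a new group; store it as a function
        # attribute so the following blank rows reuse it. Assumes the very
        # first row is non-blank, otherwise keyfunc.state is not yet defined.
        keyfunc.state = first
    return keyfunc.state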

{k: [item for idx, item in enumerate(chain.from_iterable(grp)) if idx%3 != 0] for k, grp in groupby(d, keyfunc)}
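The state problem described above can be sidestepped by moving the "remember the last key" step out of the key function. This is a different technique from the answer's, offered only as a sketch: itertools.accumulate forward-fills the first column using a small pure function (carry_key, a hypothetical name), and groupby then groups on that filled-in key via operator.itemgetter. It assumes the first row has a non-blank key; otherwise the blank string itself becomes the first key.

```python
from itertools import accumulate, groupby
from operator import itemgetter

rows = [('FRG', 'MCO TPA PIE SRQ', 'WAVEY EMJAY J174 SWL CEBEE '),
        ('                    ', 'FMY RSW APF', 'WETRO DIW AR22 JORAY HILEY4'),
        ('                    ', 'FMY RSW APF', 'WETRO DIW AR22 JORAY HILEY4'),
        ('FRG2', 'MCO TPA PIE SRQ', 'WAVEY EMJAY J174 SWL CEBEE '),
        ('                    ', 'FMY RSW APF', 'WETRO DIW AR22 JORAY HILEY4')]


def carry_key(prev, cur):
    """Pure helper: keep the previous key when the current one is blank."""
    return cur if cur.strip() else prev


# Forward-fill the first column: 'FRG', 'FRG', 'FRG', 'FRG2', 'FRG2'
keys = accumulate((row[0] for row in rows), carry_key)

# Pair each filled-in key with the row's remaining values, then group on the key
pairs = zip(keys, (row[1:] for row in rows))
result = {k: [value for _, rest in grp for value in rest]
          for k, grp in groupby(pairs, key=itemgetter(0))}
```

The running value still exists, but it lives in accumulate's explicit accumulator instead of hidden state inside the key function.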

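Comments:
Do the keys always start with FRG?
@DmitryPolonskiy: No, but they always start with word characters (\w+).
Since the values are also word characters, how do we know what is a key and what isn't?
@DmitryPolonskiy As stated in the question, the key is always the first value of the tuple.
You can use itertools.groupby to get the keys, but as far as a complete solution goes I have to agree with @Rawing: I've been trying to come up with a solution that uses only itertools, and so far I'm drawing a blank.
@planetp Yeah.
Please explain in more detail why groupby and takewhile are not a good way to solve the problem.
The answers contain different implementations. Could you be more specific? What makes your answer so similar that you felt the need to downvote? You may also want to look at "When is it appropriate to downvote?", which contains some guidelines.

takewhile needs to look ahead to decide when to stop yielding values. That means it automatically pops one more value from the iterator than it actually uses for each group. To apply it anyway, you need to remember the last position and create a new iterator each time. It also has the problem that you need to keep some kind of state, because you want to take one element that isn't blank first and then the elements that contain only whitespace.
One approach could look like this (but it feels unnecessarily complicated):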
from itertools import takewhile, islice

def takegen(inp):
    idx = 0
    length = len(inp)
    while idx < length:
        first, *rest = inp[idx]
        rest = list(rest)
        for _, *lasts in takewhile(lambda x: not x[0].strip(), islice(inp, idx+1, None)):
            rest.extend(lasts)
        idx += len(rest) // 2  # rows consumed; assumes 2 values per tuple after the key
        yield first, rest

dict(takegen(d))
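The look-ahead problem described for takewhile can be reproduced on a plain iterator. This sketch uses shortened, hypothetical rows in the same shape as the question's data; it shows that the first non-blank row after a group is pulled from the iterator to test the predicate and then discarded, which is why the answer above slices the list by index instead of sharing one iterator.

```python
from itertools import takewhile

# Hypothetical shortened rows: a key row, then blank-key continuation rows
rows = iter([('FRG', 'x1'), ('   ', 'x2'), ('FRG2', 'y1'), ('   ', 'y2')])

head = next(rows)  # the key row of the first group
# Take continuation rows while the first element is blank
blanks = list(takewhile(lambda r: not r[0].strip(), rows))
# ('FRG2', 'y1') was consumed by takewhile to test the predicate and dropped,
# so the next row the iterator hands out is the blank row that followed it
after = next(rows)
```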

def gen(inp):
    # Initial values
    last = None
    for first, *rest in inp:
        if last is None:       # first encountered item
            last = first
            l = list(rest)
        elif first.strip():    # when the first tuple item isn't all whitespaces
            # Yield the last "group"
            yield last, l
            # New values for the next "group"
            last = first
            l = list(rest)
        else:                  # when the first tuple item is all whitespaces
            l.extend(rest)
    # Yield the last group (skipped for empty input, where no group was started)
    if last is not None:
        yield last, l

dict(gen(d))
# {'FRG2': ['MCO TPA PIE SRQ', 'WAVEY EMJAY J174 SWL CEBEE ', 'FMY RSW APF', 'WETRO DIW AR22 JORAY HILEY4'], 
#  'FRG': ['MCO TPA PIE SRQ', 'WAVEY EMJAY J174 SWL CEBEE ', 'FMY RSW APF', 'WETRO DIW AR22 JORAY HILEY4', 'FMY RSW APF', 'WETRO DIW AR22 JORAY HILEY4']}