Python itertools groupby gives an unexpected result


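I have two lists, say:

one where the "K1_SS_ALM" entries have the same date,

and one where the "K1_SS_ALM" entries have different dates.

I need to group by K1_SS_ALM and K1_AB_KIL (re.findall("\w+/\w+/\d+/(.*?)_\d+_\d+.txt", text)).

So far: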

import re
from itertools import groupby

finalblobfpost1 = ['ABC/XYZ/16082020/K1_SS_ALM_222222_14082020.txt', 'ABC/XYZ/16082020/K1_SS_ALM_111111_14082020.txt', 'ABC/XYZ/15082020/K1_AB_KIL_444444_15082020.txt', 'ABC/XYZ/15082020/K1_AB_KIL_333333_15082020.txt']
# Key: the name in front of _<id>_<date>.txt; falls back to the full text if nothing matches
keyf = lambda text: (re.findall(r"\w+/\w+/\d+/(.*?)_\d+_\d+\.txt", text) + [text])[0].strip()
h = [list(items) for gr, items in groupby(sorted(finalblobfpost1), key=keyf)]
print(h)

The result is as expected:

[['ABC/XYZ/15082020/K1_AB_KIL_333333_15082020.txt', 'ABC/XYZ/15082020/K1_AB_KIL_444444_15082020.txt'], ['ABC/XYZ/16082020/K1_SS_ALM_111111_14082020.txt',
'ABC/XYZ/16082020/K1_SS_ALM_222222_14082020.txt']]

Code 2:

finalblobfpost2 = ['ABC/XYZ/15082020/K1_SS_ALM_222222_15082020.txt', 'ABC/XYZ/16082020/K1_SS_ALM_111111_16082020.txt', 'ABC/XYZ/15082020/K1_AB_KIL_444444_15082020.txt', 'ABC/XYZ/16082020/K1_AB_KIL_333333_16082020.txt']
keyf1 = lambda text: (re.findall(r"\w+/\w+/\d+/(.*?)_\d+_\d+\.txt", text) + [text])[0].strip()
h1 = [list(items) for gr, items in groupby(sorted(finalblobfpost2), key=keyf1)]
print(h1)

The result is not what I expected:

[['ABC/XYZ/15082020/K1_AB_KIL_444444_15082020.txt'], ['ABC/XYZ/15082020/K1_SS_ALM_222222_15082020.txt'], ['ABC/XYZ/16082020/K1_AB_KIL_333333_16082020.txt'], ['ABC/XYZ/16082020/K1_SS_ALM_111111_16082020.txt']]

Expected:

[['ABC/XYZ/15082020/K1_AB_KIL_444444_15082020.txt','ABC/XYZ/16082020/K1_AB_KIL_333333_16082020.txt'],['ABC/XYZ/16082020/K1_SS_ALM_111111_16082020.txt','ABC/XYZ/15082020/K1_AB_KIL_444444_15082020.txt']]

It is not grouping by the key. Is something wrong with the regex, or am I doing something else wrong?

Please advise.

Try this

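Your list needs to be sorted with the same key function that you pass to groupby.

Try this:

h1 = [list(items) for gr, items in groupby(sorted(finalblobfpost2, key=keyf1), key=keyf1)]

The only difference is the key=keyf1 argument in the call to sorted.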

Output (same as expected):

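This is stated explicitly in the documentation:

The operation of groupby() is similar to the uniq filter in Unix. It generates a break or new group every time the value of the key function changes (which is why it is usually necessary to have sorted the data using the same key function).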
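As a small illustration of that uniq-like behavior, here is a minimal sketch on a toy list. The names `letters`, `runs`, and `merged` are made up for this example and do not come from the question.

```python
from itertools import groupby

letters = ["a", "b", "a", "a", "b"]

# Without sorting, every change of value starts a new group, like uniq.
runs = [(k, len(list(g))) for k, g in groupby(letters)]
print(runs)    # [('a', 1), ('b', 1), ('a', 2), ('b', 1)]

# After sorting, equal values are adjacent, so each key appears only once.
merged = [(k, len(list(g))) for k, g in groupby(sorted(letters))]
print(merged)  # [('a', 3), ('b', 2)]
```

The same rule applies when a key function is involved: the input has to be ordered so that items with equal keys sit next to each other.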

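It works, but \w{3} will not always apply. I tried \w+ instead of \w{3} to make it dynamic, and it did not work. Can you suggest a dynamic approach?

You can use \w\d+_\w{2}_\w{3,} to match any run of word characters of length 3 or more.

Perfect!! Can you tell me why my code did not work? Just curious.

OK, so why does the first list work fine?

@user11646543 Because of the date stamp part after ABC/XYZ/, plain sorting happens to put it in the same order as sorting by the key function would.

@user11646543 If you have doubts about this, see my edit, which refers to the documentation.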
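To make the "it just happens to sort the same way" point concrete, the following sketch prints the keys in the order that a plain sorted() call produces for each list. `PATTERN`, `key_of`, `list1`, and `list2` are illustrative names; the pattern is the question's regex written as a raw string.

```python
import re

PATTERN = re.compile(r"\w+/\w+/\d+/(.*?)_\d+_\d+\.txt")

def key_of(text):
    # Key from the question: the name in front of _<id>_<date>.txt.
    return PATTERN.findall(text)[0]

list1 = [
    "ABC/XYZ/16082020/K1_SS_ALM_222222_14082020.txt",
    "ABC/XYZ/16082020/K1_SS_ALM_111111_14082020.txt",
    "ABC/XYZ/15082020/K1_AB_KIL_444444_15082020.txt",
    "ABC/XYZ/15082020/K1_AB_KIL_333333_15082020.txt",
]
list2 = [
    "ABC/XYZ/15082020/K1_SS_ALM_222222_15082020.txt",
    "ABC/XYZ/16082020/K1_SS_ALM_111111_16082020.txt",
    "ABC/XYZ/15082020/K1_AB_KIL_444444_15082020.txt",
    "ABC/XYZ/16082020/K1_AB_KIL_333333_16082020.txt",
]

# Keys in the order produced by a plain sorted() call (full-path order).
keys1 = [key_of(p) for p in sorted(list1)]
keys2 = [key_of(p) for p in sorted(list2)]
# Keys of the second list when sorted with the grouping key itself.
keys2_by_key = [key_of(p) for p in sorted(list2, key=key_of)]

print(keys1)         # ['K1_AB_KIL', 'K1_AB_KIL', 'K1_SS_ALM', 'K1_SS_ALM']
print(keys2)         # ['K1_AB_KIL', 'K1_SS_ALM', 'K1_AB_KIL', 'K1_SS_ALM']
print(keys2_by_key)  # ['K1_AB_KIL', 'K1_AB_KIL', 'K1_SS_ALM', 'K1_SS_ALM']
```

In the first list each name lives under a single date folder, so ordering by path also keeps equal keys together. In the second list each name appears under both date folders, so path order interleaves the keys and groupby starts a new group every time.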

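import re
from itertools import groupby

# groupby only merges adjacent items, so equal keys must already be next to each other in the input
print(
    [list(v) for _, v in groupby(finalblobfpost1,
                                 key=lambda x: re.search(r"\w\d+_\w{2}_\w{3}", x).group())]
)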

[['ABC/XYZ/16082020/K1_SS_ALM_222222_14082020.txt', 'ABC/XYZ/16082020/K1_SS_ALM_111111_14082020.txt'], ['ABC/XYZ/15082020/K1_AB_KIL_444444_15082020.txt', 'ABC/XYZ/15082020/K1_AB_KIL_333333_15082020.txt']]

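With the list sorted by the key function first (sorted(finalblobfpost2, key=keyf1)), the result for the second list is:

[['ABC/XYZ/15082020/K1_AB_KIL_444444_15082020.txt', 'ABC/XYZ/16082020/K1_AB_KIL_333333_16082020.txt'], ['ABC/XYZ/15082020/K1_SS_ALM_222222_15082020.txt', 'ABC/XYZ/16082020/K1_SS_ALM_111111_16082020.txt']]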
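On the follow-up about making the last part of the key pattern dynamic: \w matches letters, digits, and the underscore, so an open-ended \w+ (or \w{3,}) at the end of the pattern keeps matching through the ID and the date, which gives every file its own key. One possible workaround, assuming the last name segment contains only letters (an assumption that is not stated in the thread), is a letters-only character class. The sample path with a four-letter segment and the names `sample`, `greedy`, and `letters_only` are hypothetical.

```python
import re

# Hypothetical path whose last name segment has four letters instead of three.
sample = "ABC/XYZ/16082020/K1_SS_ALMX_222222_14082020.txt"

# \w also matches digits and "_", so \w+ runs on to the end of the file stem.
greedy = re.search(r"\w\d+_\w{2}_\w+", sample).group()
print(greedy)        # K1_SS_ALMX_222222_14082020

# Assumption: the last name segment is letters only, so [A-Za-z]+ stops at "_".
letters_only = re.search(r"\w\d+_\w{2}_[A-Za-z]+", sample).group()
print(letters_only)  # K1_SS_ALMX
```

If the segment can contain digits as well, this class is too narrow; in that case anchoring on the trailing _<id>_<date>.txt part, as the question's original pattern does, may be the safer choice.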