如何在python中对项目运行进行分组_Python_Grouping_Itertools

如何在python中对项目运行进行分组

python

如何在python中对项目运行进行分组,python,grouping,itertools,Python,Grouping,Itertools,假设我想将间隔小于某个阈值的整数分组在一起。我的具体用例是识别测试覆盖率结果中最大的未覆盖代码块，例如： groupruns('53, 55, 57, 59, 83, 200, 205, 211, 217, 219, 306, 311, 317, 323, 325, 367, 631, 636, 645, 658, 686, 692, 787, 792, 801, 870, 875, 884, 947, 993, 1134, 1139, 1148, 1158', 3) #=> [[53,

假设我想将间隔小于某个阈值的整数分组在一起。我的具体用例是识别测试覆盖率结果中最大的未覆盖代码块，例如：

groupruns('53, 55, 57, 59, 83, 200, 205, 211, 217, 219, 306, 311, 317, 323, 325, 367, 631, 636, 645, 658, 686, 692, 787, 792, 801, 870, 875, 884, 947, 993, 1134, 1139, 1148, 1158', 3)
#=> [[53, 55, 57, 59], [83], [200], [205], [211], [217, 219], [306], [311], [317], [323, 325], [367], [631], [636], [645], [658], [686], [692], [787], [792], [801], [870], [875], [884], [947], [993], [1134], [1139], [1148], [1158]]

itertools.groupby

可以在这里帮助我们。唯一的困难是，

groupby

按需要为每个项目计算的“键”分组，这样每个组由具有相同键的连续项目组成。这意味着我们的keyfunc对象需要保存状态以执行此任务：

class runner(object):
     def __init__(self, threshold=1):
         self.threshold = threshold
         self.last = None
         self.key = None
     def __call__(self,item):
         if self.last is None:
             self.last = item
             self.key = item
             return item
         if item - self.last <= self.threshold:
             self.last = item
             return self.key
         else:
             self.last = item
             self.key = item
             return item

您可以使用Raymond Hettinger的群集功能：

def cluster(data, maxgap, key=None):
    """Arrange data into groups where successive elements
       differ by no more than *maxgap*

        >>> cluster([1, 6, 9, 100, 102, 105, 109, 134, 139], maxgap=10)
        [[1, 6, 9], [100, 102, 105, 109], [134, 139]]

        >>> cluster([1, 6, 9, 99, 100, 102, 105, 134, 139, 141], maxgap=10)
        [[1, 6, 9], [99, 100, 102, 105], [134, 139, 141]]

    http://stackoverflow.com/a/14783998/190597 (Raymond Hettinger)
    """
    data.sort()
    groups = [[data[0]]]
    for item in data[1:]:
        if key:
            val = key(item, groups[-1])
        else:
            val = abs(item - groups[-1][-1])
        if val <= maxgap:
            groups[-1].append(item)
        else:
            groups.append([item])
    return groups

data = [53, 55, 57, 59, 83, 200, 205, 211, 217, 219, 306, 311, 317, 323, 325, 367, 631, 636, 645, 658, 686, 692, 787, 792, 801, 870, 875, 884, 947, 993, 1134, 1139, 1148, 1158]
print(cluster(data, maxgap=3))

在不使用任何模块的情况下如何：

注意：假设它们已经分类了

#!/usr/bin/python

group = (53, 55, 57, 59, 83, 200, 205, 211, 217, 219, 306, 311,
         317, 323, 325, 367, 631, 636, 645, 658, 686, 692, 787,
         792, 801, 870, 875, 884, 947, 993, 1134, 1139, 1148,
         1158)

def group_runs(group, step):
    mark = [0]
    diff = map(lambda x: (x[1] - x[0]), zip(group[:],group[1:]))
    [mark.append(i+1) for i,j in enumerate(diff) if j > step]
    return [list(group[x[0]:x[1]]) for x in zip(mark[::], mark[1::])]

print group_runs(group, 3)

输出：

 [[53, 55, 57, 59], [83], [200], [205], [211], [217, 219], [306], [311], [317], [323,
   325], [367], [631], [636], [645], [658], [686], [692], [787], [792], [801], [870],
  [875], [884], [947], [993], [1134], [1139], [1148]]

这不容易读。可读性很重要。您可能希望简化并解释此代码。此外，除非有具体原因，否则不鼓励理解列表中的副作用。

#!/usr/bin/python

group = (53, 55, 57, 59, 83, 200, 205, 211, 217, 219, 306, 311,
         317, 323, 325, 367, 631, 636, 645, 658, 686, 692, 787,
         792, 801, 870, 875, 884, 947, 993, 1134, 1139, 1148,
         1158)

def group_runs(group, step):
    mark = [0]
    diff = map(lambda x: (x[1] - x[0]), zip(group[:],group[1:]))
    [mark.append(i+1) for i,j in enumerate(diff) if j > step]
    return [list(group[x[0]:x[1]]) for x in zip(mark[::], mark[1::])]

print group_runs(group, 3)

 [[53, 55, 57, 59], [83], [200], [205], [211], [217, 219], [306], [311], [317], [323,
   325], [367], [631], [636], [645], [658], [686], [692], [787], [792], [801], [870],
  [875], [884], [947], [993], [1134], [1139], [1148]]