How to group runs of items in Python
Suppose I want to group together integers that are separated by less than some threshold. My specific use case is finding the largest blocks of uncovered code in test coverage results, for example:
groupruns('53, 55, 57, 59, 83, 200, 205, 211, 217, 219, 306, 311, 317, 323, 325, 367, 631, 636, 645, 658, 686, 692, 787, 792, 801, 870, 875, 884, 947, 993, 1134, 1139, 1148, 1158', 3)
#=> [[53, 55, 57, 59], [83], [200], [205], [211], [217, 219], [306], [311], [317], [323, 325], [367], [631], [636], [645], [658], [686], [692], [787], [792], [801], [870], [875], [884], [947], [993], [1134], [1139], [1148], [1158]]
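A minimal sketch of the groupruns call from the question. It assumes the input is the comma-separated string shown above, and that two numbers belong to the same run when their gap is at most the threshold (the same <= test the answers use). Picking the "largest" blocks by number of listed lines is also an assumption; ranking by span (last minus first) would be another reasonable choice.

```python
def groupruns(text, threshold):
    """Group comma-separated integers into runs whose neighbouring gaps are <= threshold."""
    # Parse "53, 55, 57" into [53, 55, 57]; int() tolerates the surrounding spaces.
    # Sorting is a safeguard in case the coverage report is not ordered.
    numbers = sorted(int(tok) for tok in text.split(",") if tok.strip())
    runs = []
    for n in numbers:
        # Extend the current run if n is close enough to its last element.
        if runs and n - runs[-1][-1] <= threshold:
            runs[-1].append(n)
        else:
            runs.append([n])
    return runs


runs = groupruns("53, 55, 57, 59, 83, 200, 205, 211, 217, 219", 3)
print(runs)     # [[53, 55, 57, 59], [83], [200], [205], [211], [217, 219]]

# "Largest" blocks, here meaning the runs with the most listed lines.
largest = sorted(runs, key=len, reverse=True)[:2]
print(largest)  # [[53, 55, 57, 59], [217, 219]]
```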
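itertools.groupby can help us here. The only catch is that groupby groups by a "key" computed for each item, so that each group is made of consecutive items that share the same key. That means our keyfunc object needs to keep state to do this job: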
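class runner(object):
    """Stateful key for itertools.groupby: an item within `threshold` of the
    previous item gets the same key (the first item of its run)."""
    def __init__(self, threshold=1):
        self.threshold = threshold
        self.last = None   # previous item seen
        self.key = None    # key of the current run
    def __call__(self, item):
        if self.last is None:
            # The very first item starts the first run.
            self.last = item
            self.key = item
            return item
        if item - self.last <= self.threshold:
            # Close enough: stay in the current run.
            self.last = item
            return self.key
        else:
            # Gap too large: start a new run keyed by this item.
            self.last = item
            self.key = item
            return item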
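The answer above stops at the key class. Here is a sketch of how a stateful key plugs into groupby, written as a closure (make_run_key is a hypothetical name) that mirrors runner's logic and assumes the input is sorted in ascending order. The first lines show why state is needed at all: groupby only merges neighbouring items with equal keys, so the key has to remember the previous item.

```python
from itertools import groupby

# groupby only merges *adjacent* items with equal keys, so the trailing "a" forms its own group.
letters = [k for k, _ in groupby("aabba")]
print(letters)  # ['a', 'b', 'a']


def make_run_key(threshold):
    """Hypothetical helper: return a stateful key function for groupby.

    Items within `threshold` of the previous item get the same key, namely the
    first item of their run. Assumes items arrive in ascending order.
    """
    last = None
    run_key = None

    def key(item):
        nonlocal last, run_key
        # Start a new run on the first item or when the gap is too large.
        if last is None or item - last > threshold:
            run_key = item
        last = item
        return run_key

    return key


data = [53, 55, 57, 59, 83, 200, 205, 211, 217, 219]
runs = [list(g) for _, g in groupby(data, key=make_run_key(3))]
print(runs)  # [[53, 55, 57, 59], [83], [200], [205], [211], [217, 219]]
```

The class from the answer is used the same way, e.g. [list(g) for _, g in groupby(data, runner(3))]. Create a fresh key (or runner instance) for every groupby call, since it carries state over from the previous pass.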
You can use Raymond Hettinger's cluster function:
def cluster(data, maxgap, key=None):
    """Arrange data into groups where successive elements
    differ by no more than *maxgap*

    >>> cluster([1, 6, 9, 100, 102, 105, 109, 134, 139], maxgap=10)
    [[1, 6, 9], [100, 102, 105, 109], [134, 139]]
    >>> cluster([1, 6, 9, 99, 100, 102, 105, 134, 139, 141], maxgap=10)
    [[1, 6, 9], [99, 100, 102, 105], [134, 139, 141]]

    http://stackoverflow.com/a/14783998/190597 (Raymond Hettinger)
    """
    data = sorted(data)  # sort a copy instead of mutating the caller's list
    if not data:
        return []
    groups = [[data[0]]]
    for item in data[1:]:
        if key is not None:
            # Custom distance between the item and the current group.
            val = key(item, groups[-1])
        else:
            # Default: gap to the last element of the current group.
            val = abs(item - groups[-1][-1])
        if val <= maxgap:
            groups[-1].append(item)
        else:
            groups.append([item])
    return groups
data = [53, 55, 57, 59, 83, 200, 205, 211, 217, 219, 306, 311, 317, 323, 325, 367, 631, 636, 645, 658, 686, 692, 787, 792, 801, 870, 875, 884, 947, 993, 1134, 1139, 1148, 1158]
print(cluster(data, maxgap=3))
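The key argument lets you replace the default distance, which is the gap to the previous element and so lets a run grow without bound as long as each step is small. A sketch, assuming you instead want to cap the total span of each group; span_from_start is a hypothetical key, and the cluster below is a compact restatement of the function above, not Raymond Hettinger's exact code.

```python
def cluster(data, maxgap, key=None):
    """Compact restatement of cluster(): sorts a copy and handles empty input."""
    data = sorted(data)
    if not data:
        return []
    groups = [[data[0]]]
    for item in data[1:]:
        # key(item, group) returns a distance; the default is the gap to the group's last item.
        val = key(item, groups[-1]) if key is not None else abs(item - groups[-1][-1])
        if val <= maxgap:
            groups[-1].append(item)
        else:
            groups.append([item])
    return groups


def span_from_start(item, group):
    """Hypothetical key: distance from the first item of the current group."""
    return item - group[0]


data = [1, 3, 5, 7, 9, 11]
print(cluster(data, maxgap=3))                       # [[1, 3, 5, 7, 9, 11]]
print(cluster(data, maxgap=3, key=span_from_start))  # [[1, 3], [5, 7], [9, 11]]
```

With the default distance every step of 2 chains into one group; measuring from the group's first item splits it so no group spans more than maxgap.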
How about doing it without any modules? Note: this assumes the items are already sorted.
#!/usr/bin/python
group = (53, 55, 57, 59, 83, 200, 205, 211, 217, 219, 306, 311,
         317, 323, 325, 367, 631, 636, 645, 658, 686, 692, 787,
         792, 801, 870, 875, 884, 947, 993, 1134, 1139, 1148,
         1158)
def group_runs(group, step):
    # mark holds the index where each run starts; the first run starts at 0
    mark = [0]
    # differences between each pair of neighbouring items
    diff = map(lambda x: (x[1] - x[0]), zip(group, group[1:]))
    # a gap larger than step starts a new run at the following index
    [mark.append(i+1) for i,j in enumerate(diff) if j > step]
    # close the final run so the last group is not dropped
    mark.append(len(group))
    return [list(group[x[0]:x[1]]) for x in zip(mark, mark[1:])]
print(group_runs(group, 3))
Output:
[[53, 55, 57, 59], [83], [200], [205], [211], [217, 219], [306], [311], [317], [323, 325], [367], [631], [636], [645], [658], [686], [692], [787], [792], [801], [870], [875], [884], [947], [993], [1134], [1139], [1148], [1158]]
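This is hard to read, and readability counts. You may want to simplify and explain this code. Also, list comprehensions used only for their side effects are discouraged unless there is a specific reason.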
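Following the comment's suggestion, here is one possible plainer version of the same boundary-index idea, still without imports and without side effects inside a comprehension. It assumes the input is sorted and indexable (a list or tuple).

```python
def group_runs(items, step):
    """Split sorted items into runs whose neighbours differ by at most step."""
    if not items:
        return []
    # A run starts at index 0 and wherever the gap to the previous item exceeds step.
    starts = [0] + [i for i in range(1, len(items)) if items[i] - items[i - 1] > step]
    # Each run ends where the next starts; the last run ends at len(items).
    ends = starts[1:] + [len(items)]
    return [list(items[s:e]) for s, e in zip(starts, ends)]


group = (53, 55, 57, 59, 83, 200, 205, 211, 217, 219)
print(group_runs(group, 3))  # [[53, 55, 57, 59], [83], [200], [205], [211], [217, 219]]
```

Keeping the final boundary at len(items) is what makes the last run appear; leaving it out is an easy way to lose the trailing group.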