Grouping values by numeric range in Python
My list is as follows:
[(220921998, 2426),
(220921999, 2427),
(220922000, 2428),
(220922001, 2429),
(220922563, 2991),
(220922564, 2992),
(220922565, 2993),
(220922566, 2994),
(220922575, 3003),
(220923958, 4386),
(220924161, 4589),
(220924170, 4598),
(220924171, 4599),
(220924172, 4600),
(220924173, 4601),
(220924912, 5340),
(220926340, 6768),
(220926341, 6769),
(220926342, 6770),
(220926343, 6771),
(220926344, 6772),
(220927052, 7480),
(220927053, 7481),
(220927054, 7482),
(220927055, 7483),
(220927056, 7484),
(220927069, 7497),
(220927071, 7499)]
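I want to append a string to each tuple based on its second number. If the second numbers are within about 20 of each other, the tuples should be given the same "project" name. See below: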
[(220921998, 2426,project1),
(220921999, 2427,project1),
(220922000, 2428,project1),
(220922001, 2429,project1),
(220922563, 2991,project2),
(220922564, 2992,project2),
(220922565, 2993,project2),
(220922566, 2994,project2),
(220922575, 3003,project3),
(220923958, 4386,project4),
(220924161, 4589,project5),
(220924170, 4598,project5),
(220924171, 4599,project5),
(220924172, 4600,project5),
(220924173, 4601,project5),
(220924912, 5340,project6),
(220926340, 6768,project7),
(220926341, 6769,project7),
(220926342, 6770,project7),
(220926343, 6771,project7),
(220926344, 6772,project7),
(220927052, 7480,project8),
(220927053, 7481,project8),
(220927054, 7482,project8),
(220927055, 7483,project8),
(220927056, 7484,project8),
(220927069, 7497,project8),
(220927071, 7499,project8)]
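I tried groupby, but couldn't find a way to make it work with ranges. Any help would be appreciated. Thank you.
Use a key function that remembers the last item and compares it with the current one: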
lst = [(220921998, 2426),
(220921999, 2427),
(220922000, 2428),
(220922001, 2429),
(220922563, 2991),
(220922564, 2992),
(220922565, 2993),
(220922566, 2994),
(220922575, 3003),
(220923958, 4386),
(220924161, 4589),
....]
class Delta:
    def __init__(self, delta):
        self.last = None
        self.delta = delta
        self.key = 1
    def __call__(self, value):
        if self.last is not None and abs(self.last - value[1]) > self.delta:
            # Compare with the last value (`self.last`)
            # If difference is larger than 20, advance to next project
            self.key += 1
        self.last = value[1]  # Remember the last value.
        return self.key
import itertools
for key, grp in itertools.groupby(lst, key=Delta(20)):
    for tup in grp:
        print(tup + ('project{}'.format(key),))
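The class works as a `groupby` key because Python runs an object's `__call__` method whenever the instance is called like a function, so the instance can keep state between calls. A minimal sketch of that mechanism follows; the class name `CallCounter` is hypothetical and only for illustration, it is not part of the answer:

```python
class CallCounter:
    """Callable object that remembers how many times it has been called."""

    def __init__(self):
        self.calls = 0

    def __call__(self, value):
        # Invoked for `instance(value)`; state lives on the instance.
        self.calls += 1
        return value * 2


double = CallCounter()
result = [double(n) for n in (1, 2, 3)]  # each call goes through __call__
```

After this runs, `double.calls` reflects the three calls, in the same way that `Delta` carries `self.last` and `self.key` from one element to the next.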
If you are using Python 3.x, you can use the following function:
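A minimal sketch of what a function-based variant could look like, assuming the idea is a closure that keeps the last value via `nonlocal` (available in Python 3 only). The names `make_delta_key` and `key_func` and the short `sample` list are assumptions for illustration, not taken from the answer:

```python
import itertools


def make_delta_key(delta):
    """Build a groupby key function that starts a new group on a large gap."""
    last = None  # second value of the previous tuple
    key = 1      # current project number

    def key_func(value):
        nonlocal last, key
        # Advance to the next project when the gap exceeds `delta`.
        if last is not None and abs(last - value[1]) > delta:
            key += 1
        last = value[1]
        return key

    return key_func


# A few tuples taken from the question's list.
sample = [(220922566, 2994), (220922575, 3003), (220923958, 4386),
          (220924161, 4589), (220924170, 4598)]
labelled = [tup + ('project{}'.format(k),)
            for k, grp in itertools.groupby(sample, key=make_delta_key(20))
            for tup in grp]
```

Like the class version, this relies on `groupby` calling the key function once per element, in order, so the input should already be sorted by the second value.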
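Try looping over your data:
prev = None
currentProject = 1
newx = []
for t in x:
    # Start a new project when the gap to the previous second value exceeds 20
    if prev is not None and t[1] - prev > 20:
        currentProject += 1
    newx.append((t[0], t[1], "project" + str(currentProject)))
    prev = t[1]  # remember the value for the next comparison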
How about the following simple solution:
data = [(220921998, 2426),
(220921999, 2427),
(220922000, 2428),
(220922001, 2429),
...
(220922563, 2991),
(220922564, 2992)]
ref = 0
cnt = 0
out = []
for dt in data:
    # A gap of more than 20 from the previous value starts a new project
    if dt[1]-ref > 20:
        cnt += 1
    ref = dt[1]
    out.append((dt[0],dt[1],'project%d'%cnt))
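@falsetru - that's not hard to fix: x = iter(x); prev = next(x); newx = [prev + ('project1',)]
... Wow, this is awesome. Thank you. However, I don't understand how the Delta class works.
@microbeatic, `__call__` is called when the class instance is called.
@falsetru +1 Nice approach.
This is definitely the simplest approach so far. Thank you.
All the answers are amazing. Now I don't know which one to accept :)
Wow, I didn't know about this module. It actually creates separate clusters. I'll need this in the future. Thank you very much.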
x=[(220921998, 2426), (220921999, 2427), .... (220927071, 7499)]
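start=0
flag=False
num=0
res=[]
for n,t in enumerate(x):
    #if not flag:start=x[n][1]
    if (x[n][1]-start)<20:
        # Still within 20 of the first value of the current group
        res.append(t+('project%s' %num,))
        flag=True
    else:
        # Too far from the group's first value: start a new project here
        flag=False
        start=x[n][1]
        num+=1
        res.append(t+('project%s' %num,))
print(res)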
[(220921998, 2426, 'project1'),
(220921999, 2427, 'project1'),
(220922000, 2428, 'project1'),
(220922001, 2429, 'project1'),
(220922563, 2991, 'project2'),
(220922564, 2992, 'project2'),
(220922565, 2993, 'project2'),
(220922566, 2994, 'project2'),
(220922575, 3003, 'project2'),
(220923958, 4386, 'project3'),
(220924161, 4589, 'project4'),
(220924170, 4598, 'project4'),
(220924171, 4599, 'project4'),
(220924172, 4600, 'project4'),
(220924173, 4601, 'project4'),
(220924912, 5340, 'project5'),
(220926340, 6768, 'project6'),
(220926341, 6769, 'project6'),
(220926342, 6770, 'project6'),
(220926343, 6771, 'project6'),
(220926344, 6772, 'project6'),
(220927052, 7480, 'project7'),
(220927053, 7481, 'project7'),
(220927054, 7482, 'project7'),
(220927055, 7483, 'project7'),
(220927056, 7484, 'project7'),
(220927069, 7497, 'project7'),
(220927071, 7499, 'project7')]
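Note that the loop above measures each value against the first value of the current group (`start`), while the `groupby` key function compares each value with the one immediately before it. On the question's data the two behave much alike, but they can diverge when values form a chain of small steps. A hedged sketch of the difference, assuming input sorted by the second value and a threshold of "more than 20"; the function names and the `chain` data are made up for illustration:

```python
def label_by_previous(pairs, delta=20):
    """New project when the gap to the previous value exceeds delta."""
    out, project, prev = [], 1, None
    for first, second in pairs:
        if prev is not None and second - prev > delta:
            project += 1
        out.append((first, second, 'project%d' % project))
        prev = second
    return out


def label_by_group_start(pairs, delta=20):
    """New project when the distance from the group's first value exceeds delta."""
    out, project, start = [], 0, None
    for first, second in pairs:
        if start is None or second - start > delta:
            project += 1
            start = second  # this value opens the new group
        out.append((first, second, 'project%d' % project))
    return out


# Hypothetical data: each step is 15, but the total span is 30.
chain = [(1, 100), (2, 115), (3, 130)]
by_previous = label_by_previous(chain)   # all three stay in one project
by_start = label_by_group_start(chain)   # the third value opens a second project
```

Which rule is right depends on whether "within 20" is meant relative to a neighbour or to the whole group.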
>>> import cluster
>>> cl = cluster.HierarchicalClustering(data, lambda x,y: abs(x[1]-y[1]))
>>> cl.getlevel(20)
[
[(220926340, 6768), (220926341, 6769), (220926344, 6772), (220926342, 6770),
(220926343, 6771)],
[(220927052, 7480), (220927053, 7481), (220927056, 7484),
(220927054, 7482), (220927055, 7483), (220927069, 7497), (220927071, 7499)],
[(220921998, 2426), (220921999, 2427), (220922000, 2428), (220922001, 2429)],
[(220922575, 3003), (220922563, 2991), (220922564, 2992), (220922565, 2993),
(220922566, 2994)],
[(220924912, 5340)],
[(220923958, 4386)],
[(220924161, 4589), (220924170, 4598), (220924171, 4599), (220924172, 4600),
(220924173, 4601)]
]