Python: forming clusters based on a condition?
Sample input file (the actual input file contains about 50,000 entries):

I have to compare each value in column 0 with the values that follow it. Identical values (such as 615) must be grouped together, and the cluster must contain the corresponding column 1 values (such as 146, 180, ..., 45, 49). Then the cluster must end and another cluster must be formed for the next run of identical values (616, 616, 616, ...), and so on.

The code I wrote is:
from __future__ import division
from sys import exit
h = 0
historyjobs = []
targetjobs = []
def quickzh(zhlistsub,
        targetjobs=targetjobs,num=0,denom=0):
    li = [] ; ji = []
    j = 0
    for i in zhlistsub:
        x1 = targetjobs[j][0]
        x = targetjobs[i][0]
        num += x
        denom += 1
        if x1 >= 0.9 * (num/denom):#to group all items with same value in column 0
            li.append(targetjobs[i][1])
        else:
            break
    return li
def filewr(listli):
    global h
    s = open("newout1","a")
    if(len(listli) != 0):
        h += 1
        s.write("cluster: %d"%h)
        s.write("\n")
        s.write(str(listli))
        s.write("\n\n")
    else:
        print "0"
def new(inputfile,
        historyjobs=historyjobs,targetjobs=targetjobs):
    zhlistsub = [];zhlist = []
    k = 0
    with open(inputfile,'r') as f:
        for line in f:
            job = map(int,line.split())
            targetjobs.append(job)
    while True:
        if len(targetjobs) != 0:
            zhlistsub = [i for i, element in enumerate(targetjobs)]
            if zhlistsub:
                listrun = quickzh(zhlistsub)
                filewr(listrun)
                historyjobs.append(targetjobs.pop(0))
                k += 1
        else:
            break
new('newfinal1')
The result I get is:
cluster: 1
[146, 180, 53, 42, 52, 52, 51, 45, 49, 34, 44, 42, 41, 42, 42, 43, 42, 33, 33, 33, 33, 33, 47, 68, 449, 41, 1138, 46, 53]
cluster: 2
[180, 53, 42, 52, 52, 51, 45, 49, 34, 44, 42, 41, 42, 42, 43, 42, 33, 33, 33, 33, 33, 47, 68, 449, 41, 1138, 46, 53]
cluster: 3
[53, 42, 52, 52, 51, 45, 49, 34, 44, 42, 41, 42, 42, 43, 42, 33, 33, 33, 33, 33, 47, 68, 449, 41, 1138, 46, 53]
... and so on
But the output I need is:
cluster: 1
[146, 180, 53, 42, 52, 52, 51, 45, 49]
cluster: 2
[34, 44, 42, 41, 42]
cluster: 3
[42, 43, 42]
... and so on
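So, can anyone suggest what changes I should make to the condition to get the required result? It would be really helpful.

Try this. It takes care of creating the clusters; all that is left to do is build the input list: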
import itertools as it
[[y[1] for y in x[1]] for x in it.groupby(data, key=lambda x:x[0])]
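The above assumes that `data` holds your input and that it has already been filtered and sorted by the first column. For the example in the question it would look like this: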
data = [[615, 146], [615, 180], [615, 53] ... ]
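A self-contained run of the `groupby` one-liner, as a sketch. The rows follow the shape of `data` above; the column 1 values are taken from the desired output in the question, while the key 617 for the third cluster is an assumption. `itertools.groupby` only merges *consecutive* rows with equal keys, which is why the input must already be grouped (for example, sorted) by column 0.

```python
import itertools as it

# Rows in the shape of the question's input: [column 0, column 1].
# Values come from the desired output; the 617 key is assumed for illustration.
data = [
    [615, 146], [615, 180], [615, 53], [615, 42], [615, 52],
    [615, 52], [615, 51], [615, 45], [615, 49],
    [616, 34], [616, 44], [616, 42], [616, 41], [616, 42],
    [617, 42], [617, 43], [617, 42],
]

# groupby yields (key, rows) pairs for each run of rows sharing column 0;
# keep only the column 1 value of every row in the run.
clusters = [[y[1] for y in x[1]] for x in it.groupby(data, key=lambda x: x[0])]

for n, cluster in enumerate(clusters, start=1):
    print("cluster: %d" % n)
    print(cluster)
```

This prints the same `cluster: N` / list layout the question asks for.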
I have not tested this, but it follows the idea:
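from collections import defaultdict  # defaultdict is a class, not a module
cluster = defaultdict(list)
with open(inputfile, 'r') as f:
    for line in f:
        clus, val = line.split()   # column 0 = cluster key, column 1 = value
        cluster[clus].append(val)
for clus, val in cluster.items():   # .items() yields (key, list) pairs
    print("cluster " + str(clus) + "\n")
    print(str(val) + "\n")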
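A runnable sketch of the same `defaultdict` idea, with `io.StringIO` standing in for `inputfile` (the sample rows are made up) and both columns converted to `int` so keys compare as numbers. Note one difference from `groupby`: `defaultdict` gathers every row with a given key even when those rows are not adjacent, so in the sample the later 615 row joins the first cluster. Since Python 3.7, dicts (including `defaultdict`) preserve insertion order, so clusters come out in the order their keys first appear; on older versions, iterate over `sorted(cluster)` instead.

```python
import io
from collections import defaultdict

# In-memory stand-in for the input file; these rows are made up.
sample = io.StringIO("615 146\n615 180\n616 34\n615 53\n616 44\n")

cluster = defaultdict(list)
for line in sample:
    clus, val = line.split()
    cluster[int(clus)].append(int(val))  # missing keys start as an empty list

# Every row with the same key ends up in one list, adjacent or not.
for n, (clus, vals) in enumerate(cluster.items(), start=1):
    print("cluster: %d (key %d)" % (n, clus))
    print(vals)
```

If only runs of adjacent equal keys should form a cluster, as the question describes, `groupby` is the closer fit.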
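I really find it hard to understand what you need... but in general, for grouping, `itertools.groupby` or `collections.defaultdict` is the way to go.

Could you please suggest a condition to replace my if condition `if x1 >= 0.9*(num/denom):` that gives the result?

My answer helps build the clusters, but it is not clear how you want to filter the values with that condition. I can only suggest splitting the problem in two: first filter the input and build a list from it (`data` in my example), then build the clusters from that list as shown above.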
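Following the two-step suggestion, here is a minimal sketch. It is not either poster's code, and `parse_rows`, `build_clusters` and `write_clusters` are hypothetical names. It also addresses the question about the condition. The original test `x1 >= 0.9 * (num/denom)` compares the first row's column 0 value with 90% of the running mean of column 0, so a neighbouring key passes it (615 >= 0.9 * 616 = 554.4) and the cluster does not end where it should. In addition, `new` pops only one row per iteration (`targetjobs.pop(0)`), so each following cluster restarts one row later, which produces the overlapping output shown in the question. The sketch instead keeps a row in the cluster only while its column 0 value equals the cluster's first key, and then jumps past the whole cluster. Skipping blank or malformed lines is an assumed filtering rule, since the question does not say what should be filtered out.

```python
import io
import sys


def parse_rows(lines):
    """Step 1 (filter): turn each 'key value' line into an [int, int] pair."""
    rows = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # assumed filter: skip blank lines and lines without two columns
        try:
            rows.append([int(parts[0]), int(parts[1])])
        except ValueError:
            continue  # assumed filter: skip lines whose columns are not integers
    return rows


def build_clusters(rows):
    """Step 2: group consecutive rows that share the same column 0 value."""
    clusters = []
    start = 0
    while start < len(rows):
        key = rows[start][0]
        cluster = []
        i = start
        # Replacement for `if x1 >= 0.9 * (num/denom):`: keep adding rows only
        # while their column 0 value equals the first row of this cluster.
        while i < len(rows) and rows[i][0] == key:
            cluster.append(rows[i][1])
            i += 1
        clusters.append(cluster)
        start = i  # skip past the whole cluster, not just one row
    return clusters


def write_clusters(clusters, out):
    """Write clusters in the same 'cluster: N' layout that filewr uses."""
    for n, cluster in enumerate(clusters, start=1):
        out.write("cluster: %d\n%s\n\n" % (n, cluster))


if __name__ == "__main__":
    # Made-up rows in the shape of the question's input.
    sample = io.StringIO("615 146\n615 180\n616 34\n616 44\n\n617 42\n")
    write_clusters(build_clusters(parse_rows(sample)), sys.stdout)
```

With the real data, passing the opened input file (for example `newfinal1`) to `parse_rows` gives the rows. Each row is visited once, so the work stays linear in the ~50,000 entries, whereas the original loop rebuilds the index list over all remaining rows after every `pop(0)`.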