Python: forming clusters based on a condition?


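Sample input file (the actual input file contains about 50,000 entries):

I have to compare the values in column 0: entries with the same value (e.g. 615) must be grouped together. The cluster must contain the column 1 values (e.g. 146, 180, ..., 45, 49). The cluster must then break, and a new cluster must form for the next run of identical values (616, 616, 616, ...), and so on.

The code I wrote is: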

from __future__ import division
from sys import exit
h = 0
historyjobs = []
targetjobs = []


def quickzh(zhlistsub, targetjobs=targetjobs, num=0, denom=0):
    li = []
    ji = []
    j = 0
    for i in zhlistsub:
        x1 = targetjobs[j][0]
        x = targetjobs[i][0]
        num += x
        denom += 1
        if x1 >= 0.9 * (num/denom):  # to group all items with same value in column 0
            li.append(targetjobs[i][1])
        else:
            break
    return li


def filewr(listli):
    global h
    s = open("newout1", "a")
    if len(listli) != 0:
        h += 1
        s.write("cluster: %d" % h)
        s.write("\n")
        s.write(str(listli))
        s.write("\n\n")
    else:
        print "0"


def new(inputfile, historyjobs=historyjobs, targetjobs=targetjobs):
    zhlistsub = []
    zhlist = []
    k = 0

    with open(inputfile, 'r') as f:
        for line in f:
            job = map(int, line.split())
            targetjobs.append(job)
        while True:
            if len(targetjobs) != 0:
                zhlistsub = [i for i, element in enumerate(targetjobs)]
                if zhlistsub:
                    listrun = quickzh(zhlistsub)
                    filewr(listrun)
                historyjobs.append(targetjobs.pop(0))
                k += 1
            else:
                break

new('newfinal1')
The result I get is:

 cluster: 1
 [146, 180, 53, 42, 52, 52, 51, 45, 49, 34, 44, 42, 41, 42, 42, 43, 42, 33, 33, 33, 33, 33, 47, 68, 449, 41, 1138, 46, 53]

 cluster: 2
 [180, 53, 42, 52, 52, 51, 45, 49, 34, 44, 42, 41, 42, 42, 43, 42, 33, 33, 33, 33, 33, 47, 68, 449, 41, 1138, 46, 53]

 cluster: 3
 [53, 42, 52, 52, 51, 45, 49, 34, 44, 42, 41, 42, 42, 43, 42, 33, 33, 33, 33, 33, 47, 68, 449, 41, 1138, 46, 53]
 ..................so on
But the output I need is:

  cluster: 1
  [146, 180, 53, 42, 52, 52, 51, 45, 49]
  cluster: 2
  [34, 44, 42, 41, 42]
  cluster: 3
  [42, 43, 42]
  _____________________ so on
So, can anyone suggest what I should change in the condition to get the desired result? It would be really helpful.

Try this. Note that groupby does the work of creating the clusters; all that is left to do is build the lists:

import itertools as it
[[y[1] for y in x[1]] for x in it.groupby(data, key=lambda x:x[0])]
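The above assumes that data is where your input lives, and that it has already been filtered and sorted by the first column. For the example in the question it looks like this: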

data = [[615, 146], [615, 180], [615, 53] ... ]
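
As a minimal sketch of the groupby approach above, the following wraps the one-liner in a function and prints the result in the layout the question asks for. The helper name cluster_by_first_column is made up for illustration, and the sample rows are assumed: the values come from the question's desired output, but pairing 34 and 44 with key 616 is a guess, since the original input file is not shown.

```python
from itertools import groupby


def cluster_by_first_column(rows):
    """Collect column-1 values for each run of consecutive rows
    that share the same column-0 value."""
    return [[row[1] for row in group]
            for _, group in groupby(rows, key=lambda row: row[0])]


# Assumed sample rows; the real file has about 50,000 entries.
data = [[615, 146], [615, 180], [615, 53], [616, 34], [616, 44]]

clusters = cluster_by_first_column(data)
for n, values in enumerate(clusters, start=1):
    print("cluster: %d" % n)
    print(values)
```

groupby only merges adjacent rows, which is why the input has to be sorted (or at least contiguous) by the first column.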

I haven't tested this answer, but it follows this concept:

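from collections import defaultdict

cluster = defaultdict(list)

# inputfile is the path to the two-column, whitespace-separated input file
with open(inputfile, 'r') as f:
    for line in f:
        clus, val = line.split()
        cluster[clus].append(val)

# .items() yields (key, values) pairs; iterating the dict alone yields only keys
for clus, val in cluster.items():
    print("cluster" + str(clus) + "\n")
    print(str(val) + "\n")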
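
One difference between the two suggestions may matter here. A defaultdict gathers every row with the same key into one cluster, wherever it appears in the file, while itertools.groupby starts a new cluster each time the key changes. The sketch below illustrates this with made-up rows in which 615 is deliberately not contiguous; it is only an illustration, not a statement about what the real input looks like.

```python
from collections import defaultdict
from itertools import groupby

# Made-up rows: key 615 appears, is interrupted by 616, then appears again.
rows = [(615, 146), (616, 34), (615, 180)]

# defaultdict: one cluster per distinct key, regardless of position.
by_key = defaultdict(list)
for key, value in rows:
    by_key[key].append(value)

# groupby: one cluster per run of consecutive equal keys.
by_run = [[value for _, value in run]
          for _, run in groupby(rows, key=lambda row: row[0])]

print(dict(by_key))  # {615: [146, 180], 616: [34]}
print(by_run)        # [[146], [34], [180]]
```

If the input is already sorted by the first column, both approaches produce the same clusters.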

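I'm really having a hard time understanding what you need... but in general, for grouping, itertools.groupby or collections.defaultdict is the way to go.

Could you please suggest a condition to replace my if condition, if x1 >= 0.9 * (num/denom):, so that it gives the desired result?

My answer helps build the clusters, but it's not clear how to filter the values using that condition. I can only suggest that you split the problem in two: first filter the input and build a list like data in my example, then use that list to build the clusters.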
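
The two-step split suggested in the last comment could look roughly like the sketch below. All function names are hypothetical. Step 1 here only parses the lines and skips blank ones; it does not apply the 0.9 threshold from the question, because the thread never settles what that filter is meant to do. The sample lines are made up.

```python
from itertools import groupby


def parse_rows(lines):
    """Step 1: turn text lines into [key, value] integer pairs, skipping blank lines."""
    return [[int(field) for field in line.split()[:2]]
            for line in lines if line.strip()]


def build_clusters(rows):
    """Step 2: one list of column-1 values per run of equal column-0 values."""
    return [[row[1] for row in run]
            for _, run in groupby(rows, key=lambda row: row[0])]


def format_clusters(clusters):
    """Render clusters in the 'cluster: N' layout shown in the question."""
    return "".join("cluster: %d\n%s\n\n" % (n, values)
                   for n, values in enumerate(clusters, start=1))


# Made-up sample lines standing in for the contents of the input file.
sample_lines = ["615 146", "615 180", "616 34", "", "616 44"]
print(format_clusters(build_clusters(parse_rows(sample_lines))))
```

With a real file, the lines could come from iterating over the open file object, and the formatted string could be written to the output file in a single call.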