Python: forming clusters based on a condition?


python python-2.7 python-3.x


Sample input file (the actual input file contains about 50,000 entries):

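I have to compare each value in the first column with the values that follow it. Identical values (such as 615) must be grouped together, and each cluster must contain the corresponding column-1 values (such as 146, 180, ..., 45, 49). Then the cluster must end and a new cluster must be formed for the next run of identical values (616, 616, 616, ...), and so on.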

The code I wrote is:

from __future__ import division
from sys import exit
h = 0
historyjobs = []
targetjobs = []


def quickzh(zhlistsub,
            targetjobs=targetjobs, num=0, denom=0):
    li = []; ji = []
    j = 0
    for i in zhlistsub:
        x1 = targetjobs[j][0]
        x = targetjobs[i][0]
        num += x
        denom += 1
        if x1 >= 0.9 * (num/denom):  # to group all items with same value in column 0
            li.append(targetjobs[i][1])
        else:
            break
    return li


def filewr(listli):
    global h
    s = open("newout1", "a")
    if len(listli) != 0:
        h += 1
        s.write("cluster: %d" % h)
        s.write("\n")
        s.write(str(listli))
        s.write("\n\n")
    else:
        print "0"


def new(inputfile,
        historyjobs=historyjobs, targetjobs=targetjobs):
    zhlistsub = []; zhlist = []
    k = 0

    with open(inputfile, 'r') as f:
        for line in f:
            job = map(int, line.split())
            targetjobs.append(job)
        while True:
            if len(targetjobs) != 0:
                zhlistsub = [i for i, element in enumerate(targetjobs)]
                if zhlistsub:
                    listrun = quickzh(zhlistsub)
                    filewr(listrun)
                historyjobs.append(targetjobs.pop(0))
                k += 1
            else:
                break

new('newfinal1')

The result I get is:

 cluster: 1
 [146, 180, 53, 42, 52, 52, 51, 45, 49, 34, 44, 42, 41, 42, 42, 43, 42, 33, 33, 33, 33, 33, 47, 68, 449, 41, 1138, 46, 53]

 cluster: 2
 [180, 53, 42, 52, 52, 51, 45, 49, 34, 44, 42, 41, 42, 42, 43, 42, 33, 33, 33, 33, 33, 47, 68, 449, 41, 1138, 46, 53]

 cluster: 3
 [53, 42, 52, 52, 51, 45, 49, 34, 44, 42, 41, 42, 42, 43, 42, 33, 33, 33, 33, 33, 47, 68, 449, 41, 1138, 46, 53]
 ... and so on

But the output I need is:

  cluster: 1
  [146, 180, 53, 42, 52, 52, 51, 45, 49]
  cluster: 2
  [34, 44, 42, 41, 42]
  cluster: 3
  [42, 43, 42]
  ... and so on

So, can anyone suggest what changes I should make to the condition to get the desired result? It would be really helpful.
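One way to read the requirement: a cluster should end as soon as the column-0 value changes, rather than when the first row's value drops below 0.9 times the running mean. A minimal sketch of that condition, assuming each row is a [column0, column1] pair of ints kept in file order; the function name clusters_by_key and the sample rows are hypothetical, not the asker's real data:

```python
def clusters_by_key(rows):
    """Split rows into clusters of consecutive rows sharing the same column-0 value.

    rows: list of [key, value] pairs (assumed format: two int columns).
    Returns a list of lists holding the column-1 values of each cluster.
    """
    clusters = []
    current = []
    prev_key = None
    for key, value in rows:
        # Replacement condition: close the cluster when the key differs
        # from the previous row's key
        if current and key != prev_key:
            clusters.append(current)
            current = []
        current.append(value)
        prev_key = key
    if current:
        # Flush the last cluster
        clusters.append(current)
    return clusters


# Hypothetical sample rows
rows = [[615, 146], [615, 180], [615, 53], [616, 34], [616, 44], [617, 42]]
for n, cluster in enumerate(clusters_by_key(rows), 1):
    print("cluster: %d" % n)
    print(cluster)
```

With these sample rows it prints cluster 1 as [146, 180, 53], cluster 2 as [34, 44] and cluster 3 as [42]. Each row is visited once, so the scan stays linear even for around 50,000 entries, unlike re-scanning the remaining list after every pop(0).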

Try this. It takes care of creating the clusters; all that is left is to build the input list:

import itertools as it
[[y[1] for y in x[1]] for x in it.groupby(data, key=lambda x:x[0])]

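The above assumes that data holds your input and that it has already been filtered and sorted by the first column. For the example in the question, it would look like this: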

data = [[615, 146], [615, 180], [615, 53] ... ]
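A self-contained run of the groupby one-liner above, using a shortened, hypothetical data list. itertools.groupby only merges adjacent rows with equal keys, which matches the "end the cluster and start a new one" behaviour described in the question; a key that reappears later (the trailing 615 row here) starts a new group:

```python
import itertools as it

# Hypothetical, shortened input in the [column0, column1] shape shown above
data = [[615, 146], [615, 180], [615, 53], [616, 34], [616, 44], [617, 42], [615, 7]]

# groupby yields (key, iterator over consecutive rows with that key);
# keep only the column-1 value of each row
clusters = [[y[1] for y in x[1]] for x in it.groupby(data, key=lambda x: x[0])]
print(clusters)  # [[146, 180, 53], [34, 44], [42], [7]]
```

If data were sorted by the first column first, the trailing [615, 7] row would instead join the first cluster.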

I haven't tested this, but it follows this idea:

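from collections import defaultdict

inputfile = 'newfinal1'  # input file name used in the question
cluster = defaultdict(list)

with open(inputfile, 'r') as f:
    for line in f:
        clus, val = line.split()
        cluster[clus].append(int(val))

# iterate over (key, list) pairs, not just the keys
for clus, val in cluster.items():
    print("cluster " + clus)
    print(val)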
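A self-contained variant of the defaultdict snippet above, reading from an in-memory io.StringIO instead of a file so it runs without the asker's input. The sample contents are hypothetical. It highlights one behavioural difference from groupby: every row with the same key lands in one list, even when the rows are not adjacent:

```python
import io
from collections import defaultdict

# Hypothetical file contents: two whitespace-separated integer columns per line
sample = io.StringIO(u"615 146\n615 180\n616 34\n616 44\n615 7\n")

cluster = defaultdict(list)
for line in sample:
    clus, val = line.split()
    cluster[int(clus)].append(int(val))

# The trailing "615 7" row joins the existing 615 list instead of
# starting a new cluster
for clus, vals in cluster.items():
    print("cluster %d" % clus)
    print(vals)
```

Dictionaries keep insertion order only from Python 3.7 on; on older versions (including 2.7) the clusters may print in any order unless collections.OrderedDict is used.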

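I really struggle to understand what you need... but for grouping in general, itertools.groupby or collections.defaultdict is the way to go.

Could you please suggest a condition to replace my if condition

if x1 >= 0.9*(num/denom):

so that it produces the result?

My answer helps build the clusters, but it is not clear how you want to filter the values with that condition. All I can suggest is to split the problem in two: first filter the input and build a list from it (data in my example), then use the code above to build the clusters.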
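The two-step approach just described (filter first, then cluster) can be sketched as follows. This is a minimal sketch under stated assumptions: the input has two whitespace-separated integer columns, the helper names keep, parse_and_filter, build_clusters and format_clusters are hypothetical, and keep() is a placeholder that accepts every row because the thread never settles on the actual filtering condition. The output mimics the "cluster: N" layout written by the question's filewr():

```python
import io
import itertools as it


def keep(row):
    # Hypothetical filter: the real condition is not specified in the thread,
    # so this placeholder keeps every row
    return True


def parse_and_filter(lines):
    """Step 1: turn each line into [column0, column1] ints and drop unwanted rows."""
    data = []
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        row = [int(parts[0]), int(parts[1])]
        if keep(row):
            data.append(row)
    return data


def build_clusters(data):
    """Step 2: group consecutive rows that share the same column-0 value."""
    return [[y[1] for y in x[1]] for x in it.groupby(data, key=lambda x: x[0])]


def format_clusters(clusters):
    """Render clusters as 'cluster: N' followed by the list, blank line between."""
    out = []
    for n, cluster in enumerate(clusters, 1):
        out.append("cluster: %d\n%s\n" % (n, cluster))
    return "\n".join(out)


# Hypothetical sample input, including a blank line that gets skipped
sample = io.StringIO(u"615 146\n615 180\n616 34\n\n617 42\n")
data = parse_and_filter(sample)
print(format_clusters(build_clusters(data)))
```

Keeping the filter separate means the clustering step never has to know what the condition is; swapping in a real predicate only touches keep().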
