Grouping items in a list using Python defaultdict

I have a list called GO_file (defined in the code below), and I want to convert it to:

A: 12, 13, 14

B: 1, 5

from collections import defaultdict
GO_file = ["A_1 12", "A_2 13", "A_3 14", "A_4 12", "B_1 1", "B_2 1", "B_3 5"]

GO_dict = defaultdict(list)
for GO_names in GO_file:
    gene_id = GO_names.split("_")[0]
    GO_id = GO_names.split(" ")[1:]
    GO_dict[gene_id] = GO_id
print GO_dict

However, this code only stores each key with a single value:

defaultdict(<type 'list'>, {'A': ['12'], 'B': ['5']})

I would appreciate your suggestions.

There are a few problems with your code:

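  • Your GO IDs contain duplicates, and you seem to care only about unique values, so you need a defaultdict(set) rather than a defaultdict(list).
  • The splitting logic that produces the key and value is flawed: GO_names.split(" ")[1:] returns a list rather than the ID string.
  • You assign the latest value to the dict entry instead of adding it to the existing ones, so each key keeps only its last value.

A possible corrected solution: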

    >>> GO_dict = defaultdict(set)
    >>> for GO_names in GO_file:
       gene_id,_,GO_id = GO_names.partition(" ")
       gene_id = gene_id.split("_")[0]
       GO_dict[gene_id].add(GO_id)
    
    
    >>> print GO_dict
    defaultdict(<type 'set'>, {'A': set(['13', '12', '14']), 'B': set(['1', '5'])})
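
For readers on Python 3, the fixes listed above can be sketched together as a small function. This is only an illustration under stated assumptions: the helper name group_unique_ids is hypothetical (it does not appear in the original post), and the sample list assumes a comma after "B_2 1".

```python
from collections import defaultdict

# Sample data from the question, assuming a comma after "B_2 1".
GO_file = ["A_1 12", "A_2 13", "A_3 14", "A_4 12", "B_1 1", "B_2 1", "B_3 5"]


def group_unique_ids(entries):
    """Group GO ids by gene prefix, keeping each id once per gene.

    Hypothetical helper name, introduced only for this sketch.
    """
    grouped = defaultdict(set)
    for entry in entries:
        # partition splits on the first space only: "A_1 12" -> ("A_1", " ", "12")
        name, _, go_id = entry.partition(" ")
        gene_id = name.split("_")[0]
        # add() accumulates into the existing set; plain assignment would
        # replace whatever was stored for this gene on earlier iterations.
        grouped[gene_id].add(go_id)
    return grouped


GO_dict = group_unique_ids(GO_file)
for gene_id in sorted(GO_dict):
    # Sets are unordered, so sort numerically for display.
    print(gene_id, sorted(GO_dict[gene_id], key=int))
```

Because a set has no defined order, the ids are sorted only at print time; if first-seen order matters, a set is not the right container.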
    
However, I believe that in some cases an itertools solution is preferable to using defaultdict:

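    >>> from collections import OrderedDict
    >>> from itertools import groupby
    >>> from operator import itemgetter
    >>> # Split each entry into a (gene prefix, GO id) pair
    >>> GO_file_kv = [(key.split("_")[0], value)
    ...               for key, value in (elem.split(" ") for elem in GO_file)]
    >>> # groupby needs sorted input; OrderedDict.fromkeys drops duplicates in first-seen order
    >>> {key: list(OrderedDict.fromkeys(e for _, e in value))
    ...  for key, value in groupby(sorted(GO_file_kv, key=itemgetter(0)),
    ...                            key=itemgetter(0))}
    {'A': ['12', '13', '14'], 'B': ['1', '5']}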
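
The sorted() call in the groupby approach is not optional: groupby only merges consecutive items that share a key. A minimal Python 3 sketch of the difference, using a made-up pairs list rather than the data from the question:

```python
from itertools import groupby
from operator import itemgetter

# Illustrative (gene, GO id) pairs with the genes deliberately interleaved.
pairs = [("A", "12"), ("B", "1"), ("A", "13"), ("B", "5")]

# Without sorting, each gene shows up as several separate runs.
unsorted_keys = [key for key, _ in groupby(pairs, key=itemgetter(0))]

# sorted() is stable, so ids keep their original relative order within a gene.
sorted_groups = {
    key: [value for _, value in group]
    for key, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0))
}

print(unsorted_keys)   # ['A', 'B', 'A', 'B']
print(sorted_groups)   # {'A': ['12', '13'], 'B': ['1', '5']}
```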
    

Thanks to Abhijit for the comprehensive answer!
    >>> GO_dict = defaultdict(OrderedDict)
    >>> for GO_names in GO_file:
       gene_id,_,GO_id = GO_names.partition(" ")
       gene_id = gene_id.split("_")[0]
       GO_dict[gene_id][GO_id] = None
    
    
    >>> # Keep only the keys of each inner OrderedDict, in insertion order
    >>> OrderedDict((key, list(ids)) for key, ids in GO_dict.items())
    OrderedDict([('A', ['12', '13', '14']), ('B', ['1', '5'])])
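
On Python 3.7 or later, plain dicts preserve insertion order, so the same ordered, de-duplicated grouping can be sketched without OrderedDict. The function name group_ordered_unique is hypothetical, and the sample list assumes a comma after "B_2 1".

```python
# Sample data from the question, assuming a comma after "B_2 1".
GO_file = ["A_1 12", "A_2 13", "A_3 14", "A_4 12", "B_1 1", "B_2 1", "B_3 5"]


def group_ordered_unique(entries):
    """Group GO ids by gene prefix, unique and in first-seen order.

    Hypothetical helper; relies on dict insertion order (Python 3.7+).
    """
    grouped = {}
    for entry in entries:
        name, _, go_id = entry.partition(" ")
        gene_id = name.split("_")[0]
        # setdefault creates the inner dict the first time a gene is seen;
        # using the id as a key de-duplicates while keeping first-seen order.
        grouped.setdefault(gene_id, {})[go_id] = None
    return {gene_id: list(ids) for gene_id, ids in grouped.items()}


result = group_ordered_unique(GO_file)
print(result)  # {'A': ['12', '13', '14'], 'B': ['1', '5']}
```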
    