Grouping items in a list using Python defaultdict
I have a list named "GO_file" (defined in the code below). I want to convert it to:
A: 12, 13, 14
B: 1, 5
from collections import defaultdict

GO_file = ["A_1 12", "A_2 13", "A_3 14", "A_4 12", "B_1 1", "B_2 1", "B_3 5"]
GO_dict = defaultdict(list)
for GO_names in GO_file:
    gene_id = GO_names.split("_")[0]
    GO_id = GO_names.split(" ")[1:]
    GO_dict[gene_id] = GO_id
print(GO_dict)
However, this code only keeps the key and a single value:
defaultdict(<type 'list'>, {'A': ['12'], 'B': ['5']})
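I would appreciate your suggestions.

Your code is nearly right. Two changes are needed. First, add each value to the collection already stored under the key; assigning with GO_dict[gene_id] = GO_id replaces whatever was there, which is why only the last value survives. Second, use defaultdict(set) instead of defaultdict(list) so that repeated values are stored once.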
>>> GO_dict = defaultdict(set)
>>> for GO_names in GO_file:
...     gene_id, _, GO_id = GO_names.partition(" ")
...     gene_id = gene_id.split("_")[0]
...     GO_dict[gene_id].add(GO_id)
...
>>> print(GO_dict)
defaultdict(<type 'set'>, {'A': set(['13', '12', '14']), 'B': set(['1', '5'])})
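For readers on Python 3, the same idea can be packaged as a small self-contained function. This is only a sketch: the name group_go_ids is hypothetical (it does not appear in the question), and it assumes every item has the form "<gene>_<n> <GO id>" with a single space between the two parts.

```python
from collections import defaultdict


def group_go_ids(go_file):
    """Group GO ids by the gene prefix before the underscore (hypothetical helper)."""
    groups = defaultdict(set)
    for item in go_file:
        # "A_1 12" -> name "A_1", GO id "12"
        name, _, go_id = item.partition(" ")
        gene_id = name.split("_")[0]
        # add() extends the existing set instead of replacing it
        groups[gene_id].add(go_id)
    return dict(groups)


GO_file = ["A_1 12", "A_2 13", "A_3 14", "A_4 12", "B_1 1", "B_2 1", "B_3 5"]
result = group_go_ids(GO_file)
print(result)
```

Sets are unordered, so the values may print in any order; wrap each set in sorted() if a stable display is needed.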
However, I believe that in some cases an itertools solution is preferable to using defaultdict:
>>> from collections import OrderedDict
>>> from itertools import groupby
>>> from operator import itemgetter
>>> GO_file_kv = [(key.split("_")[0], value)
...               for key, value in (elem.split(" ") for elem in GO_file)]
>>> # OrderedDict.fromkeys drops duplicates while keeping first-seen order
>>> {key: list(OrderedDict.fromkeys(e for _, e in value))
...  for key, value in groupby(sorted(GO_file_kv, key=itemgetter(0)),
...                            key=itemgetter(0))}
{'A': ['12', '13', '14'], 'B': ['1', '5']}
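One caveat with groupby is worth spelling out: it only merges adjacent items that share a key, which is why the input is sorted before grouping. A small sketch with made-up pairs (the pairs list is illustrative and not taken from the question):

```python
from itertools import groupby
from operator import itemgetter

# Illustrative data: the two "A" entries are not adjacent.
pairs = [("A", "12"), ("B", "1"), ("A", "13")]

# Without sorting, "A" shows up as two separate groups.
unsorted_keys = [key for key, _ in groupby(pairs, key=itemgetter(0))]

# Sorting by the key first brings equal keys together.
sorted_pairs = sorted(pairs, key=itemgetter(0))
sorted_keys = [key for key, _ in groupby(sorted_pairs, key=itemgetter(0))]

print(unsorted_keys)  # ['A', 'B', 'A']
print(sorted_keys)    # ['A', 'B']
```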
Thanks to Abhijit for the comprehensive answer!
To keep the values in the order they first appear, use defaultdict(OrderedDict):
>>> from collections import OrderedDict
>>> GO_dict = defaultdict(OrderedDict)
>>> for GO_names in GO_file:
...     gene_id, _, GO_id = GO_names.partition(" ")
...     gene_id = gene_id.split("_")[0]
...     GO_dict[gene_id][GO_id] = None
...
>>> OrderedDict((key, list(values)) for key, values in GO_dict.items())
OrderedDict([('A', ['12', '13', '14']), ('B', ['1', '5'])])
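On Python 3.7 and later a plain dict preserves insertion order, so an order-preserving variant does not strictly need OrderedDict. A sketch under that assumption (the name group_ordered is hypothetical and not from the answers above):

```python
from collections import defaultdict


def group_ordered(go_file):
    """Group GO ids per gene, keeping first-seen order and dropping duplicates."""
    groups = defaultdict(dict)
    for item in go_file:
        name, _, go_id = item.partition(" ")
        gene_id = name.split("_")[0]
        # dict keys act as an ordered set; the value is just a placeholder
        groups[gene_id][go_id] = None
    return {gene_id: list(ids) for gene_id, ids in groups.items()}


GO_file = ["A_1 12", "A_2 13", "A_3 14", "A_4 12", "B_1 1", "B_2 1", "B_3 5"]
ordered = group_ordered(GO_file)
print(ordered)  # {'A': ['12', '13', '14'], 'B': ['1', '5']}
```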