Missing information in a nested Python dictionary


I created a dictionary from a file using the following code:

Negdic={gene:{iso:exon.split(',')} for gene, iso, exon in zip(Genes, Isoforms, ExonPos)}
from these sample lists:

Genes = ['A2M', 'A2M', 'ACADS', 'ACADVL']

Isoforms = ['NM_000014', 'NM_000016', 'NM_000017', 'NM_000018']

ExonPos = ['9220303,9220778,9221335,9222340,9223083,9224954,9225248,9227155,9229351,9229941,9230296,9231839,9232234,9232689,9241795,9242497,9242951,9243796,9246060,9247568,9248134,9251202,9251976,9253739,9254042,9256834,9258831,9259086,9260119,9261916,9262462,9262909,9264754,9264972,9265955,9268359,', '76190031,76194085,76198328,76198537,76199212,76200475,76205664,76211490,76215103,76216135,76226806,76228376,', '121163570,121164828,121174788,121175158,121175639,121176082,121176335,121176622,121176942,121177098,', '7123149,7123440,7123782,7123922,7124084,7124242,7124856,7125270,7125495,7125985,7126451,7126962,7127131,7127286,7127464,7127639,7127798,7127960,7128127,7128275,']
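However, after going through the dictionary, I realized that A2M with isoform NM_000014 is missing. A2M NM_000014 should match the first set of ExonPos numbers, A2M NM_000016 should match the second set, and so on. What can I do to fix this? I only missed it because my dataset is very large, and it will contain many genes with multiple isoforms and exons.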

How can I change it so that my output looks like this:

dict = {'gene': {'isoform1': [exonpos], 'isoform2': [exonpos2]}, 'gene2': {'isoform1': [...], ...}}

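The problem is that dictionary keys must be unique.

That isn't the case in your data, so for every repeated key, all entries except the last one get overwritten.

Here, A2M appears twice.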
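To see how widespread this is in a large dataset, one option is to count the gene names before building the dict. This is a minimal sketch using collections.Counter on the sample Genes list; any gene with a count above 1 is a key that a plain {gene: ...} dict would overwrite.

```python
from collections import Counter

# Gene column from the sample data (only the names are needed here)
Genes = ['A2M', 'A2M', 'ACADS', 'ACADVL']

# Count occurrences; any gene seen more than once will collide as a dict key
counts = Counter(Genes)
repeated = {gene: n for gene, n in counts.items() if n > 1}

print(repeated)  # {'A2M': 2}
```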

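You have duplicate keys, so part of the data is lost when the zipped rows are turned into a dict:

'A2M', 'A2M'

A Python dict can't hold duplicate keys, so 'A2M' keeps only its last value, the one paired with the second isoform (NM_000016).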
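The overwrite can be reproduced by running the question's comprehension on shortened copies of the sample lists (the exon strings are truncated here purely for illustration). Each iteration builds a brand-new inner dict and assigns it to the gene key, so the second 'A2M' row replaces the first.

```python
# Shortened copies of the sample lists (exon strings truncated for brevity)
Genes = ['A2M', 'A2M', 'ACADS']
Isoforms = ['NM_000014', 'NM_000016', 'NM_000017']
ExonPos = ['9220303,9220778,', '76190031,76194085,', '121163570,121164828,']

# The question's comprehension: the second 'A2M' assignment replaces the first
Negdic = {gene: {iso: exon.split(',')} for gene, iso, exon in zip(Genes, Isoforms, ExonPos)}

print(Negdic['A2M'])  # {'NM_000016': ['76190031', '76194085', '']}
```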

To do what you want, you need something like the following to handle the duplicate keys:

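from collections import defaultdict

# gene -> {isoform: exon positions}; a gene seen for the first time gets an empty dict
d = defaultdict(dict)

for gene, iso, exon in zip(Genes, Isoforms, ExonPos):
    # A repeated gene reuses its existing inner dict instead of replacing it
    d[gene][iso] = exon.split(",")

print(d["A2M"])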

{'NM_000014': ['9220303', '9220778', '9221335', '9222340', '9223083', '9224954', '9225248', '9227155', '9229351', '9229941', '9230296', '9231839', '9232234', '9232689', '9241795', '9242497', '9242951', '9243796', '9246060', '9247568', '9248134', '9251202', '9251976', '9253739', '9254042', '9256834', '9258831', '9259086', '9260119', '9261916', '9262462', '9262909', '9264754', '9264972', '9265955', '9268359', ''],
 'NM_000016': ['76190031', '76194085', '76198328', '76198537', '76199212', '76200475', '76205664', '76211490', '76215103', '76216135', '76226806', '76228376', '']}
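Each list in the output above ends with an empty string '' because every ExonPos string ends with a trailing comma. A minimal sketch of one way to drop those empty pieces, shown on shortened copies of the sample lists; converting the outer defaultdict to a plain dict at the end is optional.

```python
from collections import defaultdict

# Shortened copies of the sample lists; each exon string still ends in a comma
Genes = ['A2M', 'A2M', 'ACADS']
Isoforms = ['NM_000014', 'NM_000016', 'NM_000017']
ExonPos = ['9220303,9220778,', '76190031,76194085,', '121163570,121164828,']

d = defaultdict(dict)
for gene, iso, exon in zip(Genes, Isoforms, ExonPos):
    # Skip empty pieces, such as the '' produced by the trailing comma
    d[gene][iso] = [pos for pos in exon.split(',') if pos]

# Optional: turn the outer defaultdict into a plain dict of dicts
result = dict(d)
print(result['A2M'])
# {'NM_000014': ['9220303', '9220778'], 'NM_000016': ['76190031', '76194085']}
```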

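A defaultdict adds a key the first time it is accessed and reuses the existing value afterwards, so all rows for a repeated gene now share one inner dict. If an isoform is repeated within the same gene, however, you will run into the same overwriting behavior, so that is something to watch out for.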
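If repeated isoforms are possible in your data, one option (a design choice, not the only one) is to store a list of exon lists per isoform and append to it, so a repeat adds an entry instead of replacing one. The data below is hypothetical, made up only to show a repeated (gene, isoform) pair. The trade-off is that every isoform now maps to a list of lists, even when it appears only once.

```python
from collections import defaultdict

# Hypothetical data: ('A2M', 'NM_000014') appears twice
Genes = ['A2M', 'A2M', 'A2M']
Isoforms = ['NM_000014', 'NM_000016', 'NM_000014']
ExonPos = ['9220303,9220778,', '76190031,76194085,', '9229351,9229941,']

# gene -> isoform -> list of exon-position lists
d = defaultdict(lambda: defaultdict(list))
for gene, iso, exon in zip(Genes, Isoforms, ExonPos):
    # append keeps earlier rows for the same isoform instead of overwriting them
    d[gene][iso].append([pos for pos in exon.split(',') if pos])

print(len(d['A2M']['NM_000014']))  # 2
```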

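You would only lose information if one of the original lists were shorter than the other two, because zip ignores the extra items in the longer lists. Are you sure the lists are all the same length to begin with?

Thank you so much for your help. I believe this worked for me.

christylynn002, no problem, glad it worked.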
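To check the zip-truncation point on your own data, a minimal sketch: plain zip() silently stops at the shortest input, an explicit length check works on any Python 3 version, and since Python 3.10 zip() also accepts strict=True, which raises ValueError when the inputs differ in length. The lists below are deliberately mismatched for illustration.

```python
import sys

Genes = ['A2M', 'A2M', 'ACADS']
Isoforms = ['NM_000014', 'NM_000016']  # deliberately one item short
ExonPos = ['9220303,', '76190031,', '121163570,']

# Plain zip() stops at the shortest input: only 2 rows come out
rows = list(zip(Genes, Isoforms, ExonPos))
print(len(rows))  # 2

# Explicit check that works on any Python 3 version
mismatch = len({len(Genes), len(Isoforms), len(ExonPos)}) != 1
print(mismatch)  # True

# On Python 3.10+, zip(..., strict=True) raises ValueError on unequal lengths
if sys.version_info >= (3, 10):
    try:
        list(zip(Genes, Isoforms, ExonPos, strict=True))
    except ValueError as err:
        print("zip(strict=True) raised:", err)
```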