Python: how to keep a for loop from overwriting data in a dictionary

Tags: python, python-2.7, sorting, for-loop, dictionary

I have some data that I need to analyse before I sort it.

Data (sample):

3.30.67.10 ['2i69A', '1sfkA', '1sfkB', '1sfkH', '2hcnA', '2hcsA', '2hfzA', '2of6A', '2qeqA', '2qeqB', '2wa1A', '2wa1B', '2wa2A', '2wa2B', '4r05A', '4r8rA', '4r8sA', ...]

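As you can see, some entries share the same first four characters, for example "1sfk". Entries that share their first four characters belong to the same structure. For each full protein code (five characters, such as 1sfkA or 1sfkB) I look up the UniProt code in the PDBSum database, and I need to store the unique UniProt codes together under the four-character code.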

To do this, I wrote this piece of code:

for domain in dDomainSeqSum.keys():  # CHANGE TO COMPRESS FILE
    dDomainSeqSumSWS[domain]={}
    for pdb in dDomainSeqSum[domain]:  # add sws of a pdb in a variable and later add that variable to the domain thing
        pdb1 = list(pdb)  # split is not working
        pdb2 = pdb1[0]+pdb1[1]+pdb1[2]+pdb1[3]
        dDomainSeqSumSWS[domain][pdb2]=[]
        for i in range(len(PDBSum)):  # make pdb3 search and then compare to the pdb stored
            if pdb in PDBSum[i]:
                if "SWS_ID" in PDBSum[i]:
                    line = PDBSum[i].split()
                    if pdb2 not in dDomainSeqSumSWS:
                        dDomainSeqSumSWS[domain][pdb2]=[line[2]]
                    else:
                        dDomainSeqSumSWS[domain][pdb2].append(line[2])
After running this code, the result I get looks like this:

3.30.67.10 {..., '3evd': ['P03314'], '3eve': ['P03314'], '2p1d': ['P12823'], ..., '2hcn': ['P14335'], '2oxt': ['A0EKU1'], '1tg8': ['P27914'], '4hdg': ['P27395'], ..., '4mtp': ['P27395'], '3j8d': ['P12823'], '3uc0': ['P09866'], ..., '1svb': ['P14336'], '4r8t': ['O90417'], '2hfz': ['P14335'], '2v6j': ['Q32ZD5'], ..., '1bef': ['Q9Q4T1'], '3evc': ['P03314'], '3j05': ['Q689G3'], '3egp': ['Q9J7C6'], ..., '3j6u': ['Q6DLV0'], '1sfk': ['P14335'], '1z66': ['P29837'], ..., '4tpl': ['Q5SBG8'], '1yks': ['P03314'], '4bz1': ['Q7TGC7'], '4bz2': ['Q7TGC7'], ..., '4cct': ['G3F5K5'], '2r29': ['P29991'], '2p40': ['Q9WLZ5'], '1na4': ['P14336'], ..., '3c6d': ['Q3BCY5'], '3c6e': ['Q3BCY5'], '4o6c': ['Q9Q6P4'], '4o6b': ['P29990'], '4o6d': ['Q9Q6P4'], '2ijo': ['P06935'], '2wa2': ['Q8QL64'], '1tc7': ['P06935'], '3j27': ['P14340'], '2wa1': ['Q8QL64'], '3gcz': ['Q7T918']}
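The problem is this line:

dDomainSeqSumSWS[domain][pdb2]=[]

It runs for every chain, so each further chain of a structure (1sfkB after 1sfkA, for example) replaces the list built so far with an empty one. The check "if pdb2 not in dDomainSeqSumSWS" does not help, because it tests the outer dictionary, which is keyed by domain; it is therefore always true and the assignment overwrites the list again. setdefault creates the list only when the key is missing:
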
for domain in dDomainSeqSum.keys():# CHANGE TO COMPRESS FILE
    dDomainSeqSumSWS[domain]={}
    for pdb in dDomainSeqSum[domain]:#add sws of a pdb in a variable and later add that variable to the domain thing
        pdb2 = pdb[:4] #you do not need to convert to list for indexing and you can slice the first four characters off.
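        dDomainSeqSumSWS[domain].setdefault(pdb2,[]) #create the list only if the key is new; a plain "= []" here would wipe the codes of earlier chains.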
        for i in range(len(PDBSum)): #make pdb3 search and then compare to the pdb stored
            if pdb in PDBSum[i]:
                if "SWS_ID" in PDBSum[i]:
                    line = PDBSum[i].split()
                    dDomainSeqSumSWS[domain].setdefault(pdb2,[]).append(line[2])
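
The same idea can be tried on a small self-contained example. This is only a sketch under stated assumptions: the sample lines in PDBSum below are invented, the layout (whitespace-separated fields with the UniProt code in the third field) is inferred from the line[2] lookup above, the code Q00001 is a made-up placeholder, and the function name group_uniprot_by_structure is hypothetical. It uses collections.defaultdict in place of setdefault and also skips duplicate codes, since the question asks for unique UniProt codes.

```python
from collections import defaultdict

# Invented sample data; the real PDBSum line layout is an assumption
# (whitespace-separated, UniProt code in the third field).
dDomainSeqSum = {"3.30.67.10": ["1sfkA", "1sfkB", "2hcnA"]}
PDBSum = [
    "1sfkA SWS_ID P14335",
    "1sfkB SWS_ID P14335",
    "1sfkB SWS_ID Q00001",  # Q00001 is a placeholder, not a real lookup result
    "2hcnA SWS_ID P14335",
    "2hcnA OTHER xxx",
]


def group_uniprot_by_structure(domains, pdbsum_lines):
    """Collect unique UniProt codes per 4-character structure code, per domain."""
    result = {}
    for domain, chains in domains.items():
        by_structure = defaultdict(list)
        for chain in chains:
            structure = chain[:4]
            # Accessing the key creates an empty list once and never resets it,
            # so structures without any match still show up in the result.
            codes = by_structure[structure]
            for line in pdbsum_lines:
                if chain in line and "SWS_ID" in line:
                    code = line.split()[2]
                    if code not in codes:  # keep each UniProt code only once
                        codes.append(code)
        result[domain] = dict(by_structure)
    return result


grouped = group_uniprot_by_structure(dDomainSeqSum, PDBSum)
print(grouped)
```

With the sample data above, both chains of 1sfk end up under the single key "1sfk", and the code they share is stored once. If the order of the codes does not matter, a set per structure would make the uniqueness check cheaper than the list membership test.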