Assembling strings in index order for a specific ID (Python)
I have a list of sublists. The first value of each sublist is an ID and the second value is an index. Ultimately, I'm trying to assemble the strings for each ID in index order.
raw_IDs = ['TCONS_0040771;1','TCONS_0040771;2','TCONS_0040771;3','TCONS_00040772;1','TCONS_00040772;2','TCONS_00040773;1','TCONS_00040773;2','TCONS_00040773;3','TCONS_00040773;4']
IDs = [['TCONS_0040771',1],['TCONS_0040771',2],['TCONS_0040771',3],['TCONS_00040772',1],['TCONS_00040772',2],['TCONS_00040773',1],['TCONS_00040773',2],['TCONS_00040773',3],['TCONS_00040773',4]]
I have a dictionary of sequences keyed by each of these values, so:
# D_ID_seq maps each raw ID string (e.g. 'TCONS_0040771;1') to its sequence
sequences = []
for k in raw_IDs:
    sequences.append(D_ID_seq[k])
print(sequences)
sequences = ['AAA','AAB','AAAB','AAAA','BAA','BBA','BBB','CCC','DDD']
I'm trying to assemble the sequences by ID, i.e. by the TCONS_xxx value:
desired_output = ['AAAAABAAAB','AAAABAA','BBABBBCCCDDD']
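Example: the first 3 elements of IDs all have the same ID, "TCONS_0040771", but their indices run from 1 to 3. The same pattern repeats for "TCONS_0040772" with indices 1-2 and "TCONS_0040773" with indices 1-4.
The desired output is, for each ID, the concatenation of all the strings that were looked up in the dictionary and appended to the list named "sequences".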
Note: I thought about writing a while loop, but when I try them they sometimes get very messy and end up running forever.
Any help would be greatly appreciated.
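# This assumes raw_IDs and sequences have been defined
IDs = [raw_id.split(';') for raw_id in raw_IDs]
prev_id = None
output = ''
desired_output = []
for id_, _ in IDs:
    if id_ != prev_id:
        # A new ID begins: store the string built for the previous one
        if prev_id is not None:
            desired_output.append(output)
        output = ''
        prev_id = id_
    # Note: pop(0) consumes the sequences list as it goes
    output += sequences.pop(0)
if output:
    desired_output.append(output)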
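The same run-by-run grouping can also be sketched with itertools.groupby from the standard library. This is an alternative to the loop above, not the answerer's own code. Like the loop, it assumes that all entries for one ID sit next to each other in raw_IDs and are already in index order. The function name assemble_by_id is made up for illustration.

```python
from itertools import groupby


def assemble_by_id(raw_ids, sequences):
    """Concatenate the sequences of each run of identical IDs (hypothetical helper)."""
    # Pair each sequence with the ID part of its 'ID;index' string
    pairs = [(raw.split(';')[0], seq) for raw, seq in zip(raw_ids, sequences)]
    # groupby only merges adjacent equal keys, so the input order matters
    return [''.join(seq for _, seq in group)
            for _, group in groupby(pairs, key=lambda pair: pair[0])]


raw_IDs = ['TCONS_0040771;1', 'TCONS_0040771;2', 'TCONS_0040771;3',
           'TCONS_00040772;1', 'TCONS_00040772;2', 'TCONS_00040773;1',
           'TCONS_00040773;2', 'TCONS_00040773;3', 'TCONS_00040773;4']
sequences = ['AAA', 'AAB', 'AAAB', 'AAAA', 'BAA', 'BBA', 'BBB', 'CCC', 'DDD']

print(assemble_by_id(raw_IDs, sequences))
# ['AAAAABAAAB', 'AAAABAA', 'BBABBBCCCDDD']
```

Unlike the pop(0) version, this leaves the sequences list untouched, so it can be reused afterwards.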
IDs = [['TCONS_0040771', 1], ['TCONS_0040771', 2], ['TCONS_0040771', 3],
['TCONS_00040772', 1], ['TCONS_00040772', 2], ['TCONS_00040773', 1],
['TCONS_00040773', 2], ['TCONS_00040773', 3], ['TCONS_00040773', 4]]
sequences = ['AAA','AAB','AAAB','AAAA','BAA','BBA','BBB','CCC','DDD']
With the data above, this code:
last_id = IDs[0][0]
res = [sequences[0]]
for index, (id_, _) in enumerate(IDs[1:], 1):
    if id_ == last_id:
        res[-1] += sequences[index]
    else:
        res.append(sequences[index])
        last_id = id_
gives this for res:
['AAAAABAAAB', 'AAAABAA', 'BBABBBCCCDDD']
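Both approaches above rely on the entries already being grouped by ID and sorted by index. If that cannot be guaranteed, one possible sketch is to collect (index, sequence) pairs per ID in a dictionary and sort each group before joining. The helper name assemble_sorted and the shuffled sample data are made up for illustration; the sketch also assumes Python 3.7+, where dictionaries keep insertion order.

```python
from collections import defaultdict


def assemble_sorted(ids, sequences):
    """Join each ID's sequences in ascending index order (hypothetical helper)."""
    parts = defaultdict(list)
    for (id_, index), seq in zip(ids, sequences):
        # int() lets the index be either 1 or '1'
        parts[id_].append((int(index), seq))
    # IDs come out in first-seen order; each group is sorted by its index
    return [''.join(seq for _, seq in sorted(chunks, key=lambda chunk: chunk[0]))
            for chunks in parts.values()]


# Made-up data, deliberately interleaved and out of index order
shuffled_IDs = [['B', 2], ['A', 1], ['B', 1], ['A', 2]]
shuffled_sequences = ['yy', 'aa', 'xx', 'bb']

print(assemble_sorted(shuffled_IDs, shuffled_sequences))
# ['xxyy', 'aabb']
```

Sorting on the index alone (rather than on the whole pair) keeps the original relative order if two entries ever share an index.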