Python: extract substrings in their exact order
I have a string containing three-letter amino acid codes and an RNA sequence. I want to extract the amino acid codes in the exact order in which they appear in the string.
raw_seq = '''GGACUAGCGGAGGCUAGUCC
METGLNLYSGLYASNPHEARGASNGLNARGLYSTHRVAL
LYSCYSPHEASNCYSGLYLYSGLUGLYHISILEALALYS
ASNCYSARGALAPROARGLYSLYSGLYCYSTRPLYSCYS
GLYLYSGLUGLYHISGLNMETLYSASPCYSTHRGLUARG
GLNALAASN'''
ascodes = ['ALA','ARG','ASN','ASP','ASX','CYS','GLU','GLN','GLX','GLY','HIS','ILE','LEU','LYS','MET','PHE','PRO','SER','THR','TRP','TYR','VAL']
for amino in ascodes:
    if amino in raw_seq:
        print(amino)
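My code returns the amino acids in alphabetical order, which destroys all of their biological meaning. I also tried regular expressions, but I couldn't come up with a suitable pattern.
It's a bit of a trick, but using re.findall and str.join ensures the output follows the order in which the codes appear in raw_seq: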
import re
re.findall('|'.join(ascodes), raw_seq)
Output:
['MET',
'GLN',
'LYS',
...
'ARG',
'GLN',
'ALA',
'ASN']
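A minimal sketch of the same idea wrapped in a reusable function (the name extract_codes is hypothetical, not from the answer). re.findall scans the string from left to right and returns non-overlapping matches in the order it finds them, so the result follows the sequence rather than the alphabetical order of ascodes, and repeated codes are kept instead of being reported once.

```python
import re

# The three-letter codes from the question.
ascodes = ['ALA', 'ARG', 'ASN', 'ASP', 'ASX', 'CYS', 'GLU', 'GLN', 'GLX', 'GLY', 'HIS',
           'ILE', 'LEU', 'LYS', 'MET', 'PHE', 'PRO', 'SER', 'THR', 'TRP', 'TYR', 'VAL']


def extract_codes(seq, codes):
    """Return the codes found in `seq`, in the order they appear.

    Hypothetical helper: re.findall walks `seq` left to right and returns
    non-overlapping matches, so the output order follows `seq`, not `codes`.
    """
    # re.escape is only a precaution: the three-letter codes contain no
    # regex metacharacters, so plain '|'.join(codes) behaves the same here.
    pattern = re.compile('|'.join(map(re.escape, codes)))
    return pattern.findall(seq)


print(extract_codes('METGLNLYS\nGLYASN', ascodes))
# ['MET', 'GLN', 'LYS', 'GLY', 'ASN']
print(extract_codes('LYSLYSMET', ascodes))
# ['LYS', 'LYS', 'MET']  (duplicates are kept, unlike the question's loop)
```

Compiling the pattern once only matters if you call the function on many sequences; for a single call, re.findall with the joined pattern, as in the answer, does the same thing.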
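You can loop over every character and check whether the current character together with the next two forms a code in the amino acid list: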
for i in range(len(raw_seq)):
    amino = raw_seq[i:i+3]
    if amino in ascodes:
        print(amino)
The result:
['MET', ..., 'CYS', 'THR', 'GLU', 'ARG', 'GLN', 'ALA', 'ASN']
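One caveat worth checking before relying on this loop: it tests every offset, not only the offsets where a code starts, so a window that straddles two adjacent codes can itself match a code. The sketch below compares it with the re.findall approach on a fragment copied from the second line of raw_seq (GLN ARG LYS THR); the variable names are illustrative.

```python
import re

ascodes = ['ALA', 'ARG', 'ASN', 'ASP', 'ASX', 'CYS', 'GLU', 'GLN', 'GLX', 'GLY', 'HIS',
           'ILE', 'LEU', 'LYS', 'MET', 'PHE', 'PRO', 'SER', 'THR', 'TRP', 'TYR', 'VAL']

# Fragment of raw_seq containing the codes GLN, ARG, LYS, THR.
snippet = 'GLNARGLYSTHR'

# Sliding window, as in the loop above: every offset is tested, including
# offsets that straddle two codes ('ARG' + 'LYS' contains 'GLY').
window_hits = [snippet[i:i + 3] for i in range(len(snippet) - 2)
               if snippet[i:i + 3] in ascodes]

# Alternation regex: each match consumes three characters, so for this
# input the scan stays on code boundaries.
regex_hits = re.findall('|'.join(ascodes), snippet)

print(window_hits)  # ['GLN', 'ARG', 'GLY', 'LYS', 'THR']
print(regex_hits)   # ['GLN', 'ARG', 'LYS', 'THR']
```

The extra 'GLY' is not in the sequence; it comes from the last letter of ARG plus the first two letters of LYS.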
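What is your expected result? This looks like homework; what have you done so far?
Hi @Peet, don't forget to give us some feedback so we know whether we have solved your problem, see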
raw_seq = '''GGACUAGCGGAGGCUAGUCC
METGLNLYSGLYASNPHEARGASNGLNARGLYSTHRVAL
LYSCYSPHEASNCYSGLYLYSGLUGLYHISILEALALYS
ASNCYSARGALAPROARGLYSLYSGLYCYSTRPLYSCYS
GLYLYSGLUGLYHISGLNMETLYSASPCYSTHRGLUARG
GLNALAASN'''
ascodes = ['ALA','ARG','ASN','ASP','ASX','CYS','GLU','GLN','GLX','GLY','HIS','ILE','LEU','LYS','MET','PHE','PRO','SER','THR','TRP','TYR','VAL']
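# Join the lines into a single string
raw_seq = raw_seq.replace('\n', '')
# Every 3-character window, starting at every offset
sep_set = [raw_seq[i:i + 3] for i in range(len(raw_seq) - 2)]
# Keep only the windows that are valid amino acid codes
result = [i for i in sep_set if i in ascodes]
print(result)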
"""
output
['MET', 'GLN', 'LYS', 'GLY', 'ASN', 'PHE', 'ARG', 'ASN', 'GLN', 'ARG', 'GLY', 'LYS', 'THR', 'VAL', 'LYS', 'CYS', 'PHE', 'ASN', 'CYS', 'GLY', 'LYS', 'GLU', 'GLY', 'HIS', 'ILE', 'ALA', 'LYS', 'ASN', 'CYS', 'ARG', 'ALA', 'PRO', 'ARG', 'GLY', 'LYS', 'LYS', 'GLY', 'CYS', 'TRP', 'LYS', 'CYS', 'GLY', 'LYS', 'GLU', 'GLY', 'HIS', 'GLN', 'MET', 'LYS', 'ASP', 'CYS', 'THR', 'GLU', 'ARG', 'GLN', 'ALA', 'ASN']
"""