Python: extract substrings in the exact order they appear

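I have a string that contains three-letter amino acid codes together with an RNA sequence. I want to extract the amino acid codes in the exact order in which they appear in the string.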

raw_seq = '''GGACUAGCGGAGGCUAGUCC
METGLNLYSGLYASNPHEARGASNGLNARGLYSTHRVAL
LYSCYSPHEASNCYSGLYLYSGLUGLYHISILEALALYS
ASNCYSARGALAPROARGLYSLYSGLYCYSTRPLYSCYS
GLYLYSGLUGLYHISGLNMETLYSASPCYSTHRGLUARG
GLNALAASN'''
ascodes = ['ALA','ARG','ASN','ASP','ASX','CYS','GLU','GLN','GLX','GLY','HIS','ILE','LEU','LYS','MET','PHE','PRO','SER','THR','TRP','TYR','VAL']
for amino in ascodes:
    if amino in raw_seq:
        print(amino)

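My code returns the amino acids in alphabetical order, which destroys all of the sequence's biological meaning. I also tried regular expressions, but I could not work out a suitable pattern.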

It is a bit of a hack, but combining re.findall with str.join ensures that the output follows the order of appearance in raw_seq:

import re

re.findall('|'.join(ascodes), raw_seq)
Output:

['MET',
 'GLN',
 'LYS',
 ...
 'ARG',
 'GLN',
 'ALA',
 'ASN']
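
As a minimal sketch of why this works: '|'.join(ascodes) builds the alternation 'ALA|ARG|...', and re.findall scans left to right and returns non-overlapping matches, so the codes come back in order of appearance. The function name extract_codes and the short test string are illustrative assumptions, not part of the original answer.

```python
import re

ascodes = ['ALA', 'ARG', 'ASN', 'ASP', 'ASX', 'CYS', 'GLU', 'GLN', 'GLX',
           'GLY', 'HIS', 'ILE', 'LEU', 'LYS', 'MET', 'PHE', 'PRO', 'SER',
           'THR', 'TRP', 'TYR', 'VAL']

def extract_codes(seq, codes):
    """Return every code from `codes` found in `seq`, in order of appearance."""
    # re.escape is redundant for plain letters; it only guards against
    # codes that happen to contain regex metacharacters.
    pattern = re.compile('|'.join(re.escape(code) for code in codes))
    # findall scans left to right and returns non-overlapping matches.
    return pattern.findall(seq)

# Made-up short input: an RNA-like prefix, a newline, then three codes.
print(extract_codes('GGACU\nMETGLNLYS', ascodes))  # ['MET', 'GLN', 'LYS']
```

Compiling the pattern once is worthwhile if the same code list is applied to many sequences.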

You can walk through the string and check whether the current character plus the following two form one of the codes in the amino acid list. After a match, jump ahead three characters; otherwise an overlapping window, such as the GLY hidden inside ARGLYS, is reported as well.

i = 0
while i < len(raw_seq) - 2:
    amino = raw_seq[i:i+3]
    if amino in ascodes:
        print(amino)
        i += 3  # skip past the match so 'ARGLYS' does not also yield 'GLY'
    else:
        i += 1
This prints the codes one per line, in order of appearance:

MET
GLN
LYS
...
GLN
ALA
ASN
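
One thing to watch with a character-by-character scan: a window that advances one character at a time can report a code that straddles two real ones. In 'ARGLYS', the window starting at the third character reads 'GLY'. The sketch below contrasts the two behaviours; the function names naive_scan and skipping_scan are hypothetical and used only for illustration.

```python
ascodes = ['ALA', 'ARG', 'ASN', 'ASP', 'ASX', 'CYS', 'GLU', 'GLN', 'GLX',
           'GLY', 'HIS', 'ILE', 'LEU', 'LYS', 'MET', 'PHE', 'PRO', 'SER',
           'THR', 'TRP', 'TYR', 'VAL']

def naive_scan(seq, codes):
    """Test every 3-character window, including overlapping ones."""
    return [seq[i:i+3] for i in range(len(seq) - 2) if seq[i:i+3] in codes]

def skipping_scan(seq, codes):
    """Advance by 3 after a match, by 1 otherwise, so matches never overlap."""
    found, i = [], 0
    while i <= len(seq) - 3:
        window = seq[i:i+3]
        if window in codes:
            found.append(window)
            i += 3
        else:
            i += 1
    return found

print(naive_scan('ARGLYS', ascodes))     # ['ARG', 'GLY', 'LYS']
print(skipping_scan('ARGLYS', ascodes))  # ['ARG', 'LYS']
```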


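What is your expected result? This looks like a homework assignment; what have you tried so far?
Hi @Peet, don't forget to give us some feedback so that we know whether we have solved your problem.
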
raw_seq = '''GGACUAGCGGAGGCUAGUCC
METGLNLYSGLYASNPHEARGASNGLNARGLYSTHRVAL
LYSCYSPHEASNCYSGLYLYSGLUGLYHISILEALALYS
ASNCYSARGALAPROARGLYSLYSGLYCYSTRPLYSCYS
GLYLYSGLUGLYHISGLNMETLYSASPCYSTHRGLUARG
GLNALAASN'''

ascodes = ['ALA','ARG','ASN','ASP','ASX','CYS','GLU','GLN','GLX','GLY','HIS','ILE','LEU','LYS','MET','PHE','PRO','SER','THR','TRP','TYR','VAL']

raw_seq = raw_seq.replace('\n','')

# Scan left to right; after a match, jump 3 characters so that overlapping
# windows (e.g. the 'GLY' inside 'ARGLYS') are not reported.
result = []
i = 0
while i < len(raw_seq) - 2:
    if raw_seq[i:i+3] in ascodes:
        result.append(raw_seq[i:i+3])
        i += 3
    else:
        i += 1
"""
output

['MET', 'GLN', 'LYS', 'GLY', 'ASN', 'PHE', 'ARG', 'ASN', 'GLN', 'ARG', 'LYS', 'THR', 'VAL', 'LYS', 'CYS', 'PHE', 'ASN', 'CYS', 'GLY', 'LYS', 'GLU', 'GLY', 'HIS', 'ILE', 'ALA', 'LYS', 'ASN', 'CYS', 'ARG', 'ALA', 'PRO', 'ARG', 'LYS', 'LYS', 'GLY', 'CYS', 'TRP', 'LYS', 'CYS', 'GLY', 'LYS', 'GLU', 'GLY', 'HIS', 'GLN', 'MET', 'LYS', 'ASP', 'CYS', 'THR', 'GLU', 'ARG', 'GLN', 'ALA', 'ASN']

"""
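
A quick way to sanity-check any extraction is to confirm that it stayed in reading frame, meaning each code starts exactly where the previous one ended. The sketch below does this with re.finditer on the question's data; the variable names flat, hits, and in_frame are illustrative assumptions.

```python
import re

raw_seq = '''GGACUAGCGGAGGCUAGUCC
METGLNLYSGLYASNPHEARGASNGLNARGLYSTHRVAL
LYSCYSPHEASNCYSGLYLYSGLUGLYHISILEALALYS
ASNCYSARGALAPROARGLYSLYSGLYCYSTRPLYSCYS
GLYLYSGLUGLYHISGLNMETLYSASPCYSTHRGLUARG
GLNALAASN'''

ascodes = ['ALA', 'ARG', 'ASN', 'ASP', 'ASX', 'CYS', 'GLU', 'GLN', 'GLX',
           'GLY', 'HIS', 'ILE', 'LEU', 'LYS', 'MET', 'PHE', 'PRO', 'SER',
           'THR', 'TRP', 'TYR', 'VAL']

pattern = re.compile('|'.join(ascodes))
flat = raw_seq.replace('\n', '')

# (start offset, code) for each non-overlapping match, in order of appearance
hits = [(m.start(), m.group()) for m in pattern.finditer(flat)]

# In frame: consecutive matches are exactly 3 characters apart.
in_frame = all(b[0] - a[0] == 3 for a, b in zip(hits, hits[1:]))

print(hits[0], len(hits), in_frame)  # (20, 'MET') 55 True
```

If in_frame came back False, some match would have been separated from its neighbour by unmatched characters, which is a hint that the scan drifted or that the input mixes in other material between codes.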