Counting pattern occurrences in sequences with Python
I have sequences like the following in a text file:
>P1
MPPRRSIVEVKVLDVQKRRRVPNKYVYVIIRVTWSGATEAYRRYSKFFDLQMLDKFP MEGGQKDPKQRIIPFLpgILFRSHIRDVAVKRIPIDEYCPYISQCDEVFETRPELDLNPLNPLNPLNPLNPLNPLNPLVVQVQVVADQQQESSEISLZWWWWVSTAEQWVPATCLEGQDGVQDFEFGFLQPEEEKYPTIQQIDLQEDEKWKKKYKKKKKWKKKWKKWKKKKYGPLNPLNPLNPLNPLNPLNPLNPLNPLNPLNPLNPLNPLNPLNPLNPLNPLNPLNPLNPDGVSRHQNAMGREKELLNNQRDGRFEGRLVPDVKQRSPKMRQRPPPRRDMTIPRGLNL
>P2
MaevrkftkrkPgTaaelVgLekKlveplDyenvTdLdLmFpMedicinesvigrrTvTvTvPedekRaqLvKecTvHvNyKye DFSGFRMlPcKsLpKgKgKgKgKgKgKgKgKgKfKfKfKfKfKfKfKfKfKfKfKfKfKfKfKfKfKfKfKfKfKfKfKfKfKfKfKfKfKfKfKfKfKfKfKfKfKfKfKfKfKfKfKfKfKfKfKfKfKfKfKfKfKENIMASLERSMHPELMKYGRETEQLNKLSRGDGRQNLFSFDSEVQRLDFSGIEPDVKPFE ekcnkrfmvnchdltfnillghdnakgpptinveffinlalfdvknnckyisdfdln ppsvremlwgtstqlsndgnakgfspeshlighgaesqlcyikgiftnppeifvr
>P3
GDDSEWLPVDQKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK根据VVKKKVGKDTNVMLVALAAKCLTGLAVGLKKKKGQYAG HVVPTILEKKKKKKQVQQQQAIDAIFLTTTLQNISEDVLDNKNPTIKQQQTSLFI ARSFRCTSSTLPKLKPFCAALLKINDSAPEVRDAAFEGTALLKVQGEKSVNPFLA
There are 100 sequences in total. I searched them for a pattern of interest using the Python script below:
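import re

infile = open("seq.fasta", 'r')
out = open("results.csv", 'w')
pattern = re.compile(r"(P[A-Z]{2}P)")
for line in infile:
    line = line.strip("\n")
    if line.startswith('>'):
        # header line: remember the sequence name
        name = line
    else:
        # sequence line: collect every match of the pattern
        s = re.findall(pattern, line)
        print('%s:%s' % (name, s))
        out.write('%s:\t%s\n' % (name, s))
infile.close()
out.close()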
This script works fine and gives me the patterns I want. Now I want to count how many times the pattern occurs in each sequence. At the moment the script prints the matches themselves, like this:
>p1:PGCP
>p1:PHCP,PKCP
and so on. But I want the output to look like this:
>p1:1
>p1:2
Can anyone tell me how to do this in Python?
The findall method returns a list of the matching strings, so you can write len(s) instead of s in your code:
out.write('%s:\t%s\n' % (name, len(s)))
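For completeness, here is a minimal sketch of the whole loop with that change applied. It assumes a sequence may span several lines, so it adds up the counts until the next header line. The function name count_motifs and the sample records are made up for illustration and are not from the question.

```python
import re

# Same motif as in the question: P, any two uppercase letters, then P.
PATTERN = re.compile(r"P[A-Z]{2}P")


def count_motifs(lines, pattern=PATTERN):
    """Return a list of (header, count) pairs, one per FASTA record.

    count_motifs is a hypothetical helper name, not part of the original script.
    """
    results = []
    name, count = None, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                results.append((name, count))
            name, count = line, 0
        elif name is not None:
            # findall returns the non-overlapping matches; len() counts them.
            count += len(pattern.findall(line))
    if name is not None:
        results.append((name, count))
    return results


if __name__ == "__main__":
    # Made-up sample records; with a real file, pass open("seq.fasta") instead.
    sample = [">p1", "MAPGCPLLK", ">p2", "PHCPAAPKCP", "GGPAAP"]
    for name, count in count_motifs(sample):
        print("%s:\t%d" % (name, count))
```

Two things to keep in mind: findall only counts non-overlapping matches, and because each line is searched separately, a match that is split across a line break inside one sequence will not be found. If that matters for your data, join the lines of a record before searching.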