Python: removing whitespace after the lines have already been stripped?
I'm having trouble removing all of the whitespace from a FASTA file. This is the program I'm using so far:
import re
for line in f:
    line = line.rstrip(' \n\r')
    if line.startswith(">"):
        seqid = re.search('Segment:[(0-9)]',line).group()
        seqID.append(seqid)
    else:
        numSeq = len(line)
This is what the test file looks like (I only used the first two to show the seqId):
When I print it out, this is what I get:
ATTATATTCAGTATGGAAAGAATAAAAGAACTACGGAATCTGATGTCGCAGTCTCGCACTCGCGAGATAC 70
TGACAAAAACCACAGTGGACCATATGGCCATAATTAAGAAGTACACATCGGGGAGACAGGAAAAGAACCC 70
GTCACTTAGGATGAAATGGATGATGGCAATGAAATATCCAATCACTGCTGACAAAAGGGTAACAGAAATG 70
0
ATTATATTCAGTATGGAAAGAATAAAAGAATTACGGAATCTGATGTCGCAATCTCGCACTCGCGAGATAC 70
TGACAAAAACCACAGTGGACCATATGGCCATAATTAAGAAGTACACATCGGGGAGACAGGAAAAGAACCC 70
GTCACTTAGGATGAAATGGATGATGGCAATGAAATACCCAATCACTGCTGACAAAAGAATAACAGAAATG 70
0
ATTATATTCAGTATGGAAAGAATAAAAGAACTACGGAATCTGATGTCGCAGTCTCGCACTCGCGAGATAC 70
TGACAAAAACCACAGTGGACCATATGGCCATAATTAAGAAGTACACATCGGGGAGACAGGAAAAGAACCC 70
GTCACTTAGGATGAAATGGATGATGGCAATGAAATATCCAATCACTGCTGACAAAAGGGTAACAGAAATG 70
0
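How can I get it to concatenate the lines and drop the lines with 0 nucleotides? Sorry for the poor wording; I'm short on sleep. If anything about my question is unclear, feel free to ask.
Here is the full program: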
from __future__ import division
import re
f = open('fastatest.fasta','r')
numGC = 0;
allGC = []; #array that contains all the GC%'s
sequences = []; #The array that contains all the sequences
seqID = []; #The array that contains all seqIds
seqLen = [];
numSeq = 0
GCPercent = 0
#Concatenating the FASTA file
for line in f:
    line = line.rstrip(' \n\r')
    if line.startswith(">"):
        seqid = re.search('Segment:[(0-9)]',line).group()
        seqID.append(seqid)
    else: #Find the Length and GC%
        numSeq = len(line)
        #print seqid, numSeq
        GCPercent = (( line.count('G') + line.count('C') ) / (numSeq)*100)
        allGC.append(GCPercent);
        sequences.append(line)
        seqLen.append(numSeq)
        print "%s\t%d\t%.2f" % (seqid,numSeq,GCPercent)
And the output I get:
Segment:1 70 40.00
Segment:1 70 44.29
Segment:1 70 38.57
Traceback (most recent call last):
  File "blah", line 20, in <module>
    GCPercent = (( line.count('G') + line.count('C') ) / (numSeq)*100)
ZeroDivisionError: division by zero
Would a conditional append work?
if not seqid.strip().startswith('0'):
    seqID.append(seqid)
If not, could you show what seqid looks like?
When the length of the line is 0, you can skip straight to the next iteration of the loop:
numSeq = len(line) # from your code for reference
if not numSeq:
    continue
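To see that guard in isolation, here is a minimal sketch that runs the same kind of per-line loop over a small in-memory list instead of a file. The sample lines and the helper name `gc_per_line` are made up for illustration; they are not from the question.

```python
def gc_per_line(lines):
    """Return (length, GC%) for each non-empty, non-header line."""
    results = []
    for line in lines:
        line = line.rstrip(' \n\r')
        if not line:
            continue  # blank line: nothing to measure, and it would divide by zero
        if line.startswith('>'):
            continue  # header handling is left out of this sketch
        gc = line.count('G') + line.count('C')
        results.append((len(line), 100.0 * gc / len(line)))
    return results

# Hypothetical input: a header, two sequence lines, two blank-ish lines, one more line
sample = [">hypothetical header\n", "GGCC\n", "ATGC\n", "\n", "   \n", "AATT\n"]
print(gc_per_line(sample))  # [(4, 100.0), (4, 50.0), (4, 0.0)]
```

Note that a line made only of spaces also ends up empty after `rstrip(' \n\r')`, so it is skipped the same way.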
If the file has a blank line after every sequence (including after the last one!), then this should work:
if line.startswith(">"):
    seqid = re.search('Segment:[(0-9)]',line).group()
    seqID.append(seqid)
    sequence = ""
elif len(line.strip()):
    sequence += line.strip() # three lines will make a sequence
else: #Find the Length and GC%
    numSeq = len(sequence)
    #print seqid, numSeq
    GCPercent = (( sequence.count('G') + sequence.count('C') ) / (numSeq)*100)
    allGC.append(GCPercent)
    sequences.append(sequence)
    seqLen.append(numSeq)
    print("%s\t%d\t%.2f" % (seqid,numSeq,GCPercent))
I just added three lines and replaced line with sequence in four places. It looks like a minimal-change solution, but I haven't tested it.
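Since that approach depends on a blank line closing every record, here is a hedged sketch of a variant that finishes a record when the next header arrives or when the input ends. The function name `read_records` and the sample headers are hypothetical, because the thread never shows the real header format. I also swapped the pattern for `Segment:\d+` so multi-digit segment numbers would match; that is my assumption, not the asker's regex.

```python
import re

def read_records(lines):
    """Yield (seqid, sequence) pairs, joining wrapped sequence lines."""
    seqid, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank lines carry no sequence data
        if line.startswith('>'):
            if seqid is not None:
                yield seqid, ''.join(chunks)  # previous record is complete
            match = re.search(r'Segment:\d+', line)
            seqid = match.group() if match else line[1:]  # fall back to the raw header
            chunks = []
        else:
            chunks.append(line)
    if seqid is not None:
        yield seqid, ''.join(chunks)  # the last record has no header after it

# Hypothetical two-record input with wrapped lines and a blank separator
sample = [
    ">hypothetical|Segment:1|example\n",
    "ATGC\n",
    "GGCC\n",
    "\n",
    ">hypothetical|Segment:2|example\n",
    "AATT\n",
]
for seqid, seq in read_records(sample):
    gc = 100.0 * (seq.count('G') + seq.count('C')) / len(seq) if seq else 0.0
    print("%s\t%d\t%.2f" % (seqid, len(seq), gc))
```

With this shape the length and GC% are computed once per record rather than once per 70-character line, and an empty record reports 0.00 instead of raising ZeroDivisionError.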
You can ignore the empty lines by testing for them:
from __future__ import division
import re
numGC = 0
allGC = [] #array that contains all the GC%'s
sequences = [] #The array that contains all the sequences
seqID = [] #The array that contains all seqIds
seqLen = []
numSeq = 0
GCPercent = 0
with open('fastatest.fasta', 'r') as f:
    #Concatenating the FASTA file
    for line in f:
        line = line.rstrip(' \n\r')
        if line: # non-empty line?
            if line.startswith(">"):
                seqid = re.search('Segment:[(0-9)]',line).group()
                seqID.append(seqid)
            else: #Find the Length and GC%
                numSeq = len(line)
                #print seqid, numSeq
                GCPercent = ((line.count('G') + line.count('C')) /
                             (numSeq)*100)
                allGC.append(GCPercent)
                sequences.append(line)
                seqLen.append(numSeq)
                print("%s\t%d\t%.2f" % (seqid,numSeq,GCPercent))
Output:
Segment:1 70 40.00
Segment:1 70 44.29
Segment:1 70 38.57
Segment:1 70 37.14
Segment:1 70 44.29
Segment:1 70 37.14
Try using Biopython:
from Bio import SeqIO
for record in SeqIO.parse("fasta.fas","fasta"):
    print(record.id)
    print(record.seq)
This will strip out all the newlines and so on.
An example of the input file would be very helpful. Could you provide a sample of what the input file looks like?
I ran your sample code against a sample input file, and the output I got is nowhere near what you listed. Is that the complete snippet?
No, it's not the complete code; I was just trying to figure out how to remove the blank lines.
That fixed the division-by-zero problem. Would you happen to know how to make it join the segments together instead of stopping after 70 nucleotides?
I think you could use ''.join(sequences) to solve that.
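As a small illustration of that last suggestion: `''.join(...)` concatenates a list of strings with nothing in between. One caveat, as far as I can tell: if `sequences` holds lines from several records, joining the whole list merges the records as well, so you would probably want to join one record's lines at a time. The three strings below are shortened stand-ins for the 70-character lines in the question.

```python
# Lines belonging to a single record (shortened, illustrative data)
record_lines = ["ATTATATTCA", "TGACAAAAAC", "GTCACTTAGG"]

full_sequence = ''.join(record_lines)  # empty separator: plain concatenation
print(full_sequence)       # ATTATATTCATGACAAAAACGTCACTTAGG
print(len(full_sequence))  # 30
```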