Iteration in Python produces different behavior
I have multiple datasets in Phylip format (shown below) that I want to convert to Fasta (shown below) using the following Python code:
for j in range(1, 10):
    inFile = open('/path/to/input_sequence/seqfile_00' +str(j) + '.txt', 'r')
    outFile = open('/path/to/output_sequence/Fasta/seqfile_00' + str(j) +'.txt', 'w')
    inLines = inFile.readlines()
    inFile.close()
    outLines = inLines[1:17]
    for line in outLines:
        if line.startswith('\n'):
            line = line.replace('\n','')
        outFile.write(line.replace(' ',' \n').replace('sequence', '>sequence'))
    outFile.close()
This is what my Phylip file (input_sequence) looks like:
8 1500\n
\n
sequence1 CTGTCCTTG...\n
\n
sequence2 CTGTCGTTG...\n
\n
sequence3 CTGCGTATG...\n
\n
sequence4 CTATGCCTG...\n
\n
sequence5 AGGTGTAAG...\n
\n
sequence6 AGGTGTAAG...\n
\n
sequence7 AAATTCAAA...\n
\n
sequence8 AAGTCCAAA...\n
\n
This is what I want my output_sequence (Fasta format) to look like:
>sequence1 \n
CTGTCCTTGG...\n
>sequence2 \n
CTGTCGTTGG...\n
>sequence3 \n
CTGCGTATGG...\n
>sequence4 \n
CTATGCCTGG...\n
>sequence5 \n
AGGTGTAAGG...\n
>sequence6 \n
AGGTGTAAGA...\n
>sequence7 \n
AAATTCAAAG...\n
>sequence8 \n
AAGTCCAAAA...\n
When I run the code above, I get the correct output for j=1, but for the remaining values of j (2 to 9) I get this output:
\n
>sequence1 *red inverted question mark*CTGTCCTTGG...\n
>sequence2 *red inverted question mark*CTGTCGTTGG...\n
>sequence3 *red inverted question mark*CTGCGTATGG...\n
>sequence4 *red inverted question mark*CTATGCCTGG...\n
>sequence5 *red inverted question mark*AGGTGTAAGG...\n
>sequence6 *red inverted question mark*AGGTGTAAGA...\n
>sequence7 *red inverted question mark*AAATTCAAAG...\n
>sequence8 *red inverted question mark*AAGTCCAAAA...\n
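(… is the continuing sequence; the red inverted question mark is what I see in TextWrangler when showing invisibles.)
I suppose my general question, and the reason I am confused, is why the code works correctly for j=1 but not for the other numbers. How can I fix this?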
Thanks in advance.

Strip each line and drop the empty ones with a bool filter:
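with open('filename') as f:
    # Strip every line and drop the ones that end up empty.
    lines = filter(bool, map(lambda x: x.strip(), f.readlines()))

new_list = []
for values in lines:
    # split() with no argument also copes with repeated spaces and tabs.
    for value in values.split():
        if value[0].isupper():
            # Uppercase first character: treat the token as sequence data.
            new_list.append(value + '\n')
        else:
            # Otherwise treat the token as a sequence name.
            new_list.append('>' + value + '\n')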
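The same idea can be wrapped in a small function. This is a minimal sketch rather than the answerer's exact code: the name `phylip_lines_to_fasta` is hypothetical, and it assumes every non-blank line after the header holds a name followed by its sequence, separated by whitespace. It splits on any whitespace, so a tab or a run of spaces is handled like a single space. Whether the invisible character in files 2 to 9 is actually whitespace cannot be confirmed from the question.

```python
def phylip_lines_to_fasta(lines):
    """Turn 'name sequence' lines into FASTA lines (hypothetical helper)."""
    fasta = []
    # Skip the first line, which holds the Phylip header (e.g. '8 1500').
    for line in lines[1:]:
        fields = line.split()  # splits on any run of whitespace
        if len(fields) < 2:
            continue  # blank or malformed line
        name, sequence = fields[0], "".join(fields[1:])
        fasta.append(">" + name + "\n")
        fasta.append(sequence + "\n")
    return fasta


# Made-up sample: one line uses a space, the other a tab, as separator.
sample = ["2 9\n", "\n", "sequence1 CTGTCCTTG\n", "\n", "sequence2\tCTGTCGTTG\n"]
print("".join(phylip_lines_to_fasta(sample)), end="")
```

Unlike the first-character test on each token, this version skips the header line explicitly, so the sequence count and length never end up as FASTA names.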
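If you want to check for blank lines, use if line.strip(). It is true only for lines that contain something other than whitespace, so if not line.strip() matches the blank ones.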
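A short illustration of that check, using made-up sample lines (the list below is an assumption, not data from the question):

```python
lines = ["8 1500\n", "\n", "sequence1 CTGTCCTTG\n", "  \n", "sequence2 CTGTCGTTG\n"]

# line.strip() is '' (falsy) for empty or whitespace-only lines.
non_blank = [line for line in lines if line.strip()]
blank_count = sum(1 for line in lines if not line.strip())

print(non_blank)
print(blank_count)
```

Compared with line.startswith('\n'), this also catches lines that contain only spaces or tabs before the newline.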
You can also use glob and BioPython.
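As a rough sketch of the glob part of that suggestion, the loop over range(1, 10) could be replaced by matching the file names directly. This version uses only the standard library and does not use BioPython. The helper names `convert_file` and `convert_all`, the default pattern `seqfile_*.txt`, and the one-record-per-line layout are assumptions made for illustration.

```python
import glob
import os


def convert_file(in_path, out_path):
    """Convert one 'name sequence' Phylip file to FASTA (hypothetical helper)."""
    with open(in_path) as in_file, open(out_path, "w") as out_file:
        next(in_file, None)  # skip the header line, e.g. '8 1500'
        for line in in_file:
            fields = line.split()  # any whitespace separates name and sequence
            if len(fields) < 2:
                continue  # skip blank lines
            out_file.write(">" + fields[0] + "\n")
            out_file.write("".join(fields[1:]) + "\n")


def convert_all(in_dir, out_dir, pattern="seqfile_*.txt"):
    """Convert every file in in_dir matching pattern; return the output paths."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for in_path in sorted(glob.glob(os.path.join(in_dir, pattern))):
        out_path = os.path.join(out_dir, os.path.basename(in_path))
        convert_file(in_path, out_path)
        written.append(out_path)
    return written
```

Calling convert_all with the input and output directories from the question would then process every matching file, however many there are, without building the file names by hand.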