Iterating in Python produces different results

I have several datasets in Phylip format (shown below) that I want to convert to Fasta format (shown below) with the following Python code:

for j in range(1, 10):
    inFile = open('/path/to/input_sequence/seqfile_00' + str(j) + '.txt', 'r')
    outFile = open('/path/to/output_sequence/Fasta/seqfile_00' + str(j) + '.txt', 'w')
    inLines = inFile.readlines()
    inFile.close()
    outLines = inLines[1:17]  # skip the "8 1500" header line
    for line in outLines:
        if line.startswith('\n'):
            line = line.replace('\n', '')  # drop the blank separator lines
        # turn "sequence1  CTG..." into ">sequence1 \nCTG..."
        outFile.write(line.replace('  ', ' \n').replace('sequence', '>sequence'))
    outFile.close()  # close each output file inside the loop, not only the last one
This is what my Phylip input (input_sequence) looks like:

8 1500\n
\n
sequence1  CTGTCCTTG...\n
\n
sequence2  CTGTCGTTG...\n
\n
sequence3  CTGCGTATG...\n
\n
sequence4  CTATGCCTG...\n
\n
sequence5  AGGTGTAAG...\n
\n
sequence6  AGGTGTAAG...\n
\n
sequence7  AAATTCAAA...\n
\n
sequence8  AAGTCCAAA...\n
\n
And this is what I want my output (output_sequence, Fasta format) to look like:

>sequence1 \n
CTGTCCTTGG...\n
>sequence2 \n
CTGTCGTTGG...\n
>sequence3 \n
CTGCGTATGG...\n
>sequence4 \n
CTATGCCTGG...\n
>sequence5 \n
AGGTGTAAGG...\n
>sequence6 \n
AGGTGTAAGA...\n
>sequence7 \n
AAATTCAAAG...\n
>sequence8 \n
AAGTCCAAAA...\n
When I run the code above, I get the correct output for j=1, but for the remaining values of j (2 to 9) I get this output:

\n
>sequence1 *red inverted question mark*CTGTCCTTGG...\n
>sequence2 *red inverted question mark*CTGTCGTTGG...\n
>sequence3 *red inverted question mark*CTGCGTATGG...\n
>sequence4 *red inverted question mark*CTATGCCTGG...\n
>sequence5 *red inverted question mark*AGGTGTAAGG...\n
>sequence6 *red inverted question mark*AGGTGTAAGA...\n
>sequence7 *red inverted question mark*AAATTCAAAG...\n
>sequence8 *red inverted question mark*AAGTCCAAAA...\n
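(The … stands for the rest of each continuous sequence, and the red inverted question mark is what I see when I turn on Show Invisibles in TextWrangler.)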
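One way to find out which character the red inverted question mark stands for is to read the file in binary mode and list every byte that is not printable ASCII, a plain space, or a newline. Tabs, carriage returns and non-breaking spaces all show up this way. The helper name find_odd_bytes is hypothetical. This is a diagnostic sketch, not a fix.

```python
def find_odd_bytes(path):
    """Return (line_number, column, hex_byte) for every byte in the file that is
    not printable ASCII, a plain space (0x20) or a newline (0x0a)."""
    hits = []
    # binary mode: no decoding and no newline translation, so '\r' stays visible
    with open(path, 'rb') as f:
        for lineno, raw in enumerate(f, start=1):
            for col, byte in enumerate(raw):  # iterating bytes yields ints
                if not (0x21 <= byte <= 0x7e or byte in (0x20, 0x0a)):
                    hits.append((lineno, col, hex(byte)))
    return hits
```

Point it at one of the files that misbehaves, for example find_odd_bytes('/path/to/input_sequence/seqfile_002.txt'), and compare the result with seqfile_001.txt.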

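I suppose the general question, and the reason I'm confused, is this: why does the code work correctly for j=1 but not for the other values? And how can I fix it?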

Thanks in advance.

Use map with strip and a bool filter:

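with open('filename') as f:
    # strip each line and drop the empty ones
    lines = list(filter(bool, map(lambda x: x.strip(), f.readlines())))

new_list = []

# lines[0] is the Phylip header ("8 1500"), not a record
for values in lines[1:]:
    # split() without an argument treats any run of whitespace as one separator,
    # so the double space no longer produces empty strings
    for value in values.split():
        # assumes names start lowercase ("sequence1") and bases are uppercase
        if value[0].isupper():
            new_list.append(value + '\n')
        else:
            new_list.append('>' + value + '\n')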
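The code above builds new_list but never writes it anywhere. Below is a sketch that packages the same strip-and-filter idea as a function returning FASTA text. It assumes each record is a single "name sequence" line and that names contain no whitespace. It splits each record into exactly a name and a sequence instead of using the isupper() test, so names that start with a capital letter also work. The name phylip_to_fasta is hypothetical.

```python
def phylip_to_fasta(text):
    """Convert sequential Phylip text to FASTA text.

    Assumes: the first non-blank line is the 'ntaxa nchars' header, and every
    other non-blank line is '<name><whitespace><sequence>' with no spaces in
    the name. Blank lines between records are ignored.
    """
    # str.strip() and str.split() with no argument treat tabs, '\r' and
    # non-breaking spaces as whitespace too, not just ' '
    records = [line.strip() for line in text.splitlines() if line.strip()][1:]
    out = []
    for record in records:
        name, seq = record.split(None, 1)
        out.append('>' + name + '\n')
        out.append(''.join(seq.split()) + '\n')  # remove any spaces inside the sequence
    return ''.join(out)


if __name__ == '__main__':
    sample = '2 9\n\nsequence1  CTGTCCTTG\n\nsequence2  CTGTCGTTG\n'
    print(phylip_to_fasta(sample), end='')
```

Writing the result then takes one call per file, for example outFile.write(phylip_to_fasta(inFile.read())).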
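To test for blank lines, use line.strip(). It returns an empty (falsy) string for a line that contains only whitespace, so "if line.strip():" keeps only the non-blank lines. You can also use glob and BioPython.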
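Here is a sketch of the glob half of that suggestion. It replaces the hard-coded range(1, 10) and '00' + str(j) with whatever files match a pattern, and it uses line.strip() to skip blank lines. The function name convert_all and the pattern seqfile_*.txt are assumptions based on the question's paths. The conversion itself is done in plain Python rather than with BioPython. Bio.SeqIO.convert is the usual library route, but it is not used here.

```python
import glob
import os


def convert_all(in_dir, out_dir, pattern='seqfile_*.txt'):
    """Convert every Phylip file matching `pattern` in `in_dir` to FASTA in `out_dir`.

    Assumes one 'name sequence' record per non-blank line after the header.
    Output files keep the input file names, as in the question's code.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for in_path in sorted(glob.glob(os.path.join(in_dir, pattern))):
        out_path = os.path.join(out_dir, os.path.basename(in_path))
        # both files are closed at the end of each iteration
        with open(in_path) as src, open(out_path, 'w') as dst:
            # keep only non-blank lines, then drop the 'ntaxa nchars' header
            records = [line.strip() for line in src if line.strip()][1:]
            for record in records:
                name, seq = record.split(None, 1)
                dst.write('>' + name + '\n' + ''.join(seq.split()) + '\n')
        written.append(out_path)
    return written
```

For the question's layout, the call would be convert_all('/path/to/input_sequence', '/path/to/output_sequence/Fasta').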