Warning: file_get_contents(/data/phpspider/zhask/data//catemap/2/python/318.json): failed to open stream: No such file or directory in /data/phpspider/zhask/libs/function.php on line 167

Warning: Invalid argument supplied for foreach() in /data/phpspider/zhask/libs/tag.function.php on line 1116

Notice: Undefined index: in /data/phpspider/zhask/libs/function.php on line 180

Warning: array_chunk() expects parameter 1 to be array, null given in /data/phpspider/zhask/libs/function.php on line 181

Warning: file_get_contents(/data/phpspider/zhask/data//catemap/4/string/5.json): failed to open stream: No such file or directory in /data/phpspider/zhask/libs/function.php on line 167

Warning: Invalid argument supplied for foreach() in /data/phpspider/zhask/libs/tag.function.php on line 1116

Notice: Undefined index: in /data/phpspider/zhask/libs/function.php on line 180

Warning: array_chunk() expects parameter 1 to be array, null given in /data/phpspider/zhask/libs/function.php on line 181
按'组织字符串列表&燃气轮机;名称';和';序列';。python_Python_String_List_While Loop_Append - Fatal编程技术网

按'组织字符串列表&燃气轮机;名称';和';序列';。python

按'组织字符串列表&燃气轮机;名称';和';序列';。python,python,string,list,while-loop,append,Python,String,List,While Loop,Append,所以我有一个从文件读取的字符串列表(查询)。我想将所有以“>”开头的行附加到名为name_list的列表中,并将其后面(但在下一行“>”之前)的所有字母字符附加到列表中。这与我之前关于集合论的问题非常相似,但当我试图操纵while循环时,它陷入了一个无限反馈循环中 下面是字符串列表的一个示例 query = [">mm10_refGene_NM_001011532 range=chr2:86084810-86085854 5'pad=0 3'pad=0 strand=- repeatMas

所以我有一个从文件读取的字符串列表(查询)。我想将所有以“>”开头的行附加到名为name_list的列表中,并将其后面(但在下一行“>”之前)的所有字母字符附加到列表中。这与我之前关于集合论的问题非常相似,但当我试图操纵while循环时,它陷入了一个无限反馈循环中

下面是字符串列表的一个示例

query = [">mm10_refGene_NM_001011532 range=chr2:86084810-86085854 5'pad=0 3'pad=0 strand=- repeatMasking=none", 'caatgcctttgcctcactgataatttctattagtcttatcttatttcatt', 'ttactttgcagctgttaagacttgatgaaATGGCTGGAAGCAATGCCACT', 'GGTGTGACAGAATTCATTCTCTTGGGGTTTGCAGTCCAGAGAGAGGTAGA',">mm10_refGene_NM_001011534 range=chr2:85352995-85353924 5'pad=0 3'pad=0 strand=- repeatMasking=none", 'ATGGAACAAAGTAATGACACCAAAGTGACTGAATTCATTCTTCTGGGATT', 'TTCCGGACAGCACAAATCTTGGCACATTCTGTTCATAATATTTCTAATGA', 'TCTATGTTGTCACACTCATGGGTAACATTGGAATGATCGTACTCATCAAA']
这就是我一直在使用的代码:

name_list = []
seq_list = []

for line in query:

    while line.startswith(">"):
        name=line
        temp_seq=[]

        for line in query:
            if line.isalpha()==True:
                temp_seq.append(line)


            else:
                break
        name_list.append(name)
        seq_list.append(''.join(temp_seq))
输出数据的示例:

name_list = [">mm10_refGene_NM_001011532 range=chr2:86084810-86085854 5'pad=0 3'pad=0 strand=- repeatMasking=none",">mm10_refGene_NM_001011534 range=chr2:85352995-85353924 5'pad=0 3'pad=0 strand=- repeatMasking=none"]

seq_list = ['caatgcctttgcctcactgataatttctattagtcttatcttatttcattttactttgcagctgttaagacttgatgaaATGGCTGGAAGCAATGCCACTGGTGTGACAGAATTCATTCTCTTGGGGTTTGCAGTCCAGAGAGAGGTAGA','ATGGAACAAAGTAATGACACCAAAGTGACTGAATTCATTCTTCTGGGATTTTCCGGACAGCACAAATCTTGGCACATTCTGTTCATAATATTTCTAATGATCTATGTTGTCACACTCATGGGTAACATTGGAATGATCGTACTCATCAAA']

很抱歉,如果这与()类似,并且在任何方面都是多余的,但我认为这将是一个很好的问题,可以帮助处理此类数据的人员

下面是对代码的修改,它一次遍历一个查询元素:

name_list = []
seq_list = []

lines = iter(query)
for line in lines:
    while line.startswith(">"):
        name = line
        temp_seq = []
        for line in lines:
            if line.isalpha():
                temp_seq.append(line)
            else:
                break
        name_list.append(name)
        seq_list.append(''.join(temp_seq))
name_list = []
seq_list = []

seq = ""
for line in query:
    if line.startswith('>'):
        if seq:
            seq_list.append(seq)
            seq = ""
        name_list.append(line)
    elif line.isalpha():
        seq = seq + line
seq_list.append(seq)
但是,在您提供的示例中,
query
具有一致的“name”模式,后跟3个“sequences”。如果您的数据总是遵循这种一致的模式,那么这里有另一种方法。您可以定义一个名为
grouper
()的函数,它允许您一次读取
query
的4个元素

from itertools import izip_longest

def grouper(n, iterable, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)
现在,对于
query
的每4个元素“chunk”,将第一个元素附加到“names”中,并将最后3个元素连接到“sequences”中:

输出:

[">mm10_refGene_NM_001011532 range=chr2:86084810-86085854 5'pad=0 3'pad=0 strand=- repeatMasking=none", ">mm10_refGene_NM_001011534 range=chr2:85352995-85353924 5'pad=0 3'pad=0 strand=- repeatMasking=none"]
['caatgcctttgcctcactgataatttctattagtcttatcttatttcattttactttgcagctgttaagacttgatgaaATGGCTGGAAGCAATGCCACTGGTGTGACAGAATTCATTCTCTTGGGGTTTGCAGTCCAGAGAGAGGTAGA', 'ATGGAACAAAGTAATGACACCAAAGTGACTGAATTCATTCTTCTGGGATTTTCCGGACAGCACAAATCTTGGCACATTCTGTTCATAATATTTCTAATGATCTATGTTGTCACACTCATGGGTAACATTGGAATGATCGTACTCATCAAA']

您可以通过以下方式非常直接地执行此操作:

这将产生一个元组列表,如
[('>header','caatgcttt…'),…]
。如果您真的想将它们分开,您可以重新压缩列表:

names, seqs = zip(*name_seq_chunks(query))

这看起来像是您解析了一个fasta文件以获得这两个序列。我知道你问的问题超出了它的范围,但你有没有调查过
from itertools import groupby

def name_seq_chunks(seq): 
    isheader = lambda l:l.startswith('>')
    header = None
    for startgroup, dataiter in groupby(seq, isheader):
        if startgroup is True:
            header = list(dataiter)[-1]
        elif startgroup is False:
            yield header, ''.join(dataiter)

print list(name_seq_chunks(query))
names, seqs = zip(*name_seq_chunks(query))