Looping over a list multiple times in Python (tags: python, iteration, genome)


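Is it possible to iterate over a list multiple times? Basically, I have a list of strings and I am looking for the longest superstring. Every string in the list overlaps another by at least half its length, and they are all the same size. I want to check whether each sequence in the list should be attached to the beginning or the end of my superstring. When I find a match, I want to add that element to my superstring, remove it from the list, and then loop over the list again and again until it is empty.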

sequences=['ATTAGACCTG','CCTGCCGGAA','AGACCTGCCG','GCCGGAATAC']
halfway= len(sequences[0])//2
genome=sequences[0]     # this is the string that will be added onto throughout the loop
sequences.remove(sequences[0]) 


for j in range(len(sequences)):
    for sequence in sequences:
        front=[]
        back=[]
        for i in range(halfway,len(sequence)):

            if genome.endswith(sequence[:i]):
                genome=genome+sequence[i:] 
                sequences.remove(sequence)

            elif genome.startswith(sequence[-i:]):
                genome=sequence[:i]+genome  
                sequences.remove(sequence)
'''
            elif not genome.startswith(sequence[-i:]) or not genome.endswith(sequence[:i]):

                sequences.remove(sequence)      # this doesn't seem to work; I want to get rid of
                                                # sequences that are in the middle of the string and
                                                # already accounted for
'''

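This works and gives the correct answer when I leave out the last elif statement. However, when I run it on a larger list of strings, the list still contains strings at the end, even though I expected it to be empty. I am also unsure whether that last loop is needed at all if I am only looking for strings to attach to the front or back of the superstring (genome).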

Try this:

sequences=['ATTAGACCTG','CCTGCCGGAA','AGACCTGCCG','GCCGGAATAC']
sequences.reverse()
genome = sequences.pop(-1)     # this is the string that will be added onto throughout the loop

unrelated = []

while(sequences):
    sequence = sequences.pop(-1)
    if sequence in genome: continue
    found=False
    for i in range(3,len(sequence)):
        if genome.endswith(sequence[:i]):
            genome=genome+sequence[i:]
            found = True
            break
        elif genome.startswith(sequence[-i:]):
            genome=sequence[:-i]+genome  # prepend only the part before the overlap
            found = True
            break
    if not found:
        unrelated.append(sequence)

print(genome)
#ATTAGACCTGCCGGAATAC
print(sequences)
#[]
print(unrelated)
#[]

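I don't know whether you are guaranteed never to have several unrelated sequences in the same batch, so I allowed for unrelated sequences. Feel free to remove that part if it isn't needed.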

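Python is very inefficient at removing items from the front of a list, so I reversed the list and remove from the back instead. Depending on what the full data looks like compared with the sample data, the reversal may not be necessary.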
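As an alternative to reversing the list, `collections.deque` from the standard library removes items from the left end in constant time. The following is a minimal sketch using the sample data; the names `pending` and `order` are only illustrative and do not appear in the answer above.

```python
from collections import deque

sequences = ['ATTAGACCTG', 'CCTGCCGGAA', 'AGACCTGCCG', 'GCCGGAATAC']

# deque.popleft() removes from the front in O(1), whereas list.pop(0)
# has to shift every remaining element one position to the left.
pending = deque(sequences)
genome = pending.popleft()   # the first sequence seeds the genome

order = []
while pending:
    order.append(pending.popleft())  # items come out in their original order

print(genome)  # ATTAGACCTG
print(order)   # ['CCTGCCGGAA', 'AGACCTGCCG', 'GCCGGAATAC']
```

With a deque the sequences are processed in their original order, so no `reverse()` call is needed.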

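As long as sequences remain, I pop from the sequences list, which avoids removing elements from a list while iterating over it. Then I check whether the sequence is already contained in the genome built so far. If not, I run the endswith/startswith checks. If a match is found, I splice the sequence onto the genome, set the found flag, and break out of the for loop.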

If the sequence is not already contained and no partial match is found, it is put into unrelated.
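To illustrate why the answer pops instead of calling `remove` inside a `for` loop, here is a small sketch with made-up data (`items` and `stack` are hypothetical names). Removing from a list while iterating over it shifts the remaining elements, so the loop skips every other one, which would explain why strings were left over in the question's larger runs.

```python
# Removing inside a for loop: the iterator's index keeps advancing
# while the list shrinks, so some elements are never visited.
items = ['a', 'b', 'c', 'd']
visited = []
for item in items:
    visited.append(item)
    items.remove(item)

print(visited)  # ['a', 'c']
print(items)    # ['b', 'd']

# Popping in a while loop visits every element exactly once.
stack = ['a', 'b', 'c', 'd']
popped = []
while stack:
    popped.append(stack.pop())

print(popped)   # ['d', 'c', 'b', 'a']
print(stack)    # []
```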

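This is how I ended up solving it. I realized that all you need to do is find out which string is the start of the superstring. Since we know the sequences overlap by 1/2 or more, I look for the sequence whose first half is not contained in any other sequence. From there, I loop over the list a number of times equal to the length of the list, looking for a sequence whose beginning matches the end of the genome. When I find one, I add it onto the genome superstring, remove it from the list, and keep going through the list. On a list of 50 sequences of length 1000, this code takes about 0.806441 seconds to run.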

def moveFirstSeq(seqList):
    # Move the sequence that starts the genome to the end of the list.
    # Its first half should not occur in any other sequence.
    d = {}
    for seq in seqList:
        count = 0
        for seq1 in seqList:
            if seq == seq1:
                continue  # do not compare a sequence with itself
            if seq[:len(seq) // 2] not in seq1:
                count += 1
        d[seq] = count

    # the sequence whose first half is missing from the most other sequences
    first_sequence = max(d, key=d.get)
    seqList.remove(first_sequence)
    seqList.append(first_sequence)
    return seqList


seq = moveFirstSeq(sequences)
genome = seq.pop(-1)   # the first sequence seeds the genome and leaves the list

for j in range(len(sequences)):   # one pass per remaining sequence
    for sequence in sequences[:]:  # iterate over a copy so removal is safe
        for i in range(len(sequence) // 2, len(sequence)):
            if genome.endswith(sequence[:i]):
                genome = genome + sequence[i:]  # extend the superstring
                sequences.remove(sequence)      # and drop the used sequence
                break

print(genome, seq)

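Yes, it is possible. If you are looking for a more useful answer, you will have to write a more useful question: include your code and address a specific problem.

If you have one loop, you can always add another loop that completely contains the first one. The outer loop can then decide how many times the inner loop should repeat.

I added the outer loop like you said, so that it loops a number of times equal to the length of the list. Is that the right way to do it?

Looks like you've got it. To account for substrings that are completely contained, try using in at the beginning or end of the inner for loop. That would replace what you are trying to do in the second elif.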
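The suggestions in these comments (an outer loop that repeats the inner one, plus an `in` check for fully contained sequences) can be combined roughly as follows. This is only a sketch under stated assumptions: the function name `assemble` and the `min_overlap` parameter are hypothetical, and the outer loop repeats until a pass makes no progress rather than a fixed number of times.

```python
def assemble(sequences, min_overlap):
    """Greedy sketch: attach sequences to either end of a growing genome."""
    pending = list(sequences)      # work on a copy; the input is left alone
    genome = pending.pop(0)        # assumption: start from the first sequence
    progress = True
    while pending and progress:    # outer loop: repeat passes while they help
        progress = False
        for sequence in pending[:]:        # iterate over a snapshot
            if sequence in genome:         # fully contained: nothing to add
                pending.remove(sequence)
                progress = True
                continue
            # try the longest overlap first, down to min_overlap
            for i in range(len(sequence) - 1, min_overlap - 1, -1):
                if genome.endswith(sequence[:i]):
                    genome += sequence[i:]
                elif genome.startswith(sequence[-i:]):
                    genome = sequence[:-i] + genome
                else:
                    continue               # no overlap of length i
                pending.remove(sequence)
                progress = True
                break
    return genome, pending                 # pending holds unmatched leftovers


sample = ['ATTAGACCTG', 'CCTGCCGGAA', 'AGACCTGCCG', 'GCCGGAATAC']
print(assemble(sample, 5))  # ('ATTAGACCTGCCGGAATAC', [])
```

Stopping when a pass makes no progress means that sequences which never overlap the genome are returned as leftovers instead of causing an endless loop.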