Python 正确查找字符串序列时如何重置计数器
所以我在做cs50 dna的问题,我很难使用计数器,因为我不知道如何编码来正确计算我要寻找的序列在两个序列之间没有另一个序列的情况下重复的最高次数。例如,我正在寻找序列AAT,文本是aatdhdtksdhataat,因此最大数量应该是两个,因为最后两个序列是AAT,它们之间没有序列 这是我的密码:Python 正确查找字符串序列时如何重置计数器,python,Python,所以我在做cs50 dna的问题,我很难使用计数器,因为我不知道如何编码来正确计算我要寻找的序列在两个序列之间没有另一个序列的情况下重复的最高次数。例如,我正在寻找序列AAT,文本是aatdhdtksdhataat,因此最大数量应该是两个,因为最后两个序列是AAT,它们之间没有序列 这是我的密码: text="TCTAGTCTAGTCTAGTCTAGTCTAGACTTGTCGCTGACTCCGAGAAGATCCTAACATTAACCAATTCCCCCTAGTCTGAGGCACGGTTA
text="TCTAGTCTAGTCTAGTCTAGTCTAGACTTGTCGCTGACTCCGAGAAGATCCTAACATTAACCAATTCCCCCTAGTCTGAGGCACGGTTACCGATCGGGTTAATGGATCTCTCACCGTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAAACGTGTAACTGTAATAATCCGCCCGAAAAAACTGATCTTAGGGTTGCGGCATCTGCACGTGACAGTGTGCTACTGTTAGATAGAGGGATCAAACGAGGTTGCAAGGATTATATCTCTCCGTGCTCGATAAGACACAGCCGGTTGCGGGCTGCTTCCTCTGGATCCAATGCAGCCGTACGTACACCGTAGAGCAAATTTAGTGGTAAAGGAACTTGCTCAAACACTACGGCTTCGGGCTACTGTTGGCGCCGGTTGGGGATCCCATTCAACGCTGGCCCTTTCGCTATGGTTCGGTGATTTTACACCGAAGCGAACCTTGAACCGTGGATTTCGGGTGTCCTCCGTTTTTAGGTACTGCGTGCAGACATGGGCACCTGCCATAGTGCGATCAGCCAGAATCCATTGTATGGGAGTTGGACTCGTTTGAATTTACCGGAAACCTCATGCTTGGTCTGTAGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGAAACTGGGCGACTTGAAGTCGGCTTGCGTATTAATAGCTCTGCAATGTAACTCGGCCCTTGGCGGCGGGCAGCTTAGTATTGAACCGCGACACACCATAGGTGCGGCAAATATTAAAAGTACGCTCGAACCGGAACCTGTCTCCATGACTGGACGACCAGCCCGGCGTCTTCTACGTAACACAGGGGGCTGTCGAGGTAGGGCGTAGGAACTTCGGGGTCACTACGCCGTAACAGCACCGAATATCATATCATCCAACTTGCTTGGTACATGCCCCGTTCTGTATCAAAAGTTTACGGCCCCGGACATACCTGCTGTCAGTTGAATACCTATGCGAGTCTGAAACACGAATAGTTCAGGCGTGCAAAGACACGCTAAGCACACGCCGCAGGCAGGGGGGGTATTTTATAAGTCGTTTTTTGGAAGGGTAATGTAAACTTATCCCATAATACCCTTTGGCTTCCCCTCACTCGTGCACTTCTCATAATGATACGTCAGGGTGATTGTAGATTCACGCGTCATCAGATTGTCCCTTTCTCGAGTCTTAGTATCTTTCCTAATCCGCTCGACTCTGCGCCATGATCGAATTCCTGACAGGCTACAAGAATAAACTGCCAGCATACTCCTTACCGATTGGCGCCTACTAATTATACGCACATGGGCATCTTCGACGTCTAAACATAGGCTCTTAGTATTCCGTAGGATGTTGAGCCGACAGGAAAGTCAAACGTCGTGGGTGACCGTAGCCTGACTCGCCCGACGCAGGATTCGCTCATATGTGTGAACGGATGCTTATGTAACTTCCTAATTGCAGCGAATGGCAGTTCCGTAGTGAAGGTTCGAAACGTACGGGGTCCGGCCATGGATTAGATCTTTCAGTGCGCTAAACTCTTAACCGCAGATACTTGGCGGACCATCTTCGTGTTGCTACTATGGTATAGACCAGGCTGTCGAATCTACTTAACACAGGTGAACCCCCAGATCGGCTAGAGCCTTCGAGGCTAGACCTTTAACAATCTTTAGACACTTCCAAATCGCGGCCGGATATGTCTCGTTGGCAGCCGCAGACAAGAGAAGAGGGTCGGCAGTGTCTGCCACGCGTGACCTGTATGATCTTAGCCTTTAAGATCACACTACTGATCACAATCTATTATGATTGCCTTAGCTAACTGAGTGATGCACCCCCACAGGCTGAGAGAAATCTGTAGTTTGACGACACGCCGTCTGGCTAAAAATGTGAATCCGCCGATCCGAGACGGTGGAAGCTTGAGACCAAATGCGGGAAACCAATGACTTCATTACGGAACAAGACATAACGGCGTGAGTTGACGACTGGGATTAACCCTTTCCCGAGTCTGTACTTCTGCTACACAATGAGGATGCGAATTATCTAAGACCTTGTACTACCTAAACTAACCCTGAGGCGGGCATTGAATTCCGGCCATCTTCAGCCCAAAGAAAGACCAAATGTGAGGAAAATGAGGGATCGGTATAAGCTTTTCACGATCTCAAGGTTCACGGCCGCCAGGGCCGTAGTTGGGGCTTCATGCACATTGCCAACCCGGACATCGACAGTCGGTACCGCAGGGGTTCGAGGAATACTCCCAGCTGTGACACCTGGTCGTCGACTGGACCCAGCTGGTGGGCGGCATAGGTAGTTAATACTGAATTAAAGCCGGGAACGTCTCTCTAACTAGAAACCTTGTGATAGGATACACAGACCTAGTGCCCCGACGTTAGCATTTGAATTCATCTATCTTGGCGTCTTTTAGTAGGCCTGGGTCAACTCCGGCGTTGGCCAAAATAACCGATCTGCGTTATGTGGCCACGCATCGAGTGACAGGGTGCATACAAATTGATGGTCAAAGAGTTTAAACAAGACAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCCCCACGCTTCTACATAGCCACACTGGAGCTAGTCCTCGTGTTAAATTTTTCGCTTGTTGCACGGTTATCATCAGAAGTGCCACTGGTATTCCTCTGTAGCTCCCGTATGCCGAAGGTTGCGGCTTAGGTACTGCTTATACACGTCTCTCAAGTTTGTCAGCCGCGTGATCTTTCTGCGGGGATAGGTGATCGTCCCTCGCTCCGGACATTGCATTAAAATTACCTAGTTGATAGGGCGGCGGAGTTGCATACCGGCGTTCAATCGCGGCTCCAGACTGGTTTGAGCTACGCGTCTGCCAGCGTGAAAAAGCTGATTTGTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCCAGGTATTATCATTTGAATCGTATGTTTTCTGCCGTACGTCAACTGCGTCGTCGGGGACTGAAATGGTCTGCCTCCAGACCCTTACCTCCCGATAAGCCATGACTAAGTATGTGAAGGATCACCTGAATTGCTGAAAGTTAACGGTAAGATATCTGAAAGAGCTCATTAGATCCAACACTTATCTACTCAAAAATTCGTCATATTTCGGTGACTTGCTAGAAAGGCTCTTGCACAGTAAGGTTATAGAGAATGCTACCGTTGAAGCACCAGCCGTTGAAGCCCGCCTTTAACCACGCGATATATCCAATTAACCAAGGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGTCGCCTTGTAATAATTACTTTGGCCCGGATTATAACGAAGGAACTCGCCATGAACTCGCAGCACGTTGTACTGGAACAATCTACTTTTTATAATATAGCGATAACTCCCAGCTTTTATGTGGGTGATATTGTCCTAGCTTTTTAAAGATACCCTCTGGCCCGGTCCAAGTAAGGTCCACATTGCCTGACGTAAGCGTACGGTCAACGGGTGCACCGGTTCCCGCTAAAGCTCGATCCTATTCTTTCAGTCGGGGGGAAATAAACTCGTATACTCTCCACCCACCCGTACGTCCCGGACTAGAATAACTACCGGGTATTTCCGGTTCGTAACACCACGCCATGACGTGTCAACATAAACGCTTCTTTTGAAAGGTGCACATGCAGATTGCACAAGCAGCAGGCACCGCCCTTATCCATATCCTGTTGAGGCCCTCGATCCTAGTGTTCCTTGTTATCAGGATATTTTCTCGCTGTACGTTATTGTCCTTTTCAAATTACAACTGACCGCTTCCTCACCCGCTAAACCCTACCTTACGCACAACCAAGGCCTTGTCCCGGATGAACCCGGCTGCTCCTATGGATAAGCAACCCAGCCCGGCAGTTTACTTCAGGTGTTATCGTCGACTGACACCCTCAGCTTTCTCCCATTACACAGCGAGTATTTTCCGCGTAGCAATGGCAGTGACTTTGAGCGCACACTCAGAAGCCGTTGGAATGGCACCGGGGACGGCCCGATTTAGCCCCGCACACCTCCTGGAATCTTAGATCGCACGGCGATCTCGGTTCAGGCACCAACCCCAAAGAGTGTTTTGAGTTTTTGGTATGGCTCGCCTCAATTATCGGTTTTCGCTGCTCTGTGCCTGTCAACTCGGCTAGCTGTCGTGTTTTGTCGATCAGTGCGTGGACACTCTCGGTCGATGGTCGTGGATGGGACTGTAGTAAGTTTCACCGAAGCAGGAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAACTTCGCTTCATATAACGTAGCCATAGTGCTGTCTGCCATCAATAAGTCTTGCTCAGTGGTGCATACGTCGGGGAGGTTTGTTCCGCCTGGTCAGAACGAGTCTAGGGCGAGCCTATAGGCCAGTCGAGAGCCAAGATTCTATGAAATTAATACGACTACTGGGTGAGAGGTCATACAATTCCCGTGGAATCTGTACCTAAGATATTTCCAGATAGGGATGGCTACTGGTTAAGTTGACAGTGTCTAGATACGTGAGAGCACCTGAGAGGACGCCACGAGTCGGAGCGTGGGCGATCACCCTTCTGAGTCATAAGTCATGTCTATATATCCCTCACTAAAAAGGGCACACGACTATACATGCTTGAGCTTTACGGTCTGGCATGTGGAATGCCCGGAGCAACCCAGTCTTACCATCCTTTACGTACATTTACCGACCCGGCAGTGGCCGGCGCGGAAACCCAGGAGAACGTCGGTCATGATACGCGCCCTCCGCCGAAAGCGTGCTCACACCTCAGGATATCAGCGCTATTACCGGACGTCCCGCGTCCACCATCTAATAATTCAGGTGCTCCTAATAAGTGGGCTGGAGAGCGAGGATTGATATACGTTGAGGAGCTCCGACGGCCCTCTCGTGCGTTTGATGTAGATTGCGTTACCGACGGAGCACGCGTTTGTCAATTTCTGTCTAGGGACGTTTATGTCCTCAATACGAATACCAGGCCTATTTTAGTGTACAAATCACTTAGCAGTCGGAATTGGAAACCTGATGGAAGCGT"
counter=0
length=len(text)
search="AGATC"
tmp=0
for i in range(length):
if text[i:i + len(search)] == search:
tmp += 1
if tmp > counter:
counter = tmp
if text[i:i + len(search)] != search:
tmp = 0
print("done")
print(counter)
试试这个
重新导入
sequence=“aatdhdtksdhatat”
matches=re.findall(r'(?:AAT)+',序列)
最大=最大(匹配项,键=len)
打印(len(最大)//len('AAT'))
基本上,这种方法会找到字符串中的子字符串列表,然后选择最大的子字符串。子字符串的出现次数将是最大字符串的长度除以子字符串的长度首先,
regex
解决方案是Python解决此问题的方法。但是,如果您希望修复代码
代码的问题是索引无法确认找到了匹配项。您无法识别连续发生的事件
考虑这样一种情况,即您找到了三重匹配的开始,AATAATAAT
。您可以到达第一个A
,识别AAT
和增量tmp
。进入下一个循环迭代,现在i
指向第二个A
。您可以看到,它在这里不是AAT
(它是ATA
,跨越前两次出现),因此您记录一个实例并重置所有状态变量
相反,你必须跳到第一场比赛的末尾,寻找第二场比赛。由于索引不能以1的增量平滑移动,因此需要一个while
循环
请学习在变量有意义的地方使用有意义的变量名
任何意义<如果它所做的只是管理循环,那么code>i就可以了。作为
一旦你把它用于其他用途,给它一个真实的名字。同样地,
tmp
和count
确实需要更换
正则表达式是实现这一点的最简单方法。
snip_size = len(search)
pos = 0 # position in the genetic sequence
rep = 0 # number of consecutive repetitions
max_rep = 0 # longest repetition sequence found
while pos < length:
if text[pos:pos + snip_size] == search:
rep += 1
pos += snip_size
else:
max_rep = max(max_rep, rep)
rep = 0
pos += 1
print(max_rep, "repetitions found")
15 repetitions found