Generator functions in Python
I'm currently working through a problem set from MIT OpenCourseWare in which the task is to find matching substrings in DNA sequences.

I'm struggling to write a function that returns the subsequences of length k. I can get it to work with a string, but the problem is set up to use iterators, and with an iterator the function seems to reset every time instead of resuming from where it left off.

Here is a correct function I wrote that works on strings:
def subs(seq, k):
    subseq = ''
    pos = 0
    while pos < len(seq):
        while len(subseq) < k:
            subseq += seq[pos]
            pos += 1
        yield subseq, pos - k
        subseq = subseq[1:]
My current solution is:
def subsequenceHashes(seq, k):
    subseq = ''
    pos = 0
    print 'Start of subseqHashes'
    try:
        while True:
            while len(subseq) < k:
                subseq += seq.next()
                pos += 1
            print subseq, pos - k
            yield hash(subseq), pos - k
            subseq = subseq[1:]
    except StopIteration:
        return
Here is what happens when I run the tests:
Start of subseqHashes
yab 0
Start of subseqHashes
xxa 0
starting
iterate
Start of subseqHashes
cab 0
0
iterate
Start of subseqHashes
cab 0
0
iterate
Start of subseqHashes
F..
======================================================================
FAIL: test_one (__main__.TestExactSubmatches)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Users\Alex\Desktop\Pythonwork\6.006\ps4\dist\test_dnaseq.py", line 32, in test_one
    self.assertTrue(len(matches) == len(correct))
AssertionError: False is not true
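What seems to be going wrong is that subsequenceHashes gets reset every time I use .next(), because it has an iterator in its body, instead of staying inside the loop as it does when I use a string.

As @jornsharpe mentioned, my mistake was calling the generator function multiple times instead of actually iterating over it.

Each time you call it, e.g. subsequenceHashes(b, k), it starts over again. You should create the generator once, at the start of the function.

The DNA sequences I will be comparing are tens of millions of nucleotides long, and the problem set suggests creating generator functions.

Yes, but you should call the generator function only once. After that you only want to iterate over it, not keep restarting it. Start with gen_a = subsequenceHashes(a, k) and go from there.

Note that a string of only tens of millions of characters fits in memory easily. You should try the simple solution first and switch to generators/iterators only if you actually run into memory problems.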
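To make the difference between restarting and iterating concrete, here is a minimal sketch. It is written in Python 3 syntax, where `next(it)` takes the place of the Python 2 call `it.next()` used in the question. The name `windows` is a hypothetical stand-in for `subsequenceHashes` that yields the window text instead of its hash, so the effect is easier to see.

```python
def windows(chars, k):
    """Yield (window, start) for each length-k window of an iterable of characters."""
    window = ''
    pos = 0
    for ch in chars:
        window += ch
        pos += 1
        if len(window) == k:
            yield window, pos - k
            window = window[1:]  # slide the window forward by one character


# Restarting: every call builds a new generator whose pos starts at 0 again,
# while the shared iterator keeps moving forward underneath it.
shared = iter('abcdef')
first = next(windows(shared, 2))   # ('ab', 0)
second = next(windows(shared, 2))  # ('cd', 0): new generator, position reset

# Iterating: create the generator once and let it run to the end.
all_windows = list(windows(iter('abcd'), 2))
# [('ab', 0), ('bc', 1), ('cd', 2)]
```

The restarted calls always report position 0, which matches the repeated `0` in the test output from the question.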
The calling function from the question:

def getExactSubmatches(a, b, k, m):
    # a and b are the strings compared, k is the length of substring, parameter m is unused, need it for later on in the problem set
    ahash, apos = subsequenceHashes(a, k).next()
    bhash, bpos = subsequenceHashes(b, k).next()
    multidict = Multidict()
    print 'starting'
    while ahash:
        print 'iterate'
        multidict.put(ahash, ('a', apos))
        ahash, apos = subsequenceHashes(a, k).next()
        print apos
    while bhash:
        multidict.put(bhash, ('b', bpos))
        bhash, bpos = subsequenceHashes(b, k).next()
    for key in multidict.mydict:
        if len(multidict.get(key)) > 1:
            for t in multidict.get(key):
                if t[0] == 'a':
                    for s in multidict.get(key):
                        if s[0] == 'b':
                            if a[apos:apos+k] == b[bpos:bpos+k]:
                                print apos, bpos
                                yield apos, bpos
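One possible way to apply the advice from the comments is sketched below, in Python 3 syntax. It is not the problem set's reference solution. The names `subsequence_hashes` and `exact_submatches` are hypothetical, a `collections.defaultdict` of lists stands in for the `Multidict` class (whose implementation is not shown in the question), and `a` and `b` are assumed to be strings so that candidate matches can be verified by slicing.

```python
from collections import defaultdict


def subsequence_hashes(seq, k):
    """Yield (hash, start) for each length-k window of an iterable of characters."""
    window = ''
    pos = 0
    for ch in seq:
        window += ch
        pos += 1
        if len(window) == k:
            yield hash(window), pos - k
            window = window[1:]


def exact_submatches(a, b, k):
    """Yield (apos, bpos) wherever a[apos:apos+k] == b[bpos:bpos+k].

    Assumes a and b are strings, so matches can be confirmed by slicing.
    """
    # Index every window of a by its hash. The generator is created once
    # and the for loop drives it to the end.
    a_positions = defaultdict(list)
    for a_hash, apos in subsequence_hashes(a, k):
        a_positions[a_hash].append(apos)

    # Stream through b once and look up each window's hash.
    for b_hash, bpos in subsequence_hashes(b, k):
        for apos in a_positions.get(b_hash, ()):
            # Different windows can share a hash, so compare the actual text.
            if a[apos:apos + k] == b[bpos:bpos + k]:
                yield apos, bpos


matches = list(exact_submatches('yabcabcabcz', 'xxabcxxxx', 3))
# [(1, 2), (4, 2), (7, 2)]
```

Letting `for` consume each generator also removes the need for the `while ahash:` test, which would stop early on a hash value of 0 and never sees the end of the sequence.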