Python: efficient search through list items
I have a list lst (containing 10K items) and a query item q, and I want to know whether any item in lst ends with q.
As a reference timer, which I set to 1, I use this statement:
x = q in lst
I tried these:
# obvious endswith method
y = [k for k in lst if k.endswith(q)]
# find method (find returns -1 on a miss, so compare explicitly)
z = [k for k in lst if k.find(q, len(k)-len(q)) != -1]
# regex
v = [k for k in lst if re.search(q + '$', k)]
# regex without list comprehension
w = re.search(q + '~', '~'.join(lst) + '~')
With these results (timed relative to the x timer):
So I think I could use the regex with the joined list, unless there is a better implementation.
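In the real world, I'm trying to optimize a code block that is hit many times during execution, and I found that the list comprehension using the .endswith method is a bottleneck.
I don't think regex is the way to go. Even if I assign joined = '~'.join(lst) + '~' outside the loop, q + '~' in joined still beats re.search(q + '~', joined) (0.00093 s vs 0.0034 s).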
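As a minimal sketch of the joined-string idea, assuming the separator never occurs inside any item or query: the helper names build_joined and any_endswith are hypothetical and only illustrate building the string once and reusing it for many queries.

```python
def build_joined(items, sep="~"):
    """Join items once so that many suffix queries can reuse the result."""
    # Assumption: sep must not occur inside any item, otherwise the
    # substring test below could report false positives.
    if any(sep in item for item in items):
        raise ValueError("separator occurs inside an item")
    return sep.join(items) + sep


def any_endswith(joined, q, sep="~"):
    """Return True if some item in the joined string ends with q."""
    return q + sep in joined


lst = ["abcd", "xyab", "qrst"]
joined = build_joined(lst)  # do this once, outside the hot loop

for q in ("ab", "cd", "bc"):
    print(q, any_endswith(joined, q))
```

The substring test runs in C over one contiguous string, which is likely why it does well when most queries have no hit.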
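However, assuming you don't already have the joined string, a method that doesn't need it may be faster. A generator can be useful because it only produces values as you need them (so once you find the query at the end of some item, you can stop instead of checking the rest of the list).
This was the fastest for me: any((k for k in lst if k.endswith(q)))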
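The short-circuit behaviour can be made visible with a small sketch. The counting helper is hypothetical and exists only to show how many items are actually consumed. The generator here yields the boolean k.endswith(q) directly, a slight variation on the expression above.

```python
def any_endswith_gen(items, q):
    """Return True if some item ends with q; stops at the first match."""
    return any(k.endswith(q) for k in items)


def counting(items, counter):
    """Yield items while recording how many were consumed."""
    for k in items:
        counter[0] += 1
        yield k


lst = ["xxab"] + ["zzzz"] * 9999

hit_count = [0]
print(any_endswith_gen(counting(lst, hit_count), "ab"), hit_count[0])

miss_count = [0]
print(any_endswith_gen(counting(lst, miss_count), "qq"), miss_count[0])
```

With a match in the first position only one item is inspected. With no match, all 10000 items are scanned, so the generator has no advantage in that case.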
My code:
import timeit
# Timings in the comments below are the originally reported results.
setup = '''
import string
import random
import re
lst = []
for i in range(10000):
    # four random letters per item
    lst.append(''.join(random.choice(string.ascii_letters) for _ in range(4)))
q = 'ab'
'''
print("reference: ")
print(round(min(timeit.Timer("q in lst", setup=setup).repeat(7, 500)), 5))
# 0.05435
print("\nreference with joined string: ")
print(round(min(timeit.Timer("q+'~' in '~'.join(lst) + '~'", setup=setup).repeat(7, 500)), 5))
# 0.05462
print("\nendswith, with list approach: ")
print(round(min(timeit.Timer("any([k for k in lst if k.endswith(q)])", setup=setup).repeat(7, 500)), 5))
# 0.62998
print("\nfind method: ")
# note: find() returns -1 (truthy) on a miss, so this filter keeps every item
# and the timing includes building a full-size list
print(round(min(timeit.Timer("[k for k in lst if k.find(q, len(k)-len(q))]", setup=setup).repeat(7, 500)), 5))
# 1.22274
print("\nregex: ")
print(round(min(timeit.Timer("[k for k in lst if re.search(q + '$', k)]", setup=setup).repeat(7, 500)), 5))
# 3.73494
print("\nregex without list comprehension: ")
print(round(min(timeit.Timer("re.search(q + '~', '~'.join(lst) + '~')", setup=setup).repeat(7, 500)), 5))
# 0.05435
print("\nendswith, with generator approach: ")
print(round(min(timeit.Timer("any((k for k in lst if k.endswith(q)))", setup=setup).repeat(7, 500)), 5))
# 0.02052
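Do you only want to find out whether any item in lst ends with q, or do you need a list of the items that end with q?
I only want to find out whether there is such an item: true/false.
The '~'.join(lst) in the regex search can be assigned outside the loop, which makes the regex search about 3 times faster when it is used in a loop.
Very nice. I somehow forgot the obvious q + '~' in joined, and after comparing the results with the generator in my code, I found it to be the best approach. Thanks :)
any() with a generator is a nice idea, but my loop usually has no hits, so it is not as efficient as q + '~' in joined.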
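The no-hit case described in the last comment can be checked with a short Python 3 timing sketch. The data, the seed, and the query "AB" are assumptions chosen so that no item matches. Absolute numbers depend on the machine, so none are quoted here.

```python
import random
import string
import timeit

random.seed(0)  # fixed seed so the run is repeatable
lst = ["".join(random.choice(string.ascii_lowercase) for _ in range(4))
       for _ in range(10000)]
joined = "~".join(lst) + "~"  # built once, outside the timed statement

miss = "AB"  # uppercase never occurs in lst, so every item gets scanned
names = {"lst": lst, "joined": joined, "miss": miss}

# Time the generator approach when nothing matches.
t_gen = min(timeit.repeat("any(k.endswith(miss) for k in lst)",
                          globals=names, repeat=3, number=20))
# Time the substring test on the pre-joined string for the same query.
t_join = min(timeit.repeat("miss + '~' in joined",
                           globals=names, repeat=3, number=20))

print("generator, no hit:     %.5f s" % t_gen)
print("joined string, no hit: %.5f s" % t_join)
```

Both statements return False for this query, so the comparison measures only the cost of a full scan.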