Python: string search using ord is slower than using the "in" method
I have two versions of a simple text parser that validates login correctness:
import re
import string

# Regex version: a letter, then up to 18 letters, digits, dots or hyphens,
# then a final letter or digit.
rgx = re.compile(r"^[a-zA-Z][a-zA-Z0-9.-]{0,18}[a-zA-Z0-9]$")

def rchecker(login):
    return bool(rgx.match(login))

max_len = 20

# Loop version that classifies each character by its code point.
def occhecker(login):
    length_counter = max_len
    for c in login:
        o = ord(c)
        # the first character must be a letter
        if length_counter == max_len:
            if not (o > 96 and o < 123) and \
               not (o > 64 and o < 91): return False
        # more than max_len characters
        if length_counter == 0: return False
        # not a digit
        # not an uppercase letter
        # not a lowercase letter
        # not a minus or dot
        if not (o > 47 and o < 58) and \
           not (o > 96 and o < 123) and \
           not (o > 64 and o < 91) and \
           o != 45 and o != 46: return False
        length_counter -= 1
    # the last character must be a letter or a digit
    if length_counter < max_len:
        o = ord(c)
        if not (o > 47 and o < 58) and \
           not (o > 96 and o < 123) and \
           not (o > 64 and o < 91): return False
        else: return True
    else: return False

correct_end = string.ascii_letters + string.digits
correct_symbols = correct_end + "-."

# Loop version that classifies each character with the "in" operator.
def cchecker(login):
    length_counter = max_len
    for c in login:
        if length_counter == max_len and c not in string.ascii_letters:
            return False
        if length_counter == 0:
            return False
        if c not in correct_symbols:
            return False
        length_counter -= 1
    if length_counter < max_len and c in correct_end:
        return True
    else:
        return False
The first loop-based checker (occhecker) uses ord; the second (cchecker) uses the "in" method.
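I generated 100k well-formed logins, 60k logins containing Cyrillic letters, 60k logins of length 24 instead of 20, and 60k logins of length 0, which gives 280k logins in total. How can it be explained that the regular expression is so much faster than a simple loop with ord?

The short answer is that the regular expression is fast: the other approaches run a lot of pure Python code, whereas the regex module is implemented in optimized C. In addition, much of the work is done when the regular expression is compiled, and that work is not counted in the timings. To dig deeper, use the dis module, which shows the Python opcodes: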
>>> dis.dis(rchecker)
  4           0 LOAD_GLOBAL              0 (bool)
              3 LOAD_GLOBAL              1 (rgx)
              6 LOAD_ATTR                2 (match)
              9 LOAD_FAST                0 (login)
             12 CALL_FUNCTION            1
             15 CALL_FUNCTION            1
             18 RETURN_VALUE
>>> dis.dis(occhecker)
  8           0 LOAD_GLOBAL              0 (max_len)
              3 STORE_FAST               1 (length_counter)

  9           6 SETUP_LOOP             224 (to 233)
              9 LOAD_FAST                0 (login)
             12 GET_ITER
        >>   13 FOR_ITER               216 (to 232)
             16 STORE_FAST               2 (c)
.... OUTPUT TRUNCATED, BUT THERE ARE MANY OPCODES ....
 32     >>  343 LOAD_GLOBAL              2 (False)
            346 RETURN_VALUE
        >>  347 LOAD_CONST               0 (None)
            350 RETURN_VALUE
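The difference in bytecode size can also be measured rather than read off the listing. The following is a minimal sketch, assuming Python 3.4 or later (where `dis.get_instructions` is available); `regex_check`, `loop_check` and `instruction_count` are hypothetical names, and `loop_check` is a simplified stand-in for occhecker, not the original function.

```python
import dis
import re

LOGIN_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9.-]{0,18}[a-zA-Z0-9]$")

def regex_check(login):
    # A single call into the C-implemented regex engine.
    return bool(LOGIN_RE.match(login))

def loop_check(login):
    # Simplified stand-in for occhecker (letters and digits only):
    # every comparison is executed by the bytecode interpreter,
    # once per character.
    for c in login:
        o = ord(c)
        if not (47 < o < 58) and not (64 < o < 91) and not (96 < o < 123):
            return False
    return True

def instruction_count(func):
    # dis.get_instructions yields one Instruction per bytecode operation.
    return sum(1 for _ in dis.get_instructions(func))

regex_ops = instruction_count(regex_check)
loop_ops = instruction_count(loop_check)
print("regex_check:", regex_ops, "instructions")
print("loop_check: ", loop_ops, "instructions")
```

The static count understates the gap, since the loop body is executed once per character while the regex function runs its handful of opcodes once per login.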
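Thanks for pointing out the dis module. The strangest part, though, is that the function using ord is so slow. I have not dug into the opcodes, but I suspect that the large number of calls to ord is what makes it slow.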
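One way to check that suspicion without a profiler is to time the two per-character tests directly. This is only a sketch: `scan_with_ord`, `scan_with_in`, `ALLOWED` and `sample` are hypothetical names, the input is synthetic, and the absolute numbers will vary by interpreter version and machine.

```python
import string
import timeit

ALLOWED = string.ascii_letters + string.digits + "-."

def scan_with_ord(text):
    # One ord() call plus several integer comparisons per character.
    for c in text:
        o = ord(c)
        if not (47 < o < 58) and not (64 < o < 91) and \
           not (96 < o < 123) and o != 45 and o != 46:
            return False
    return True

def scan_with_in(text):
    # One substring-membership test per character, done in C.
    for c in text:
        if c not in ALLOWED:
            return False
    return True

# Synthetic input made only of allowed characters, so both scans
# walk the whole string.
sample = "user.name-42" * 1000

ord_seconds = timeit.timeit(lambda: scan_with_ord(sample), number=20)
in_seconds = timeit.timeit(lambda: scan_with_in(sample), number=20)
print("ord loop: %.4f s" % ord_seconds)
print("in loop:  %.4f s" % in_seconds)
```

Both functions make the same accept or reject decision for each character, so any difference in the timings comes from how the test is expressed, not from what is tested.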
With ord:

         3450737 function calls in 8.599 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   280000    5.802    0.000    8.599    0.000 logineffcheck.py:14(occhecker)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
  3170736    2.797    0.000    2.797    0.000 {ord}

With the "in" method:

         280001 function calls in 1.709 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   280000    1.709    0.000    1.709    0.000 logineffcheck.py:52(cchecker)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
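Call counts like the ones in these profiles can be collected with cProfile and read back through pstats. The sketch below assumes Python 3; `count_letters` and `profile_call_counts` are hypothetical helpers and not part of the original logineffcheck.py. Note that a deterministic profiler adds overhead to every recorded call, so the time attributed to a frequently called builtin such as ord may be overstated relative to an unprofiled run.

```python
import cProfile
import pstats

def count_letters(text):
    # Hypothetical helper: count ASCII letters via ord() range checks.
    total = 0
    for ch in text:
        o = ord(ch)  # one builtin call per character
        if 64 < o < 91 or 96 < o < 123:
            total += 1
    return total

def profile_call_counts(func, arg, repeat):
    # Run func(arg) `repeat` times under the profiler and return a
    # mapping from function name to total number of recorded calls.
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(repeat):
        func(arg)
    profiler.disable()
    stats = pstats.Stats(profiler)
    # stats.stats maps (filename, lineno, name) to
    # (primitive calls, total calls, tottime, cumtime, callers).
    return {key[2]: value[1] for key, value in stats.stats.items()}

counts = profile_call_counts(count_letters, "abc-123", 1000)
ord_calls = sum(n for name, n in counts.items()
                if name.startswith("<built-in") and "ord" in name)
print("recorded ord calls:", ord_calls)
```

The "in"-based checker performs its membership tests inside the interpreter without a separate function call per character, which is consistent with its profile listing no per-character entries at all.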