Python: string search using ord is slower than using the "in" method

I have two hand-written versions of a simple text parser that validates logins, plus a regex version for comparison:

import re
import string

rgx = re.compile(r"^[a-zA-Z][a-zA-Z0-9.-]{0,18}[a-zA-Z0-9]$")
def rchecker(login):
    return bool(rgx.match(login))

max_len = 20
def occhecker(login):
    length_counter = max_len
    for c in login:
        o = ord(c)
        if length_counter == max_len:
            if not (o > 96 and o < 123) and \
               not (o > 64 and o < 91): return False
        if length_counter == 0: return False

        # not a digit
        # not a uppercase letter
        # not a downcase letter
        # not a minus or dot
        if not (o > 47 and o < 58) and \
           not (o > 96 and o < 123) and \
           not (o > 64 and o < 91) and \
           o != 45 and o != 46: return False
        length_counter -= 1

    if length_counter < max_len:
        o = ord(c)
        if not (o > 47 and o < 58) and \
           not (o > 96 and o < 123) and \
           not (o > 64 and o < 91): return False
        else: return True
    else: return False


correct_end = string.ascii_letters + string.digits
correct_symbols = correct_end + "-."
def cchecker(login):
    length_counter = max_len

    for c in login:
        if length_counter == max_len and c not in string.ascii_letters:
            return False
        if length_counter == 0:
            return False
        if c not in correct_symbols:
            return False
        length_counter -= 1

    if length_counter < max_len and c in correct_end:
        return True
    else:
        return False
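The cProfile output for the two hand-written versions, one using ord and one using the in method, is reproduced further down the page.
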
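I generated 100k well-formed logins, 60k logins containing Cyrillic letters, 60k logins of length 24 instead of 20, and 60k logins of length 0, so 280k logins in total. How can it be explained that the regex is so much faster than a simple loop using ord?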
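
A data set with the proportions described above could be generated roughly like this. This is only a sketch: the helper names, the fixed seed, the Cyrillic alphabet string and the `scale` parameter are assumptions for illustration, not the script used in the question, and it targets Python 3.

```python
import random
import string

LETTERS = string.ascii_letters
LAST = string.ascii_letters + string.digits
MIDDLE = LAST + "-."
CYRILLIC = "абвгдежзийклмнопрстуфхцчшщъыьэюя"  # assumed alphabet for the invalid logins


def ascii_login(rng, length):
    """Login of `length` >= 2 chars: a letter, allowed middle chars, then a letter or digit."""
    middle = "".join(rng.choice(MIDDLE) for _ in range(length - 2))
    return rng.choice(LETTERS) + middle + rng.choice(LAST)


def cyrillic_login(rng, length):
    """Login made only of Cyrillic letters, so every checker should reject it."""
    return "".join(rng.choice(CYRILLIC) for _ in range(length))


def make_dataset(rng, scale=1000):
    """Build logins in the 100:60:60:60 proportions from the question."""
    logins = [ascii_login(rng, rng.randint(2, 20)) for _ in range(100 * scale)]
    logins += [cyrillic_login(rng, rng.randint(2, 20)) for _ in range(60 * scale)]
    logins += [ascii_login(rng, 24) for _ in range(60 * scale)]  # too long
    logins += [""] * (60 * scale)                                # empty
    return logins


logins = make_dataset(random.Random(0), scale=1)
print(len(logins))  # 280 at scale=1; scale=1000 gives the 280k from the question
```

Passing an explicit `random.Random` instance keeps the data reproducible between runs, which matters when comparing timings of different checkers.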

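The short answer is that regular expressions are fast: the other approaches execute a lot of pure Python code, whereas the re module is implemented in optimized C. In addition, much of the work is done when the regex is compiled, and that time is not counted by the profiler.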
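
To get a feel for how much work happens at compile time versus match time, the two can be timed separately. A minimal sketch, assuming Python 3; `compile_fresh` and the sample login are made up for illustration, and `re.purge()` is used so that the pattern is really recompiled instead of being served from the re module's cache. Absolute numbers will differ between machines.

```python
import re
import timeit

PATTERN = r"^[a-zA-Z][a-zA-Z0-9.-]{0,18}[a-zA-Z0-9]$"


def compile_fresh():
    """Compile PATTERN from scratch, bypassing re's internal pattern cache."""
    re.purge()
    return re.compile(PATTERN)


rgx = compile_fresh()

# Average cost of one compilation versus one match against a short login.
compile_time = timeit.timeit(compile_fresh, number=100) / 100
match_time = timeit.timeit(lambda: rgx.match("some-login.42"), number=100000) / 100000

print("compile: %.2e s per call" % compile_time)
print("match:   %.2e s per call" % match_time)
```

In the question the pattern is compiled once at module level, so this one-off cost is paid before any of the 280k profiled calls start.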

To dig deeper, use the dis module, which shows the Python opcodes of each function:

>>> import dis
>>> dis.dis(rchecker)
  4           0 LOAD_GLOBAL              0 (bool)
              3 LOAD_GLOBAL              1 (rgx)
              6 LOAD_ATTR                2 (match)
              9 LOAD_FAST                0 (login)
             12 CALL_FUNCTION            1
             15 CALL_FUNCTION            1
             18 RETURN_VALUE



>>> dis.dis(occhecker)
  8           0 LOAD_GLOBAL              0 (max_len)
              3 STORE_FAST               1 (length_counter)

  9           6 SETUP_LOOP             224 (to 233)
              9 LOAD_FAST                0 (login)
             12 GET_ITER
        >>   13 FOR_ITER               216 (to 232)
             16 STORE_FAST               2 (c)

   ....  OUTPUT TRUNCATED, BUT THERE ARE MANY OPCODES ....

 32     >>  343 LOAD_GLOBAL              2 (False)
            346 RETURN_VALUE
        >>  347 LOAD_CONST               0 (None)
            350 RETURN_VALUE
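
The same comparison can be made programmatically by counting instructions instead of reading the listing. This is a sketch for Python 3.4 or later (where `dis.get_instructions` exists); `loop_checker` is a simplified stand-in for occhecker, and `count_instructions` is a helper invented here. Exact counts vary between Python versions, so none are quoted.

```python
import dis
import re

rgx = re.compile(r"^[a-zA-Z][a-zA-Z0-9.-]{0,18}[a-zA-Z0-9]$")


def rchecker(login):
    # One Python-level call; the matching loop itself runs inside the C regex engine.
    return bool(rgx.match(login))


def loop_checker(login):
    # Simplified stand-in for occhecker: every iteration executes Python bytecode.
    for c in login:
        o = ord(c)
        if not (47 < o < 58 or 64 < o < 91 or 96 < o < 123 or o == 45 or o == 46):
            return False
    return True


def count_instructions(func):
    """Number of bytecode instructions in the compiled body of func."""
    return sum(1 for _ in dis.get_instructions(func))


print("rchecker:    ", count_instructions(rchecker))
print("loop_checker:", count_instructions(loop_checker))
```

The static count understates the difference: the instructions inside the for loop are executed once per character, while rchecker's handful of instructions run once per login.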

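Thanks for pointing out the dis module. The strangest part, though, is that the ord version is so slow. I have not dug into the opcodes, but I suspect the large number of calls to ord is what makes it slow.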

With ord:

    3450737 function calls in 8.599 seconds

 Ordered by: standard name

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 280000    5.802    0.000    8.599    0.000 logineffcheck.py:14(occhecker)
      1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
3170736    2.797    0.000    2.797    0.000 {ord}

With the in method:

    280001 function calls in 1.709 seconds

 Ordered by: standard name

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 280000    1.709    0.000    1.709    0.000 logineffcheck.py:52(cchecker)
      1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
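
One caveat when reading these numbers: cProfile records every call, including calls to built-ins such as ord, and that bookkeeping adds time to each recorded call. The ord version makes about 3.17 million extra recorded calls, so it may be penalised more under the profiler than in a plain run. Below is a sketch of timing the two styles with timeit instead, assuming Python 3. Both functions are simplified per-character checks written for this illustration, not the original occhecker and cchecker, and the frozenset is an assumption rather than the string used in the question.

```python
import string
import timeit

ALLOWED = frozenset(string.ascii_letters + string.digits + "-.")


def all_allowed_ord(login):
    """Per-character check using one ord() call and integer comparisons."""
    for c in login:
        o = ord(c)
        if not (47 < o < 58 or 64 < o < 91 or 96 < o < 123 or o == 45 or o == 46):
            return False
    return True


def all_allowed_in(login):
    """Per-character check using a single membership test, with no ord() call."""
    for c in login:
        if c not in ALLOWED:
            return False
    return True


sample = "some-login.42"
for func in (all_allowed_ord, all_allowed_in):
    seconds = timeit.timeit(lambda: func(sample), number=100000)
    print("%-16s %.3f s" % (func.__name__, seconds))
```

Whatever the exact figures on a given machine, both loops still run Python bytecode for every character, which is the part the regex version moves into C.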