Python regex traceback


Suppose I have a regular expression:

match = re.search(pattern, content)
if not match:
    raise Exception('regex traceback')  # I want to report the regex matching process here.

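If the regular expression fails to match, I want the exception to describe how the matching went and where it failed against the pattern, at which stage, and so on. Is it even possible to achieve this?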

I have used Kodos in the past for regex debugging. It is not an ideal solution here, since you need something that works at runtime, but it may still help you.

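If you need to test the regex, you can probably use groups followed by *, as in (some text)*. Apply this to the regex you want, and you should be able to find out where it fails.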

Then make use of the following match object attributes, as documented on python.org:

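pos: The value of pos that was passed to the search() or match() method of the RegexObject. This is the index into the string at which the re engine started looking for a match.

endpos: The value of endpos that was passed to the search() or match() method of the RegexObject. This is the index into the string beyond which the re engine will not go.

lastindex: The integer index of the last matched capturing group, or None if no group was matched at all. For example, the expressions (a)b, ((a)(b)) and ((ab)) will have lastindex == 1 if applied to the string 'ab', while the expression (a)(b) will have lastindex == 2 if applied to the same string.

lastgroup: The name of the last matched capturing group, or None if the group did not have a name or if no group was matched at all.

re: The regular expression object whose match() or search() method produced this MatchObject instance.

string: The string passed to match() or search().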
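
To make these attributes concrete, here is a minimal sketch that reads them from a match object. The pattern, the group names word and num, and the sample text are made up for illustration and are not part of the original answer.

```python
import re

regex = re.compile(r'(?P<word>[a-z]+)(?P<num>\d+)?')
text = 'xx abc123 yy'

# pos=3 and endpos=9 limit the search to text[3:9], i.e. 'abc123'
m = regex.search(text, 3, 9)

print(m.pos, m.endpos)   # the bounds passed to search()
print(m.lastindex)       # integer index of the last group that matched
print(m.lastgroup)       # its name, or None when that group is unnamed
print(m.re is regex)     # the pattern object that produced this match
print(m.string is text)  # the string that was passed to search()

# The lastindex cases quoted above, all applied to the string 'ab'
for pattern in ('(a)b', '((a)(b))', '((ab))', '(a)(b)'):
    print(pattern, re.match(pattern, 'ab').lastindex)
```

Note that lastgroup is only informative when the groups are named; with unnamed groups, lastindex is the attribute to look at.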

A simple example:

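>>> import re
>>> m1 = re.compile(r'the real thing')
>>> m2 = re.compile(r'(the)* (real)* (thing)*')
>>> if not m1.search(mytextvar):  # mytextvar is the text being searched
...     res = m2.search(mytextvar)
...     # the groups are unnamed, so lastgroup would always be None; use lastindex
...     print(res.lastindex)
...     # raise my exception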
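
The same idea can be wrapped in a function that raises an exception naming the first piece that failed. This is only a sketch under some assumptions: the pattern is supplied as a list of pieces that contain no capturing groups of their own, and PartialMatchError, leading_groups and search_or_explain are hypothetical names, not part of the re module.

```python
import re


class PartialMatchError(ValueError):
    """Hypothetical exception recording how far a partial match got."""

    def __init__(self, failed_index, failed_piece, matched):
        super().__init__('piece %d %r did not match after %r'
                         % (failed_index, failed_piece, matched))
        self.failed_index = failed_index
        self.failed_piece = failed_piece
        self.matched = matched


def leading_groups(m):
    """Count how many groups matched in a row, starting from the first."""
    count = 0
    for group in m.groups():
        if group is None:
            break
        count += 1
    return count


def search_or_explain(pieces, text):
    """Return a match for the joined pieces, or raise PartialMatchError."""
    strict = re.compile(''.join(pieces))
    m = strict.search(text)
    if m:
        return m
    # Relaxed pattern: every piece becomes an optional capturing group,
    # so finditer() always yields at least one (possibly empty) match.
    relaxed = re.compile(''.join('(%s)?' % p for p in pieces))
    best = max(relaxed.finditer(text), key=leading_groups)
    n = leading_groups(best)
    raise PartialMatchError(n, pieces[n], best.group())


PIECES = ['the', ' ', 'real', ' ', 'thing']
try:
    search_or_explain(PIECES, 'it is the real thang')
except PartialMatchError as exc:
    print(exc)
```

The candidate that gets furthest through the pieces is reported, which is a heuristic: it shows a plausible failure point, not the engine's actual backtracking history.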

I have something that helps me debug complex regex patterns in my code.
Does this help you?

import re

li = ('ksjdhfqsd\n'
      '5 12478 abdefgcd ocean__12      ty--\t\t ghtr789\n'
      'qfgqrgqrg',

      '6 48788 bcfgdebc atlantic__7899 %fg#\t\t ghtu12340\n',

      '2 47890 bbcedefg arctic__124    **juyf\t\t ghtr89877',

      '9 54879 bbdecddf antarctic__13  18:13pomodoro\t\t ghtr6798',


      'ksjdhfqsd\n'
      '5 12478 abdefgcd ocean__1247101247887 ty--\t\t ghtr789\n'
      'qfgqrgqrg',

      '6 48788 bcfgdebc atlantic__7899 %fg#\t\t ghtu12940\n',

      '25 47890 bbcedefg arctic__124    **juyf\t\t ghtr89877',

      '9 54879 bbdeYddf antarctic__13  18:13pomodoro\t\t ghtr6798')


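# The pattern, split into small pieces that are joined back together.
# Backslashes are doubled so that '\\d' reaches the regex engine as \d;
# '\t' is left as a real tab character.
tupleRE = ('^\\d',
           ' ',
           '\\d{5}',
           ' ',
           '[abcdefghi]+',
           ' ',
           '(?=[a-z\\d_ ]{14} [^ ]+\t\t ght)',
           '[a-z]+',
           '__',
           '[\\d]+',
           ' +',
           '[^\t]+',
           '\t\t',
           ' ',
           'ght',
           '(r[5-9]+|u[0-4]+)',
           '$')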



def REtest(ch, tupleRE, flags=re.MULTILINE):
    # Grow the pattern one element at a time and report the first element
    # that makes the accumulated pattern stop matching.
    for n in range(len(tupleRE)):
        regx = re.compile(''.join(tupleRE[:n+1]), flags)
        testmatch = regx.search(ch)
        if not testmatch:
            print('\n  -*- tupleRE :\n')
            print('\n'.join(str(i).zfill(2)+' '+repr(u)
                            for i, u in enumerate(tupleRE[:n])))
            print('   --------------------------------')
            # tupleRE stops matching because of element n
            print(str(n).zfill(2)+' '+repr(tupleRE[n])
                  + "   doesn't match anymore from this ligne "
                  + str(n)+' of tupleRE')
            # show the element that follows the failing one, if any
            print('\n'.join(str(n+1+j).zfill(2)+' '+repr(u)
                            for j, u in enumerate(tupleRE[n+1:n+2])))

            # tupleRE[:n] is the longest prefix that still matches
            # (the empty pattern when n == 0, which always matches)
            match = re.search(''.join(tupleRE[:n]), ch, flags)

            matching_portion = match.group()
            matching_li = '\n'.join(map(repr,
                                        matching_portion.splitlines(True)[-5:]))
            fin_matching_portion = match.end()
            print('\n\n  -*- Part of the tested string which is concerned :\n\n'
                  '######### matching_portion ########\n'+matching_li + '\n'
                  '##### end of matching_portion #####\n'
                  '-----------------------------------\n'
                  '######## unmatching_portion #######')
            print('\n'.join(map(repr,
                                ch[fin_matching_portion:
                                   fin_matching_portion+300].splitlines(True))))
            break
    else:
        print('\n  SUCCESS . The regex integrally matches.')



for x in li:
    print('  -*- Analyzed string :\n%r' % x)
    REtest(x, tupleRE)
    print('\nmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm')
Result

  -*- Analyzed string :
'ksjdhfqsd\n5 12478 abdefgcd ocean__12      ty--\t\t ghtr789\nqfgqrgqrg'

  SUCCESS . The regex integrally matches.

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
  -*- Analyzed string :
'6 48788 bcfgdebc atlantic__7899 %fg#\t\t ghtu12340\n'

  SUCCESS . The regex integrally matches.

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
  -*- Analyzed string :
'2 47890 bbcedefg arctic__124    **juyf\t\t ghtr89877'

  SUCCESS . The regex integrally matches.

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
  -*- Analyzed string :
'9 54879 bbdecddf antarctic__13  18:13pomodoro\t\t ghtr6798'

  SUCCESS . The regex integrally matches.

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
  -*- Analyzed string :
'ksjdhfqsd\n5 12478 abdefgcd ocean__1247101247887 ty--\t\t ghtr789\nqfgqrgqrg'

  -*- tupleRE :

00 '^\\d'
01 ' '
02 '\\d{5}'
03 ' '
04 '[abcdefghi]+'
05 ' '
   --------------------------------
06 '(?=[a-z\\d_ ]{14} [^ ]+\t\t ght)'   doesn't match anymore from this ligne 6 of tupleRE
07 '[a-z]+'


  -*- Part of the tested string which is concerned :

######### matching_portion ########
'5 12478 abdefgcd '
##### end of matching_portion #####
-----------------------------------
######## unmatching_portion #######
'ocean__1247101247887 ty--\t\t ghtr789\n'
'qfgqrgqrg'

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
  -*- Analyzed string :
'6 48788 bcfgdebc atlantic__7899 %fg#\t\t ghtu12940\n'

  -*- tupleRE :

00 '^\\d'
01 ' '
02 '\\d{5}'
03 ' '
04 '[abcdefghi]+'
05 ' '
06 '(?=[a-z\\d_ ]{14} [^ ]+\t\t ght)'
07 '[a-z]+'
08 '__'
09 '[\\d]+'
10 ' +'
11 '[^\t]+'
12 '\t\t'
13 ' '
14 'ght'
15 '(r[5-9]+|u[0-4]+)'
   --------------------------------
16 '$'   doesn't match anymore from this ligne 16 of tupleRE



  -*- Part of the tested string which is concerned :

######### matching_portion ########
'6 48788 bcfgdebc atlantic__7899 %fg#\t\t ghtu12'
##### end of matching_portion #####
-----------------------------------
######## unmatching_portion #######
'940\n'

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
  -*- Analyzed string :
'25 47890 bbcedefg arctic__124    **juyf\t\t ghtr89877'

  -*- tupleRE :

00 '^\\d'
   --------------------------------
01 ' '   doesn't match anymore from this ligne 1 of tupleRE
02 '\\d{5}'


  -*- Part of the tested string which is concerned :

######### matching_portion ########
'2'
##### end of matching_portion #####
-----------------------------------
######## unmatching_portion #######
'5 47890 bbcedefg arctic__124    **juyf\t\t ghtr89877'

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
  -*- Analyzed string :
'9 54879 bbdeYddf antarctic__13  18:13pomodoro\t\t ghtr6798'

  -*- tupleRE :

00 '^\\d'
01 ' '
02 '\\d{5}'
03 ' '
04 '[abcdefghi]+'
   --------------------------------
05 ' '   doesn't match anymore from this ligne 5 of tupleRE
06 '(?=[a-z\\d_ ]{14} [^ ]+\t\t ght)'


  -*- Part of the tested string which is concerned :

######### matching_portion ########
'9 54879 bbde'
##### end of matching_portion #####
-----------------------------------
######## unmatching_portion #######
'Yddf antarctic__13  18:13pomodoro\t\t ghtr6798'

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
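
To connect this back to the question, the same prefix-by-prefix technique can raise an exception instead of printing. The following is a compact sketch, not the code above: RegexTraceback and search_with_traceback are hypothetical names, and it assumes every prefix of the piece list is itself a valid pattern (no piece opens a group that a later piece closes).

```python
import re


class RegexTraceback(Exception):
    """Hypothetical exception describing where a piecewise regex failed."""

    def __init__(self, index, piece, matched, remainder):
        super().__init__('piece %d %r stopped matching after %r; next text: %r'
                         % (index, piece, matched, remainder[:40]))
        self.index = index          # position of the failing piece
        self.piece = piece          # the failing piece itself
        self.matched = matched      # text matched by the longest good prefix
        self.remainder = remainder  # text that follows the matched portion


def search_with_traceback(pieces, text, flags=0):
    """Search with the joined pieces; raise RegexTraceback on failure."""
    last = re.search('', text)  # the empty pattern always matches
    for n, piece in enumerate(pieces):
        m = re.search(''.join(pieces[:n + 1]), text, flags)
        if m is None:
            raise RegexTraceback(n, piece, last.group(), text[last.end():])
        last = m
    return last


pieces = ['^\\d', ' ', '\\d{5}', ' ', '[a-i]+', ' ']
try:
    search_with_traceback(pieces, '9 54879 bbdeYddf x')
except RegexTraceback as exc:
    print(exc)
```

This reports the first piece whose addition makes the prefix fail. With lookaheads or heavy backtracking, that piece is a hint about the problem rather than the exact point where the engine gave up.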

It looks like what you have would work. Have you tested it?

Take a look.

Yes, I have used it and found it useful, but it is a bit complicated :p