Python: recursive named regex groups

I am trying to extract address details from very ugly free text:

import regex

pat_addr_verbose = r"""(?ix)      # case insensitive and verbose flag
(?:(?:BND|BY|CNR|OF)\W+)*         # non-capturing (list)
(?:(?!RD|HWY|TRAIL|St)           # negative lookahead (list of street types)
(?:                              # either
(?P<n_start>\d+)-(?P<n_end>\d+)  # number sequence
|(?<!-)(?P<n>\d+)                      # single number
)\W+)?                               # No number, maybe non word character follows
(?P<name>
(?:
(?!RD|HWY|TRAIL|St)\w+\W*)+)\W+   # capturing words that do not start with a street type
(?P<type>RD|HWY|TRAIL|St)*             # street type (captured)
"""

pat_addr = regex.compile(pat_addr_verbose, regex.IGNORECASE | regex.VERBOSE)

text = """BND BY THOMAS RAIL TRAIL, 7 SNOW WHITE HWY & MICKEY RD,
337-343 BOGEYMAN RD, 4, 8, 9-13, 16-18 Fictional Rd & 17 Elm St"""

regex.findall(pat_addr, text)
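I would like to know whether it is possible to get a list (not sure whether the items would be named) or a dict of the numbers from within the regex.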

Edit: this is what I hope to get:

Option 1:

{'numbers': 
    [
        {
            'n': '4',
            'n_end': None,
            'n_start': None,
        },
        {
            'n': '8',
            'n_end': None,
            'n_start': None,
        },
        {
            'n': None,
            'n_end': '13',
            'n_start': '9',
        },
        {
            'n': None,
            'n_end': '18',
            'n_start': '16',
        }
    ],
'name': 'Fictional',
'type': 'Rd'},
Option 2:

    {'numbers': 
    [
        '4',
        '8',
        '9-13',
        '16-18'
    ],
'name': '8, 9-13, 16-18 Fictional',
'type': 'Rd'},

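Can you post the result you expect to get?

@Colin, here you go.

You are essentially asking for this, which regex cannot do.

@RNar, maybe not in every flavour, but the answer you refer to says it is possible in .NET and not in JavaScript. It does not mention Python.

Python is one of the tools that only keeps the last capture.

I made an edit specifying the expected result, please take a look. Thank you, though. I had not thought of capturing the whole sequence. That would allow a second pass (but I would rather avoid that if possible).

@dmvianna I am just a bit confused because you introduced a new field, number. Does that mean you want it as a mainstay in all entries? I have a slightly different version, see whether it suits you. You can extend it from there. Note that I am using the new regex module for the first time.

Thanks for your answer. Yes, this is the desired result. Using a single regex would be great, but I will use more steps if needed. Your answer provides a good first pass.

@dmvianna I just updated the answer, it looked like you were not satisfied :-(

@dmvianna See the latest answer, based on your OP.
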
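To illustrate the two points raised in the comments (the standard `re` module keeps only the last capture of a repeated group, and a second pass can work around that), here is a minimal sketch. It is not the asker's or the answerer's code: `first_pass` and `parse_addresses` are hypothetical names, the pattern is simplified (no handling of the BND/BY/CNR/OF prefixes), and a `\b` is added after the street types as an assumption so that words merely starting with "St" are not rejected.

```python
import re

STREET_TYPES = r"RD|HWY|TRAIL|St"

# With the standard-library `re`, a repeated capturing group
# only remembers its last repetition.
m = re.match(r"(?:(\d+(?:-\d+)?)\W+)+", "4, 8, 9-13, 16-18 Fictional Rd")
last_only = m.group(1)

# First pass (simplified, hypothetical pattern): capture the whole
# run of numbers as ONE group instead of repeating a group.
first_pass = re.compile(
    rf"""
    (?P<numbers>(?:\d+(?:-\d+)?\W+)*)                   # whole number run, may be empty
    (?P<name>(?:(?!(?:{STREET_TYPES})\b)[A-Z]+\W+)+)    # words up to the street type
    (?P<type>{STREET_TYPES})\b                          # street type
    """,
    re.IGNORECASE | re.VERBOSE,
)

def parse_addresses(text):
    """Two-pass extraction: match each address, then split its number run."""
    results = []
    for match in first_pass.finditer(text):
        d = match.groupdict()
        # Second pass: split the captured run into individual numbers/ranges.
        d["numbers"] = re.findall(r"\d+(?:-\d+)?", d["numbers"])
        d["name"] = d["name"].strip()
        results.append(d)
    return results

addresses = parse_addresses("4, 8, 9-13, 16-18 Fictional Rd & 17 Elm St")
```

Here `last_only` holds just the final range, while `addresses` carries the full list of numbers for each street. The cost is the extra `re.findall` call per match, which is the second pass the asker hoped to avoid.
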
(?ix)                             # case-insensitive and verbose flags
(?:(?:BND|BY|CNR|OF)\W+)*         # optional leading words (non-capturing)

(?:                               # start of the number group (non-capturing)
(?!RD|HWY|TRAIL|St)               # negative lookahead (list of street types)
(?P<numbers>\d+-\d+|\d+)          # either a number range or a single number
\W+                               # separator after the number
)*?                               # the group repeats, so numbers collects every number

(?P<name>
(?:
(?!RD|HWY|TRAIL|St)[A-Z]+\W*)+)\W+   # words that do not start with a street type
(?P<type>RD|HWY|TRAIL|St)*           # street type

import regex

text='BND BY THOMAS RAIL TRAIL, 7 SNOW WHITE HWY & MICKEY RD, 337-343 BOGEYMAN RD, 4, 8, 9-13, 16-18 Fictional Rd & 17 Elm St'
reg=r'(?ix)(?:(?:BND|BY|CNR|OF)\W+)*(?:(?!RD|HWY|TRAIL|St)(?P<numbers>\d+-\d+|\d+)\W+)*?(?P<name>(?:(?!RD|HWY|TRAIL|St)[A-Z]+\W*)+)\W+(?P<type>RD|HWY|TRAIL|St)*'


def updateD(m):
  d=m.groupdict()
  # captures() returns every repetition of the group, not only the last one
  d['numbers']=m.captures('numbers')
  return d

[updateD(m) for m in regex.finditer(reg,text)]

Output:

[
  {
   'numbers': [],
   'name': 'THOMAS RAIL',
   'type': 'TRAIL'
  }, 
  {
   'numbers': ['7'],
   'name': 'SNOW WHITE',
   'type': 'HWY'
  }, 
  {
   'numbers': [],
   'name': 'MICKEY',
   'type': 'RD'
  }, 
  {
   'numbers': ['337-343'],
   'name': 'BOGEYMAN',
   'type': 'RD'
  }, 
  {
   'numbers': ['4', '8', '9-13', '16-18'],
   'name': 'Fictional',
   'type': 'Rd'
  }, 
  {
   'numbers': ['17'],
   'name': 'Elm',
   'type': 'St'
  }
]
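
The output above corresponds to Option 2 from the question. If the Option 1 shape is wanted instead, the strings in `numbers` can be expanded in plain Python afterwards. The following is only a sketch; `expand_numbers` is a hypothetical helper, not part of the `regex` module or of the answer.

```python
def expand_numbers(numbers):
    """Turn ['4', '9-13'] into Option 1 style dicts with n, n_start, n_end."""
    expanded = []
    for item in numbers:
        start, sep, end = item.partition("-")
        if sep:
            # A range such as '9-13'
            expanded.append({"n": None, "n_start": start, "n_end": end})
        else:
            # A single number such as '4'
            expanded.append({"n": item, "n_start": None, "n_end": None})
    return expanded

record = {"numbers": ["4", "8", "9-13", "16-18"], "name": "Fictional", "type": "Rd"}
record["numbers"] = expand_numbers(record["numbers"])
```
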