Rebuilding a regex string in Python from the matched keywords




Regex example:

>>> import re
>>> regex = re.compile(r'^page/(?P<slug>[-\w]+)/(?P<page_id>[0-9]+)/$')
>>> matches = regex.match('page/slug-name/5/')
>>> matches.groupdict()
{'slug': 'slug-name', 'page_id': '5'}


Is there an easy way to pass a dict back to the regex to rebuild the string?

For example, passing

{'slug': 'newslug', 'page_id': '6'}

would produce

page/newslug/6/

Since you are starting from a dict, I think the string format method is a better fit:

In [16]: d={'slug': 'new-slug', 'page_id': '6'}

In [17]: 'page/{slug}/{page_id}'.format(**d)
Out[17]: 'page/new-slug/6'
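One catch with plain str.format is that nothing checks the new values against the pattern. A minimal sketch, assuming you keep the compiled regex around and write the template by hand: format the string, then confirm it round-trips through regex.fullmatch() and yields the same groups. The helper name build_path and the TEMPLATE constant are made up for illustration.

```python
import re

regex = re.compile(r'^page/(?P<slug>[-\w]+)/(?P<page_id>[0-9]+)/$')
# Hand-written template mirroring the regex (including the trailing slash).
TEMPLATE = 'page/{slug}/{page_id}/'


def build_path(values):
    """Hypothetical helper: fill TEMPLATE, then check the regex accepts it."""
    result = TEMPLATE.format(**values)
    match = regex.fullmatch(result)
    # Every group must parse back to exactly the value that was inserted.
    expected = {name: str(values[name]) for name in regex.groupindex}
    if match is None or match.groupdict() != expected:
        raise ValueError('values do not fit the pattern: %r' % (values,))
    return result


print(build_path({'slug': 'new-slug', 'page_id': '6'}))  # page/new-slug/6/
```

A value such as page_id='6a' raises ValueError instead of silently producing a URL the regex would never have matched.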

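Regexes can get far more complicated than this, but if you always use non-nested named groups (?P<name>...) in the pattern, and restrict pat to nothing more complicated than \A or ^, \Z or $, or \b outside those groups, then perhaps you could do this: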

import re


pat = r'\Apage/(?P<slug>[-\w]+)/(?P<page_id>[0-9]+)/\Z'
regex = re.compile(pat)
matches = regex.match('page/slug-name/5/')
print(matches.groupdict())
# {'slug': 'slug-name', 'page_id': '5'}

# Convert '(?P<slug>...)' to '{slug}'
reverse_pat = re.sub(r'\(\?P<(.*?)>.*?\)', r'{\1}', pat)
# Strip off the leading \A or ^ and the trailing \Z or $
reverse_pat = re.sub(r'^(?:\\A|\^)(.*)(?:\\Z|\$)$', r'\1', reverse_pat)
# Drop any `\b`s.
reverse_pat = re.sub(r'\\b', '', reverse_pat)
# There are many more such rules one could conceivably need...
print(reverse_pat.format(**matches.groupdict()))
# page/slug-name/5/
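The three rewriting steps above can be packaged as a function and reused with new values. A sketch under the same restrictions (non-nested named groups, and only \A or ^, \Z or $, and \b as other special syntax); the name pattern_to_template is hypothetical, and the final fullmatch check is an extra safeguard rather than part of the answer above.

```python
import re


def pattern_to_template(pat):
    """Hypothetical helper: turn a simple regex into a str.format template."""
    # '(?P<slug>...)' -> '{slug}' (assumes the group body contains no ')')
    tmpl = re.sub(r'\(\?P<(.*?)>.*?\)', r'{\1}', pat)
    # Strip a leading \A or ^ together with a trailing \Z or $
    tmpl = re.sub(r'^(?:\\A|\^)(.*)(?:\\Z|\$)$', r'\1', tmpl)
    # Drop word boundaries, which match no characters
    return re.sub(r'\\b', '', tmpl)


pat = r'\Apage/(?P<slug>[-\w]+)/(?P<page_id>[0-9]+)/\Z'
template = pattern_to_template(pat)
print(template)  # page/{slug}/{page_id}/

rebuilt = template.format(slug='newslug', page_id='6')
# Safeguard: the rebuilt string should be accepted by the original regex.
assert re.fullmatch(pat, rebuilt)
print(rebuilt)  # page/newslug/6/
```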


Django seems to be able to do this (interestingly, it uses regexes to parse the regexes).

You could reuse the reverse_helper and MatchChecker it provides.

Here is a solution that doesn't need a new regex:

import re
import operator

regex = re.compile(r'^page/(?P<slug>[-\w]+)/(?P<page_id>[0-9]+)/$')
matches = regex.match('page/slug-name/5/')
groupdict = {'slug': 'new-slug', 'page_id': '6'}
prev_index = matches.start(0)
new_string = ""
# Visit the named groups in the order they appear in the pattern
for group, index in sorted(regex.groupindex.items(), key=operator.itemgetter(1)):
    # Copy the text between the previous group and this one, then the new value
    new_string += matches.string[prev_index:matches.start(index)] + groupdict[group]
    prev_index = matches.end(index)

# Copy whatever follows the last group, up to the end of the match
new_string += matches.string[prev_index:matches.end(0)]
print(new_string)
# page/new-slug/6/


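This works by replacing the named groups with the values supplied in groupdict, while the rest of the string is filled in with slices of the input string (matches.string).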
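The slicing approach can be packaged as a reusable function. A minimal sketch: rebuild and its whole_string flag are hypothetical names, and it assumes non-nested named groups that all took part in the match (a group that did not participate reports start -1, which this sketch does not handle).

```python
import re


def rebuild(match, values, whole_string=False):
    """Hypothetical helper: splice new group values into match.string.

    With whole_string=True the text before and after the matched region
    is kept as well; otherwise only the matched region is returned.
    """
    pos = 0 if whole_string else match.start(0)
    end = len(match.string) if whole_string else match.end(0)
    pieces = []
    # Group numbers follow the order of opening parentheses, which for
    # non-nested groups is also their left-to-right order in the string.
    for name, index in sorted(match.re.groupindex.items(), key=lambda kv: kv[1]):
        pieces.append(match.string[pos:match.start(index)])  # text before the group
        pieces.append(str(values[name]))                     # the replacement value
        pos = match.end(index)
    pieces.append(match.string[pos:end])                     # text after the last group
    return ''.join(pieces)


regex = re.compile(r'page/(?P<slug>[-\w]+)/(?P<page_id>[0-9]+)/')
m = regex.search('see page/slug-name/5/ here')
print(rebuild(m, {'slug': 'new-slug', 'page_id': 6}))                     # page/new-slug/6/
print(rebuild(m, {'slug': 'new-slug', 'page_id': 6}, whole_string=True))  # see page/new-slug/6/ here
```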

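new_string will be the part of the original string that matched the regex, with the relevant substitutions made. To have new_string include the unmatched parts of the string as well, replace prev_index = matches.start(0) with prev_index = 0, and remove matches.end(0) from the last slice after the for loop.

Here's a solution using sre_parse: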

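import re

try:  # Python 3.11+ keeps the parser in re._parser (sre_parse is deprecated there)
    from re import _parser as sre_parse
except ImportError:  # older Python 3
    import sre_parse

pattern = r'^page/(?P<slug>[-\w]+)/(?P<page_id>[0-9]+)/$'
regex = re.compile(pattern)
matches = regex.match('page/slug-name/5/')
params = matches.groupdict()
print(params)
# {'slug': 'slug-name', 'page_id': '5'}

# Map group number -> group name
lookup = {v: k for k, v in regex.groupindex.items()}
# LITERAL ops become their character, SUBPATTERN ops (av[0] is the group
# number) become the group's value, and AT ops (^, $) are dropped.
frags = [chr(av) if op == sre_parse.LITERAL else str(params[lookup[av[0]]])
         for op, av in sre_parse.parse(pattern) if op != sre_parse.AT]
print(''.join(frags))
# page/slug-name/5/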


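This works by getting the raw opcodes from parse(), dropping the positional opcodes (those whose first element is the AT opcode, produced by ^ and $), substituting the named groups, and joining the fragments at the end.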
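The opcode walk can be made a little stricter. A sketch, assuming Python 3: it uses re._parser on 3.11+ and falls back to sre_parse on older versions, and both are undocumented internals whose output format may change. Unnamed or non-capturing groups are walked recursively, any other construct (character classes, repeats, alternation) raises, and the result is checked against the regex. The function name reverse_regex is hypothetical.

```python
import re

try:  # Python 3.11+: the internal parser lives in re._parser
    from re import _parser as sre_parse
except ImportError:  # older Python 3: the (now deprecated) sre_parse module
    import sre_parse


def reverse_regex(pattern, values):
    """Hypothetical helper: rebuild a string from a simple regex and group values."""
    regex = re.compile(pattern)
    names = {num: name for name, num in regex.groupindex.items()}

    def walk(subpattern):
        out = []
        for op, av in subpattern:
            if op == sre_parse.LITERAL:        # one literal character (code point)
                out.append(chr(av))
            elif op == sre_parse.AT:           # ^, $, \A, \Z, \b: emit nothing
                continue
            elif op == sre_parse.SUBPATTERN:   # av[0] is the group number or None
                if av[0] in names:
                    out.append(str(values[names[av[0]]]))
                else:                          # unnamed or non-capturing: recurse
                    out.append(walk(av[-1]))
            else:
                raise ValueError('unsupported regex construct: %s' % op)
        return ''.join(out)

    result = walk(sre_parse.parse(pattern))
    if regex.fullmatch(result) is None:
        raise ValueError('result %r does not match the pattern' % result)
    return result


print(reverse_regex(r'^page/(?P<slug>[-\w]+)/(?P<page_id>[0-9]+)/$',
                    {'slug': 'new-slug', 'page_id': 6}))
# page/new-slug/6/
```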

This is a really nice approach, and more robust than my regex-based one.