Python regex to extract the values that replaced placeholders

python regex


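I have this string:

template = "Hello my name is <name>, I'm <age>."

I want to test whether a given string matches this template, where anything can stand in for a placeholder. Placeholders start and end with angle brackets, as in <name> above. This string would match:

string = "Hello my name is John Doe, I'm 30 years old."

I also want to extract the parts of the string that replaced the placeholders. For the example above, I would like to get the following list:

['John Doe', '30 years old']

I was able to extract the template's placeholders with a regex pattern, but I am stuck on how to extract the actual replacements from the string. I need a generic approach; I don't want to hard-code a pattern that matches the complete template, because I have many templates to check. Is there a clever way to do this?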

If the desired output is always followed by the exact punctuation shown in the question, we can simply use an expression like this:

is\s(.+?),|([0-9].+)\.
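As a rough illustration of how that expression could be applied in Python, here is a minimal sketch. The variable names (`fixed_pattern`, `values`) are made up for this example, and the pattern only works for this one sentence shape, so it is not the generic solution the question asks for.

```python
import re

string = "Hello my name is John Doe, I'm 30 years old."

# The expression from the answer: two alternatives, one capture group each.
fixed_pattern = re.compile(r"is\s(.+?),|([0-9].+)\.")

# findall returns one tuple per match; the group belonging to the
# alternative that did not match is an empty string, so keep the other one.
values = [name or age for name, age in fixed_pattern.findall(string)]
print(values)
```

Because the literal text ("is", the comma, the leading digit) is baked into the pattern, every new template would need its own hand-written expression.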


You can build the regex dynamically from the template, then match it against any input string:

import re

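template = "Hello my name is <name>, I'm <age>."
# Escape the literal text so characters such as "." are matched verbatim.
# This relies on Python 3.7+, where re.escape leaves "<" and ">" unescaped.
pattern = "^" + re.escape(template) + "$"
# Turn each <placeholder> into a named group, e.g. <name> becomes (?P<name>.*)
pattern = re.sub(r"<[^>]+>", r"(?P\g<0>.*)", pattern)
regex = re.compile(pattern, re.DOTALL)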

string = "Hello my name is John Doe, I'm 30 years old."
match = regex.match(string)

match.group(0)
#=> "Hello my name is John Doe, I'm 30 years old."
match.group("name")
#=> 'John Doe'
match.group("age")
#=> '30 years old'
match.groups()
#=> ('John Doe', '30 years old')


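The only restriction on the template is that each placeholder must be a valid regex group name.

You can work around this simply by not using named regex groups: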

# replacing
pattern = re.sub(r"<[^>]+>", r"(?P\g<0>.*)", pattern)
# with
pattern = re.sub(r"<[^>]+>", "(.*)", pattern)
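The unnamed-group variant can be wrapped into the kind of helper the question asks for. This is only a sketch: the name `func` comes from the asker's comment, returning `None` on a mismatch is an assumption of mine, and it assumes Python 3.7 or later, where `re.escape` leaves `<` and `>` untouched.

```python
import re

def func(template, string):
    """Return the list of placeholder values, or None if string does not match."""
    # Escape the literal text of the template.
    pattern = re.escape(template)
    # Replace every <placeholder> with an unnamed, greedy capture group.
    pattern = re.sub(r"<[^>]+>", "(.*)", pattern)
    # fullmatch anchors at both ends, so "^" and "$" are not needed.
    match = re.fullmatch(pattern, string, re.DOTALL)
    return list(match.groups()) if match else None

template = "Hello my name is <name>, I'm <age>."
print(func(template, "Hello my name is John Doe, I'm 30 years old."))
```

Since the groups are greedy, a template whose literal separators also occur inside the values can split the values differently than intended; switching to `(.*?)` is one possible adjustment.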


Combine this with a cross-reference to the placeholders in the template and you have more naming options:

placeholders = re.findall("<[^>]+>", template)
placeholders = list(map(lambda match: match[1:-1], placeholders))

dict(zip(placeholders, match.groups()))
#=> {'name': 'John Doe', 'age': '30 years old'}
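To show why this gives more naming freedom, here is a hedged sketch that returns a dict even when a placeholder is not a valid group name (for example `<first-name>`). It swaps in a slightly different technique from the answer: the template is split on its placeholders and only the literal pieces are escaped. The function name `match_template` is hypothetical.

```python
import re

def match_template(template, string):
    """Return {placeholder: value} if string matches template, else None."""
    # Placeholder names without the surrounding angle brackets.
    names = re.findall(r"<([^>]+)>", template)
    # Literal text between the placeholders, in order.
    literals = re.split(r"<[^>]+>", template)
    # Escape each literal piece and join the pieces with capture groups.
    pattern = "(.*)".join(re.escape(part) for part in literals)
    match = re.fullmatch(pattern, string, re.DOTALL)
    if match is None:
        return None
    return dict(zip(names, match.groups()))

result = match_template(
    "Hello my name is <first-name>, I'm <age>.",
    "Hello my name is John Doe, I'm 30 years old.",
)
print(result)
```

Escaping the literal pieces separately means the sketch does not depend on how `re.escape` treats `<` and `>`. Note that a placeholder name used twice in one template would keep only the last value in the dict.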


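Thanks, but that's not what I'm looking for. I want a more generic approach. Basically, I want to create a function func such that calling func(template, string) returns ['John Doe', '30 years old']. func should also work for other templates. The punctuation doesn't matter; the placeholders (with their pair of angle brackets) do.

This is my first time using Python. If you spot improvements or optimizations, let me know. Change "<[^>]+>" to "<[^>]*>" to allow empty placeholders.

This is exactly what I needed. Thanks :D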
