Python: capturing property names

python regex

I am scanning a ".twig" (PHP template) file and trying to capture the property names of an object.

The Twig file contains lines (strings) like these:

{{ product.id }}
{{ product.parentProductId }}
{{ product.countdown.startDate | date('Y/m/d H:i:s') }}
{{ product.countdown.endDate | date('Y/m/d H:i:s') }}
{{ product.countdown.expireDate | date('Y/m/d H:i:s') }}
{{ product.primaryImage.originalUrl }}
{{ product.image(1).originalUrl }}
{{ product.image(1).thumbUrl }}
{{ product.priceWithTax(preferences.default_currency) | money }}

What I want to capture is:

.id
.parentProductId
.countdown
.startDate
.endDate
.expireDate
.primaryImage
.originalUrl
.image(1)
.originalUrl
.thumbUrl
.priceWithTax(preferences.default_currency)

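Basically, I am trying to find out the properties of the product object. I have the following pattern, but it does not capture chained properties. For example,

"{{.+?product(\.[a-zA-Z]+(?:\(.+?\)){,1})+.+?}}"

captures only .startDate, but it should capture .countdown and .startDate separately. Is this impossible, or am I missing something?

I could capture the chain as a whole (.countdown.startDate) with

"{{.+?product((?:\.[a-zA-Z]+(?:\(.+?\)){,1})+).+?}}"

and then inspect or split it, but that sounds cumbersome.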
If you want to handle it with a single regular expression, you may want to use the PyPI regex module:
import regex

s = """{{ product.id }}
{{ product.parentProductId }}
{{ product.countdown.startDate | date('Y/m/d H:i:s') }}
{{ product.primaryImage.originalUrl }}
{{ product.image(1).originalUrl }}
{{ product.priceWithTax(preferences.default_currency) | money }}"""

rx = r'{{[^{}]*product(\.[a-zA-Z]+(?:\([^()]+\))?)*[^{}]*}}'

l = [m.captures(1) for m in regex.finditer(rx, s)]

print([item for sublist in l for item in sublist])
# => ['.id', '.parentProductId', '.countdown', '.startDate', '.primaryImage', '.originalUrl', '.image(1)', '.originalUrl', '.priceWithTax(preferences.default_currency)']

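The regex {{[^{}]*product(\.[a-zA-Z]+(?:\([^()]+\))?)*[^{}]*}} matches:

{{ - a literal {{ substring
[^{}]* - zero or more characters other than { and }
product - the literal substring product
(\.[a-zA-Z]+(?:\([^()]+\))?)* - capturing group 1, repeated zero or more times:
    \. - a dot
    [a-zA-Z]+ - one or more ASCII letters
    (?:\([^()]+\))? - an optional sequence of (, one or more characters other than ( and ), and then )
[^{}]* - zero or more characters other than { and }
}} - a literal }} substring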
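
To see why the regex module's captures() is used above: with the standard-library re module, a repeated capturing group keeps only the text of its last iteration, which is the behaviour described in the question. A minimal sketch using the same pattern (the variable names line and last_capture are illustrative):

```python
import re

# Same pattern as above, but run with the standard-library re module.
rx = r'{{[^{}]*product(\.[a-zA-Z]+(?:\([^()]+\))?)*[^{}]*}}'
line = "{{ product.countdown.startDate | date('Y/m/d H:i:s') }}"

m = re.search(rx, line)
# Group 1 is repeated by *, so re retains only its final iteration;
# the earlier ".countdown" iteration is not available from the match object.
last_capture = m.group(1)
print(last_capture)  # .startDate
```
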

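If you are limited to re, you need to capture all the properties into a single capturing group (wrap (\.[a-zA-Z]+(?:\([^()]+\))?)* in an outer (...) and make the inner group non-capturing), then run regex-based post-processing that splits on each . that is not inside parentheses: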
import re
rx = r'{{[^{}]*product((?:\.[a-zA-Z]+(?:\([^()]+\))?)*)[^{}]*}}'
l = re.findall(rx, s)
res = []
for m in l:
    res.extend([".{}".format(n) for n in filter(None, re.split(r'\.(?![^()]*\))', m))])
print(res)
# => ['.id', '.parentProductId', '.countdown', '.startDate', '.primaryImage', '.originalUrl', '.image(1)', '.originalUrl', '.priceWithTax(preferences.default_currency)']
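
As an alternative to splitting with a lookahead, the captured chain can be tokenised by running the per-property sub-pattern over it a second time. This is only a sketch of that idea, not part of the answer above; the names chain_rx, segment_rx and segments are illustrative, and the sample string is shortened.

```python
import re

s = """{{ product.countdown.startDate | date('Y/m/d H:i:s') }}
{{ product.image(1).originalUrl }}
{{ product.priceWithTax(preferences.default_currency) | money }}"""

# Outer pattern: capture the whole property chain that follows "product".
chain_rx = re.compile(r'{{[^{}]*product((?:\.[a-zA-Z]+(?:\([^()]+\))?)*)[^{}]*}}')
# Inner pattern: one property, optionally followed by a parenthesised argument.
segment_rx = re.compile(r'\.[a-zA-Z]+(?:\([^()]+\))?')

segments = []
for chain in chain_rx.findall(s):
    # Each chain looks like ".countdown.startDate"; findall yields its parts.
    segments.extend(segment_rx.findall(chain))

print(segments)
# ['.countdown', '.startDate', '.image(1)', '.originalUrl', '.priceWithTax(preferences.default_currency)']
```

Because the inner pattern consumes a parenthesised argument together with its property name, the dot inside (preferences.default_currency) never starts a new segment.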

Try this; it captures everything in your requirements:
^{{ product(\..*?[(][^\d\/]+[)]).*?}}|^{{ product(\..*?)(\..*?)?(?= )

I decided to stick with re (rather than regex, as Victor suggested), and this is what I ended up with:
import re, json

# Read the whole template; the with-block closes the file automatically.
with open("test.twig", "r", encoding="utf-8") as file:
    content = file.read()

patterns = {
    "template"  : r"{{[^{}]*product((?:\.[a-zA-Z]+(?:\([^()]+\))?)*)[^{}]*}}",
    "prop"      : r"^[^\.]+$",                  # .id
    "subprop"   : r"^[^\.()]+(\.[^\.]+)+$",     # .countdown.startDate
    "itemprop"  : r"^[^\.]+\(\d+\)\.[^\.]+$",   # .image(1).originalUrl
    "method"    : r"^[^\.]+\(.+\)$",            # .priceWithTax(preferences.default_currency)
}

temp_re = re.compile(patterns["template"])
matches = temp_re.findall(content)

product = {}

for match in matches:
    match = match[1:]
    if re.match(patterns["prop"], match):
        product[match] = match
    elif re.match(patterns["subprop"], match):
        match = match.split(".")
        if match[0] not in product:
            product[match[0]] = []
        if match[1] not in product[match[0]]:
            product[match[0]].append(match[1])
    elif re.match(patterns["itemprop"], match):
        match = match.split(".")
        array = re.sub(r"\(\d+\)", "(i)", match[0])  # raw string avoids invalid-escape warnings
        if array not in product:
            product[array] = []
        if match[1] not in product[array]:
            product[array].append(match[1])
    elif re.match(patterns["method"], match):
        product[match] = match

props = json.dumps(product, indent=4)

print(props)

Sample output:
{
    "id": "id",
    "parentProductId": "parentProductId",
    "countdown": [
        "startDate",
        "endDate",
        "expireDate"
    ],
    "primaryImage": [
        "originalUrl"
    ],
    "image(i)": [
        "originalUrl",
        "thumbUrl"
    ],
    "priceWithTax(preferences.default_currency)": "priceWithTax(preferences.default_currency)"
}
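
The same grouping can be tried on an in-memory string, without a test.twig file. The sketch below is a variant under assumptions rather than the code above: it splits each chain into segments first, the function name group_properties is hypothetical, and it only looks at the first two segments of a chain.

```python
import re

CHAIN_RX = re.compile(r'{{[^{}]*product((?:\.[a-zA-Z]+(?:\([^()]+\))?)*)[^{}]*}}')
# Captures each property without its leading dot.
SEGMENT_RX = re.compile(r'\.([a-zA-Z]+(?:\([^()]+\))?)')

def group_properties(content):
    """Map single properties to themselves and parent properties to child lists."""
    product = {}
    for chain in CHAIN_RX.findall(content):
        segments = SEGMENT_RX.findall(chain)
        if not segments:
            continue
        if len(segments) == 1:
            # Plain property or method call, e.g. "id" or "priceWithTax(...)".
            product[segments[0]] = segments[0]
            continue
        # Normalise numeric indexes so image(1) and image(2) share one key.
        parent = re.sub(r'\(\d+\)', '(i)', segments[0])
        children = product.get(parent)
        if not isinstance(children, list):
            children = product[parent] = []
        if segments[1] not in children:
            children.append(segments[1])
    return product

sample = """{{ product.id }}
{{ product.countdown.startDate | date('Y/m/d H:i:s') }}
{{ product.countdown.endDate | date('Y/m/d H:i:s') }}
{{ product.image(1).originalUrl }}
{{ product.image(2).thumbUrl }}
{{ product.priceWithTax(preferences.default_currency) | money }}"""

result = group_properties(sample)
print(result)
```

Tokenising the chain replaces the four classification patterns with a single length check. The trade-off is that a chain such as foo(bar).baz, which the original patterns would skip, is grouped here as well.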