Python: how can I solve this problem with a regular expression?

python regex string search

I have a text like this:

; Robert ( #Any kind of character here# ) #Any kind of character here#; 
John ( #Any kind of character here# )

So, to check whether the text ends with something like Robert(...) or John(...), I used the following regular expressions in Python:

if re.search(r'[;\s+]Robert\s*[(].*[)]\s*$', text, re.DOTALL) is not None:
    # Some code here
elif re.search(r'[;\s+]John\s*[(].*[)]\s*$', text, re.DOTALL) is not None:
    # Some code here

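The problem is that anything can appear inside the parentheses (even more pairs of opening and closing parentheses), so I use the dot together with the DOTALL option. As a result, the match runs all the way to the last closing parenthesis and finds "Robert(...)" every time, even though the correct answer is "John(...)".

So how can I fix this and make it stop at the right parenthesis, so that it finds "John"?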
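The behaviour described above can be reproduced with a few lines. This is only a sketch: the sample text is an assumption in which short words stand in for the "#Any kind of character here#" placeholders, while the two patterns are the ones from the question.

```python
import re

# Assumed sample text: short words replace the "#Any kind of character here#" placeholders.
text = "; Robert ( one ) two; \nJohn ( three )"

robert = re.search(r'[;\s+]Robert\s*[(].*[)]\s*$', text, re.DOTALL)
john = re.search(r'[;\s+]John\s*[(].*[)]\s*$', text, re.DOTALL)

# Both patterns match, so an if/elif chain always takes the Robert branch.
print(robert is not None, john is not None)

# The greedy .* runs up to the final ")", so the Robert match includes John's group.
print(repr(robert.group(0)))
```

Because the Robert pattern is tested first and its greedy .* can always stretch to the final parenthesis, the John branch is never reached for this kind of input.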

The re module has no feature for handling nested parentheses, but the regex module supports recursion (and more).

Pattern details:

(?r)  # reverse search modifier: search from the end of the string
;\s*  #
(Robert|John) \s* # capture group 1
\(
(    # capture group 2
    [^()]*+ # all that isn't a bracket
    (?:
        \( (?2) \) # recursion with the capture group 2 subpattern
        [^()]*
    )*+
)
\) \s* $

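Disclaimer: this post "works", but should not be used.

So, first of all, as I commented earlier, regex is not meant to be recursive. If you want to solve this cleanly, you will probably need a module such as pyparsing.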
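To illustrate the "use a parser instead" idea without any extra dependency, here is a minimal hand-written sketch. It is not pyparsing and not taken from the answer: the function name last_call_name and the sample text are hypothetical. It walks backwards from the end of the text, balances the parentheses, and then reads the name in front of the matching opening parenthesis. As a simplification, the name must be preceded by ";", whitespace, or the start of the text.

```python
def last_call_name(text, names=("Robert", "John")):
    """Return the name whose balanced "( ... )" group ends the text, or None."""
    stripped = text.rstrip()
    if not stripped.endswith(")"):
        return None
    depth = 0
    # Walk backwards until we reach the parenthesis that opens the final group.
    for i in range(len(stripped) - 1, -1, -1):
        ch = stripped[i]
        if ch == ")":
            depth += 1
        elif ch == "(":
            depth -= 1
            if depth == 0:
                head = stripped[:i].rstrip()
                for name in names:
                    before = head[:len(head) - len(name)]
                    # Accept the name only at the start or after ";" / whitespace.
                    if head.endswith(name) and (not before or before[-1] in "; \t\r\n"):
                        return name
                return None
    return None  # unbalanced parentheses


# Assumed sample text with nested parentheses in both groups.
sample = "; Robert ( a (nested) b ) trailing text; \nJohn ( c (d (e)) )"
print(last_call_name(sample))
```

Unlike a fixed-depth pattern, this scanner has no nesting limit, and it only ever inspects the group that actually ends the text.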

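If you still desperately want to shoot yourself in the foot and use regex for something it was not meant to do, you can use the regex module. Casimir explained that technique well, with a fully working recursive regular expression. I don't recommend doing it, but I can't judge your current situation.

But hey, why shoot yourself in the foot when you can take off the whole leg? By using only the built-in re module, of course :D So, without further ado, here is how to create an unmaintainable mess that will secure your job indefinitely, until they completely rewrite whatever it is you were doing: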

import re

n = 25 # level of nesting allowed, must be specified due to python regex not being recursive
parensre = r"\([^()]*" + r"(?:\([^()]*" * n + r"[^()]*\))?" * n + r"[^()]*\)"

robertre = re.compile(r"Robert\s*" + parensre, re.M | re.S)
johnre   = re.compile(r"John\s*" + parensre, re.M | re.S)

tests = """
  Robert (Iwant(to(**doRegexMyWay(hithere) * 8) / 3) + 1) ; John (whatever())
John(I dont want to anymore())
"""

print(robertre.findall(tests))  # outputs ['Robert (Iwant(to(**doRegexMyWay(hithere) * 8) / 3) + 1)']
print(johnre.findall(tests))    # outputs ['John (whatever())', 'John(I dont want to anymore())']

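Of course, you can mix and match the parts; parensre is the cornerstone of your already crumbling sand castle. The trick is to create n (25 by default) non-capturing groups, all nested inside one another. A single group is structured like this:

( non-parenthesis characters, nested group, non-parenthesis characters )

The flavour of regex it produces: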

\([^()]*(?:\([^()]*(?:\([^()]*(?:\([^()]*(?:\([^()]*(?:\([^()]*(?:\([^()]*(?:\([^()]*(?:\([^()]*(?:\([^()]*(?:\([^()]*(?:\([^()]*(?:\([^()]*(?:\([^()]*(?:\([^()]*(?:\([^()]*(?:\([^()]*(?:\([^()]*(?:\([^()]*(?:\([^()]*(?:\([^()]*(?:\([^()]*(?:\([^()]*(?:\([^()]*(?:\([^()]*(?:\([^()]*[^()]*\))?[^()]*\))?[^()]*\))?[^()]*\))?[^()]*\))?[^()]*\))?[^()]*\))?[^()]*\))?[^()]*\))?[^()]*\))?[^()]*\))?[^()]*\))?[^()]*\))?[^()]*\))?[^()]*\))?[^()]*\))?[^()]*\))?[^()]*\))?[^()]*\))?[^()]*\))?[^()]*\))?[^()]*\))?[^()]*\))?[^()]*\))?[^()]*\))?[^()]*\)
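To see what this construction can and cannot match, the sketch below rebuilds the same pattern with a much smaller n. The helper name build_parens_pattern and the test strings are assumptions made for illustration; the string concatenation itself is copied from parensre.

```python
import re

def build_parens_pattern(n):
    """Build the nested pattern with the same concatenation used for parensre."""
    return r"\([^()]*" + r"(?:\([^()]*" * n + r"[^()]*\))?" * n + r"[^()]*\)"

small = re.compile(build_parens_pattern(2))  # outer pair plus two nested levels

print(small.pattern)
print(bool(small.fullmatch("(a(b(c)))")))     # three levels deep
print(bool(small.fullmatch("(a(b(c(d))))")))  # four levels deep
print(bool(small.fullmatch("(a(b)(c))")))     # two sibling groups on one level
```

With n = 2 only the first string matches. The second one is nested too deeply, and the third fails because each level contains a single optional group, so, as far as I can tell, the pattern handles one chain of nested parentheses but not several groups side by side at the same level.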

TL;DR: please don't try to do this with re.

Please provide an example where your regex fails.

As far as I know, this is more or less a duplicate. TL;DR: regex was not built for this; you are looking for a more complete parser.

To me it sounds like you are looking for the lazy dot-star, i.e. .*?

Instead of a dot-star for "any kind of character here", match anything except a right parenthesis, [^)]*, as in \s*(Robert|John)(\([^)]*\)). Then use re.finditer (or .findall) and use the last match found.

Are you using an online regex tester to work on the pattern? If not, you should.
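The "match anything except a right parenthesis and take the last match" suggestion could look roughly like the sketch below. The sample text is an assumption, and the \s* between the name and the opening parenthesis is my addition so that "Robert (" with a space is accepted.

```python
import re

# Assumed sample text without nested parentheses.
text = "; Robert ( one ) two; \nJohn ( three )"

# [^)]* stops at the first ")", so each match covers exactly one flat group.
pattern = re.compile(r'\s*(Robert|John)\s*(\([^)]*\))')

matches = list(pattern.finditer(text))
last = matches[-1] if matches else None

if last is not None:
    print(last.group(1))  # name of the last group found
    print(last.group(2))  # its parenthesised part
```

This only works when the groups contain no nested parentheses: with input such as "John ( a (b) c )", [^)]* stops at the first closing parenthesis and the captured group is cut short.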