Python: %s shows strange behavior in a regular expression

python regex

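I have a string in which I want to find the few words that precede a parenthesized acronym. Suppose the string is:

"there are many people in the world having colorectal cancer (crc) who also have the depression syndrome (ds)"

I want to capture at most 5 words before each parenthesis. I have a list of the acronyms, each enclosed in parentheses:

["(crc)", "(ds)"]

So I use the following code: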

import re

acrolen = 5
rt = []
for acro in acronym_list:
    # acro is interpolated into the pattern as-is, e.g. "(crc)"
    find_words = re.findall('((?:\w+\W+){1,%d}%s)' % (acrolen, acro), text, re.I)
    for word in find_words:
        rt.append(word)
print(rt)

But this produces the following result:

('the world having colorectal cancer (crc', 'crc')
('also have the depression syndrome (ds', 'ds')

If I instead hard-code the acronym in the regex:

find_words = re.findall('((?:\w+\W+){1,%d}\(crc\))' % (acrolen), s, re.I)

then it finds exactly what I want, namely:

the world having colorectal cancer (crc)

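My question is: why does using %s for the string here make such a big difference to the regex match (unwanted tuple parentheses around the result, the acronym repeated, and so on)?

How can I use the first regex correctly, so that the process can be automated in a loop without typing the exact string into the regex each time?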

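You need to make sure the variable you pass in is properly escaped, so that it is treated as literal text inside the regex pattern. Without escaping, the parentheses in (crc) are read as a capturing group that matches crc. The pattern then has two groups, so re.findall returns a tuple for each match, and the closing parenthesis in the text is never matched. Use re.escape(acro) when building the pattern.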
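To make the difference concrete, here is a minimal sketch that builds the same pattern with and without re.escape. The shortened sample text is only for illustration and is not from the original question.

```python
import re

acro = "(crc)"
acrolen = 5
# Shortened sample text, assumed here for illustration only.
text = "people in the world having colorectal cancer (crc) who also"

# Unescaped: "(crc)" becomes a second capturing group that matches "crc".
unescaped = r'((?:\w+\W+){1,%d}%s)' % (acrolen, acro)
# Escaped: re.escape turns "(crc)" into "\(crc\)", which matches literal parentheses.
escaped = r'((?:\w+\W+){1,%d}%s)' % (acrolen, re.escape(acro))

unescaped_result = re.findall(unescaped, text, re.I)
escaped_result = re.findall(escaped, text, re.I)

print(unescaped_result)  # list of (outer group, inner group) tuples
print(escaped_result)    # list of plain strings
```

In the unescaped case the opening parenthesis in the text is consumed by \W+, which is why the first tuple element ends in "(crc" with no closing parenthesis.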

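Also, note that you do not need to wrap the whole pattern in a capturing group: re.findall returns the full match values when the pattern defines no capturing group.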
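A small sketch of how the number of capturing groups changes what re.findall returns. The short sample string and the simplified patterns are assumptions chosen for illustration.

```python
import re

s = "colorectal cancer (crc)"

# No capturing group: each result is the whole match.
no_group = re.findall(r'\w+ \(crc\)', s)
# One capturing group: each result is that group's text only.
one_group = re.findall(r'(\w+) \(crc\)', s)
# Two capturing groups: each result is a tuple with one item per group.
two_groups = re.findall(r'(\w+) \((crc)\)', s)

print(no_group)
print(one_group)
print(two_groups)
```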

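It is also recommended to use raw string literals when defining regex patterns, so that backslash sequences are not interpreted ambiguously.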
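As a quick illustration of why raw strings matter, consider \b: in a normal string literal it is a backspace character, while in a raw string it reaches the regex engine as a word-boundary token. The sample words below are made up for this sketch.

```python
import re

# In a normal literal, "\b" is a single backspace character.
plain = '\bcat\b'
# In a raw literal, the backslash is kept, so the regex engine sees \b (word boundary).
raw = r'\bcat\b'

sample = "cat concat"
plain_result = re.findall(plain, sample)
raw_result = re.findall(raw, sample)

print(len('\b'), len(r'\b'))
print(plain_result, raw_result)
```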

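Thanks a lot, this was very informative. But could you explain how {{1,{0}}}{1} works the same as {1,%d}%s?

In a format string, {n} is a placeholder for the n-th argument of the format method. To produce a literal brace, it must be doubled, so {{1,{0}}} yields {1,5} when the first argument is 5.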
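The equivalence can be checked directly. This sketch builds the quantifier part of the pattern both ways and compares the resulting strings. The acronym is pre-escaped by hand here just to keep the example short.

```python
acrolen = 5
acro = r"\(crc\)"  # already escaped, for brevity

# %-formatting: braces are ordinary characters, %d and %s are the placeholders.
with_percent = r'(?:\w+\W+){1,%d}%s' % (acrolen, acro)
# str.format: {0} and {1} are placeholders, so literal braces are written as {{ and }}.
with_format = r'(?:\w+\W+){{1,{0}}}{1}'.format(acrolen, acro)

print(with_percent)
print(with_format)
```

Both lines print the same pattern, so the choice between the two styles is a matter of preference.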

import re

text = "there are many people in the world having colorectal cancer (crc) who also have the depression syndrome (ds)"
acrolen = 5
rt = []
acronym_list = ["(crc)", "(ds)"]
for acro in acronym_list:
    # re.escape makes the parentheses in acro match literally
    p = r'((?:\w+\W+){1,%d}%s)' % (acrolen, re.escape(acro))
    # Or, use format (literal braces are doubled):
    # p = r'((?:\w+\W+){{1,{0}}}{1})'.format(acrolen, re.escape(acro))
    find_words = re.findall(p, text, re.I)
    for word in find_words:
        rt.append(word)
print(rt)
# ['the world having colorectal cancer (crc)', 'also have the depression syndrome (ds)']