Python regex A | B | C matches C even though B should match


python regex nlp

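I've been stuck on this for hours and I'm really at a loss... Essentially, I have an alternation regex of the form A | B | C, and for whatever reason C matches where B should, even though the individual regexes are supposed to be tried from left to right and to stop in a non-greedy way (i.e., once a match is found, the remaining regexes are not tried).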

Here is my code:

import re

text = 'Patients with end stage heart failure fall into stage D of the ABCD classification of the American College of Cardiology (ACC)/American Heart Association (AHA), and class III–IV of the New York Heart Association (NYHA) functional classification; they are characterised by advanced structural heart disease and pronounced symptoms of heart failure at rest or upon minimal physical exertion, despite maximal medical treatment according to current guidelines.'
expansion = "American Heart Association"
# Alternatives: exact phrase | phrase preceded by a non-word char | "American ... Association"
re_exp = re.compile(expansion + "|" + r"(?<=\W)" + expansion + "|"
                    + expansion.split()[0] + r"[-\s].*?\s*?" + expansion.split()[-1])

m = re_exp.search(text)
print(m.group(0))

But what I get is:

American College of Cardiology (ACC)/American Heart Association

which is the match from the last alternative.

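If I remove the last regex, or just call

re.findall(r"(?…

Your combined regex looks like this:
American Heart Association|(?<=\W)American Heart Association|American[-\s].*?\s*?Association

Alternation does not make the engine try A on the whole string, then B, then C. re.search scans the string from left to right, and at each starting position it tries A, then B, then C; the first position where any alternative succeeds wins. At the first "American" (in "American College"), A and B fail but C succeeds, so that match is returned before the engine ever reaches "American Heart Association".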
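A quick way to confirm this leftmost-first behaviour is to wrap each alternative in its own capturing group and look at `Match.lastindex`, which reports the last group that participated in the match. This is a sketch on a shortened excerpt of the question's text; the excerpt and the group wrapping are my additions, the three alternatives are the question's own:

```python
import re

# Shortened excerpt of the question's text (assumption: enough to show the effect).
text = ("stage D of the ABCD classification of the American College of Cardiology "
        "(ACC)/American Heart Association (AHA), and class III-IV of the "
        "New York Heart Association (NYHA) functional classification")

expansion = "American Heart Association"
first, last = expansion.split()[0], expansion.split()[-1]

# Same three alternatives as in the question, each in its own capturing group
# so that m.lastindex tells us which alternative produced the match.
pattern = re.compile(
    "(" + expansion + ")"
    + r"|((?<=\W)" + expansion + ")"
    + "|(" + first + r"[-\s].*?\s*?" + last + ")"
)

m = pattern.search(text)
# The match starts at the first "American" ("American College") and comes from
# alternative 3, because alternatives 1 and 2 fail at that position.
print(m.start(), m.lastindex, repr(m.group(0)))

# Starting the attempt at the later "American Heart" position shows that
# alternative 1 would match there, but search() never gets that far.
later = text.index("American Heart")
m2 = pattern.match(text, later)
print(m2.start(), m2.lastindex, repr(m2.group(0)))
```

Reordering the alternatives cannot fix this, because the earlier starting position is tried first no matter which alternative is listed first.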


Alternatively, you can use an approach that disallows specific words between the first and last parts:
\bAmerican\b(?:(?!American\b|Association\b).)*\bHeart Association\b
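Here is a small check of that tempered greedy token against an excerpt of the question's text. The second string is made up, only to show that other words may still appear between "American" and "Heart Association":

```python
import re

# Tempered greedy token: every character consumed after "American" must not
# start the word "American" or "Association".
tempered = re.compile(
    r"\bAmerican\b(?:(?!American\b|Association\b).)*\bHeart Association\b"
)

excerpt = ("the American College of Cardiology (ACC)/American Heart Association "
           "(AHA), and class III-IV of the New York Heart Association (NYHA)")
print(tempered.findall(excerpt))  # ['American Heart Association']

# Made-up string: words other than "American"/"Association" may sit in between.
sample = "the American Lung and Heart Association"
print(tempered.findall(sample))  # ['American Lung and Heart Association']
```

The attempt starting at "American College" fails because the token refuses to step over the second "American", so the engine moves on and finds the intended phrase.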

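Here is your pattern: American Heart Association(?…
For example, you can disallow specific characters between the two words: \bAmerican\b[^/,.]*\bAssociation\b
or use a tempered greedy token such as \bAmerican\b(?:(?!American|Association).)*\bHeart Association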
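A sketch of the negated-character-class variant `\bAmerican\b[^/,.]*\bAssociation\b` on an excerpt of the question's text. The second sentence is made up to show a caveat: `[^/,.]*` is greedy, so when no `/`, `,` or `.` intervenes it can run on to a later "Association":

```python
import re

# No '/', ',' or '.' may appear between "American" and "Association".
no_punct = re.compile(r"\bAmerican\b[^/,.]*\bAssociation\b")

excerpt = ("the American College of Cardiology (ACC)/American Heart Association "
           "(AHA), and class III-IV of the New York Heart Association (NYHA)")
# The attempt at "American College" is blocked by the '/', so the next
# "American" is used instead.
print(no_punct.search(excerpt).group(0))  # American Heart Association

# Made-up sentence: without punctuation the greedy class backtracks only as far
# as the LAST "Association" it can reach.
run_on = "the American Heart Association and the New York Heart Association"
print(no_punct.search(run_on).group(0))
```

Making the class lazy (`[^/,.]*?`) would stop at the first "Association" in the second sentence instead.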
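Thanks for the great explanation and suggestions! I had assumed the regex would test the whole string against each of A | B | C in turn. I guess I was wrong. Although I couldn't quite work out a regex that works for every sentence in the collection (I needed the data quickly), I simply looped over each sentence in the data with a for loop, progressively relaxing the regex conditions (basically doing what I thought A | B did). This let me cover all the examples I needed without producing invalid matches like the one shown above.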
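The per-sentence fallback described in that comment might look roughly like this. `find_expansion`, the three patterns and their strict-to-loose order are hypothetical, and the last three test sentences are made up; the point is that each pattern is run over the whole sentence before the next one is tried, which is what A|B|C does not do:

```python
import re


def find_expansion(sentence, expansion):
    """Return the first match of `expansion`, trying patterns strict to loose.

    Hypothetical helper: the patterns and their order are assumptions chosen
    for illustration, not the commenter's actual code.
    """
    first = re.escape(expansion.split()[0])
    last = re.escape(expansion.split()[-1])
    candidates = [
        re.escape(expansion),                            # 1. exact phrase
        r"\b" + first + r"\b[^/,.]*\b" + last + r"\b",   # 2. no '/', ',' or '.' in between
        first + r"[-\s].*?\s*?" + last,                  # 3. the question's loosest alternative
    ]
    # Unlike A|B|C, each pattern scans the whole sentence before the next,
    # looser pattern is tried.
    for pattern in candidates:
        m = re.search(pattern, sentence)
        if m:
            return m.group(0)
    return None


expansion = "American Heart Association"
sentences = [
    "the American College of Cardiology (ACC)/American Heart Association (AHA)",
    "Data were provided by the American Heart and Stroke Association.",  # made up
    "Funded by the American Heart, Lung, and Blood Association.",        # made up
    "No relevant organisation is named here.",                           # made up
]
for s in sentences:
    print(find_expansion(s, expansion))
```

The first sentence is resolved by the exact phrase, so the loose third pattern never gets a chance to produce the over-long match from the question.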