Python: matching a whole word in a string with a dynamic regex


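I want to check whether a word occurs in a sentence using a regular expression. Words are separated by spaces but may have punctuation on either side. If the word is in the middle of the string, my match_middle_words pattern works (it prevents partial-word matches and allows punctuation on either side of the word). However, it does not match the first or last word, because there is no leading/trailing space. So for those cases I have also been using: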

match_starting_word = "^[^a-zA-Z\d]{0,}" + word + "[^a-zA-Z\d ]{0,} "
match_end_word = " [^a-zA-Z\d ]{0,}" + word + "[^a-zA-Z\d]{0,}$"

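and then combining them with

match_string = match_middle_words + "|" + match_starting_word + "|" + match_end_word

Is there a simple way to avoid needing three match patterns? Specifically, is there a way to specify "either a space or the start of the file (i.e. "^")", and similarly "either a space or the end of the file (i.e. "$")"?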

Why not use a word boundary?

If you have a list of words (say, in a words variable) to be matched as whole words, use

match_string = r'\b(?:{})\b'.format('|'.join(words))
match_string = rf'\b(?:{"|".join(words)})\b'         # f-string, Python 3.6+
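As an illustration of the list form above, here is a minimal sketch. The helper name find_whole_words and the sample data are hypothetical, not from the original post, and the words are passed through re.escape as a precaution.

```python
import re

def find_whole_words(words, text):
    """Return every whole-word occurrence of any entry of `words` in `text`."""
    # Escape each word, join them into one alternation, wrap it in \b ... \b.
    pattern = r"\b(?:{})\b".format("|".join(map(re.escape, words)))
    return re.findall(pattern, text)

# Hypothetical sample data for illustration only.
words = ["cat", "dog"]
text = "cat, dog! concatenate dogma (cat) hotdog dog"

print(find_whole_words(words, text))
# ['cat', 'dog', 'cat', 'dog']
```

The first and last words match without any extra alternatives, and "concatenate", "dogma" and "hotdog" are skipped because the candidate is not delimited by a word boundary on both sides.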

In this case, the word is matched only when it is surrounded by non-word characters. Also note that \b matches at the start and end of the string, so there is no need to add the three alternatives.

Sample code:

import re
strn = "word hereword word, there word"
search = "word"
print(re.findall(r"\b" + search + r"\b", strn))

And we find our 3 matches:

['word', 'word', 'word']

Note on "word" boundaries

When the "words" are in fact chunks of arbitrary characters, you should re.escape them before passing them to the regex pattern:

match_string = r'\b{}\b'.format(re.escape(word)) # a single escaped "word" string passed
match_string = r'\b(?:{})\b'.format("|".join(map(re.escape, words))) # words list is escaped
match_string = rf'\b(?:{"|".join(map(re.escape, words))})\b' # Same as above, f-string (Python 3.6+)

If the words to be matched as whole words may start or end with special characters, \b will not work; use unambiguous word boundaries:

match_string = r'(?<!\w){}(?!\w)'.format(re.escape(word))
match_string = r'(?<!\w)(?:{})(?!\w)'.format("|".join(map(re.escape, words)))

If the boundaries are whitespace characters or the start/end of the string, use whitespace boundaries, (?<!\S)...(?!\S):

match_string = r'(?<!\S){}(?!\S)'.format(re.escape(word))
match_string = r'(?<!\S)(?:{})(?!\S)'.format("|".join(map(re.escape, words)))

Comments:

Can you give some examples?

Thanks, this is a much simpler solution than I expected!

FYI: if you are searching for more than a hundred thousand words, it makes sense to build a regex trie.
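To make the difference between the three boundary styles in the answer concrete, here is a small sketch that reports where each pattern matches. The helper match_starts, the "word" c++ and the sample text are assumptions chosen for illustration and do not appear in the original thread.

```python
import re

def match_starts(template, word, text):
    """Insert the escaped word into a pattern template and return match offsets."""
    pattern = template.format(re.escape(word))
    return [m.start() for m in re.finditer(pattern, text)]

# Hypothetical "word" that ends in non-word characters.
word = "c++"
text = "c++ is not c++11, (c++) c++"

# \b needs a word character on exactly one side, so it only "matches"
# inside c++11, which is the one place we did not want.
print(match_starts(r"\b{}\b", word, text))           # [11]

# No word character directly before or after the match.
print(match_starts(r"(?<!\w){}(?!\w)", word, text))  # [0, 19, 24]

# Only whitespace or the string start/end directly around the match.
print(match_starts(r"(?<!\S){}(?!\S)", word, text))  # [0, 24]
```

Which boundary to pick depends on what counts as a delimiter in your data: the \w-based lookarounds accept punctuation such as parentheses next to the word, while the \S-based ones require whitespace or the string edge.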