How to use re.sub to add tags to certain strings in Python?

python regex

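I am trying to add tags around some given query strings, and the tags should wrap all the matching strings. For example, I want to wrap tags around every word that matches the query words iphone, games and mac in the sentence

I love downloading iPhone games from my mac.

The result should be

I love downloading <em>iPhone games</em> from my <em>mac</em>.

So far, I have tried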

import re

sentence = "I love downloading iPhone games from my mac."
query = r'((iphone|games|mac)\s*)+'
regex = re.compile(query, re.I)
sentence = regex.sub(r'<em>\1</em> ', sentence)

The resulting sentence is

I love downloading <em>games </em> on my <em>mac</em> !

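Here \1 is replaced by only one word (games instead of iPhone games), and there is an unnecessary space after the word. How can I write the regex to get the desired output? Thanks!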
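The behaviour described in the question can be reproduced with a short sketch. When a capturing group sits inside a repetition, Python's re module keeps only the text from the last repetition, which is why \1 holds just one word. The variable name m is only for illustration.

```python
import re

sentence = "I love downloading iPhone games from my mac."
pattern = re.compile(r'((iphone|games|mac)\s*)+', re.I)

# Look at the first match only.
m = pattern.search(sentence)

# group(0) is the whole match, including both words and the trailing space.
print(repr(m.group(0)))
# group(1) sits inside the + repetition, so it keeps only the last repetition.
print(repr(m.group(1)))
```

The whole match covers both words, but group 1 was overwritten on the second pass through the loop, and the greedy \s* pulled the trailing space into it.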

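Edit: I just realized that both Fred's and Chris's solutions have a problem when a query word occurs inside another word. For example, if my query is game, then the game inside games gets tagged, and I do not want that to be highlighted. In the same way, a short query word that appears inside a longer word should not be highlighted.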
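A minimal sketch of the word-within-word problem, and of one common remedy, the \b word-boundary assertion that the answers further down also use. The sample text and the names loose and strict are made up for illustration.

```python
import re

text = "I like games and one game in particular."

# Without word boundaries, "game" also matches inside "games".
loose = re.compile(r'(game)', re.I)
# With \b on both sides, only the standalone word matches.
strict = re.compile(r'\b(game)\b', re.I)

loose_result = loose.sub(r'<em>\1</em>', text)
strict_result = strict.sub(r'<em>\1</em>', text)

print(loose_result)
print(strict_result)
```

\b matches only at a position between a word character and a non-word character, so it is a reasonable fit for plain alphanumeric query words. It may not behave as hoped for queries that begin or end with punctuation.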

Edit 2:

I adopted Chris's new solution, and it works.

First, to get the spacing you want, replace \s* with \s*? so that it is non-greedy.

First fix:

>>> re.compile(r'(((iphone|games|mac)\s*?)+)', re.I).sub(r'<em>\1</em>', sentence)
'I love downloading <em>iPhone</em> <em>games</em> from my <em>mac</em>.'
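To make the trade-off concrete, here is a small side-by-side sketch of the greedy and non-greedy whitespace variants. It uses non-capturing groups (?:...) for the inner parts, which is a stylistic choice of this sketch rather than something from the answer; the names greedy and lazy are illustrative.

```python
import re

sentence = "I love downloading iPhone games from my mac."
words = r'(?:iphone|games|mac)'

# Greedy \s* swallows the whitespace after each word, including the last one.
greedy = re.compile(r'((?:' + words + r'\s*)+)', re.I)
# Non-greedy \s*? takes no whitespace, so adjacent words are never joined.
lazy = re.compile(r'((?:' + words + r'\s*?)+)', re.I)

greedy_result = greedy.sub(r'<em>\1</em>', sentence)
lazy_result = lazy.sub(r'<em>\1</em>', sentence)

print(greedy_result)
print(lazy_result)
```

The greedy form keeps the phrase together but drags a trailing space inside the tag; the non-greedy form removes the stray space but splits the phrase into separate tags.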

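The non-greedy version tags each word separately instead of wrapping iPhone games as a single phrase, and I can't yet think how to fix that.

Note also that in these patterns I have put an extra set of parentheses around the + repetition so that the whole match is captured; that is the difference from the original.

Further update: actually, I can think of a way to fix it. You can decide whether you want it that way.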

>>> regex = re.compile(r'((iphone|games|mac)(\s*(iphone|games|mac))*)', re.I)
>>> regex.sub(r'<em>\1</em>', sentence)
'I love downloading <em>iPhone games</em> from my <em>mac</em>.'

>>> r = re.compile(r'(\s*)((?:\s*\b(?:iphone|games|mac)\b)+)', re.I)
>>> r.sub(r'\1<em>\2</em>', sentence)
'I love downloading <em>iPhone games</em> from my <em>mac</em>.'

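The extra group that fully contains the + repetition avoids losing words, and moving the whitespace in front of the words (while first stripping the leading whitespace into its own group) takes care of the spacing problem. The word-boundary assertions require each of the three words between them to match as a whole word. However, NLP is hard, and there will still be cases where this does not do what you expect.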
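The same pattern can be built from an arbitrary list of query words. The helper below is a hypothetical sketch, not part of the answer: the function name highlight and its tag parameter are assumptions, and re.escape is added on the assumption that query words may contain regex metacharacters.

```python
import re

def highlight(sentence, words, tag="em"):
    """Wrap each run of whole-word matches of `words` in <tag>...</tag>."""
    # Escape each query word so it is matched literally.
    alternation = "|".join(re.escape(w) for w in words)
    # Group 1 keeps leading whitespace outside the tag;
    # group 2 is the run of matched words, joined by whitespace.
    pattern = re.compile(r'(\s*)((?:\s*\b(?:%s)\b)+)' % alternation, re.I)
    return pattern.sub(r'\1<%s>\2</%s>' % (tag, tag), sentence)

result = highlight("I love downloading iPhone games from my mac.",
                   ["iphone", "games", "mac"])
print(result)
```

Because of the \b assertions, a query of game leaves games untouched. Query words that start or end with non-word characters would likely need a different boundary test.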

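Right, I missed that. I guess that is why he used re.compile rather than calling re.sub directly: it seems the flags argument to re.sub was only added in later versions (Python 2.7 and 3.1).

Thanks! The last one is perfect.

Peter, be sure to click the check mark on the left so that the answer is marked as accepted.

Sorry, but I just realized it has some problems.

@Peter: updated to include the use of \b. The same technique applies to Fred's solution as well.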
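For readers on a current Python, a brief sketch of the two equivalent call styles discussed in the comments. The pattern here is a simplified variant written for this example (it uses \s+ and non-capturing groups), not the exact one from the answers.

```python
import re

sentence = "I love downloading iPhone games from my mac."
pattern = r'(\b(?:iphone|games|mac)\b(?:\s+(?:iphone|games|mac)\b)*)'

# Pass flags by keyword: the fourth positional argument of re.sub is count.
direct = re.sub(pattern, r'<em>\1</em>', sentence, flags=re.I)

# Compiling first works on every version and lets the pattern be reused.
compiled = re.compile(pattern, re.I).sub(r'<em>\1</em>', sentence)

print(direct)
print(compiled)
```

Passing re.I positionally to re.sub is a well-known pitfall, because it would be interpreted as the count argument instead of as a flag.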

Updated to use \b so that only whole words match:

>>> regex = re.compile(r'(\b(iphone|games|mac)\b(\s*(iphone|games|mac)\b)*)', re.I)
>>> regex.sub(r'<em>\1</em>', 'I love downloading iPhone games from my mac')
'I love downloading <em>iPhone games</em> from my <em>mac</em>'
>>> regex.sub(r'<em>\1</em>', 'I love downloading iPhone gameses from my mac')
'I love downloading <em>iPhone</em> gameses from my <em>mac</em>'
>>> regex.sub(r'<em>\1</em>', 'I love downloading iPhoney games from my mac')
'I love downloading iPhoney <em>games</em> from my <em>mac</em>'
>>> regex.sub(r'<em>\1</em>', 'I love downloading iPhoney gameses from my mac')
'I love downloading iPhoney gameses from my <em>mac</em>'
>>> regex.sub(r'<em>\1</em>', 'I love downloading miPhone gameses from my mac')
'I love downloading miPhone gameses from my <em>mac</em>'
>>> regex.sub(r'<em>\1</em>', 'I love downloading miPhone games from my mac')
'I love downloading miPhone <em>games</em> from my <em>mac</em>'
>>> regex.sub(r'<em>\1</em>', 'I love downloading iPhone igames from my mac')
'I love downloading <em>iPhone</em> igames from my <em>mac</em>'
