Converting phrases in HTML to links with Python


python html regex replace


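I have content on a website in which certain keywords and key phrases should link to other content. I don't want to insert the links into the content by hand. What is the best way to do this in Python, preferably without using a DOM library?

For example, I have the following text:

...And this can be accomplished with the Awesome Method. Blah blah blah....

Keyphrase: awesome method

This is the desired output:

...And this can be accomplished with the <a href="...">Awesome Method</a>. Blah blah blah....

I have a list of such key phrases and their corresponding URLs. The phrases can appear in the content in any case, but in the keyphrase definitions they are all lowercase.

Currently I am using string find-and-replace over every combination of case variations of the words, and it is very inefficient.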

How about something like this:

for keyphrase, url in links:
    # re.escape() makes characters such as '.' or '+' in a phrase match literally
    content = re.sub('(%s)' % re.escape(keyphrase), r'<a href="%s">\1</a>' % url, content, flags=re.IGNORECASE)


For example, in your case you could do:

import re

content = "...And this can be accomplished with the Awesome Method. Blah blah blah...."
links = [('awesome method', '/path/to/awesome-method')]

for keyphrase, url in links:
    content = re.sub('(%s)' % re.escape(keyphrase), r'<a href="%s">\1</a>' % url, content, flags=re.IGNORECASE)

# content:
# '...And this can be accomplished with the <a href="/path/to/awesome-method">Awesome Method</a>. Blah blah blah....'
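Running one re.sub per phrase rescans the whole content once per phrase, and a later, shorter phrase (say "awesome") can match inside the href or text of a link inserted for an earlier one. A sketch of an alternative, assuming Python 3, mostly ASCII text and a non-empty list of phrases: join all phrases into a single case-insensitive alternation, longest first, wrap it in \b so only whole words match, and look the URL up from the matched text. The helper link_phrases and the extra "awesome" entry in links are hypothetical, added only for illustration.

```python
import re


def link_phrases(content, links):
    """Wrap each whole-word, case-insensitive occurrence of a key phrase
    in an <a> tag, scanning the content only once."""
    # Map lowercase phrase -> URL so a match can be looked up in any case.
    # Assumes match.lower() equals the lowercase phrase (true for ASCII text).
    url_for = {phrase.lower(): url for phrase, url in links}
    # Alternation is tried left to right, so put longer phrases first:
    # "awesome method" must win over "awesome". Assumes links is non-empty.
    alternatives = sorted(url_for, key=len, reverse=True)
    # \b only works for phrases that start and end with word characters.
    pattern = re.compile(
        r'\b(?:%s)\b' % '|'.join(re.escape(p) for p in alternatives),
        re.IGNORECASE,
    )

    def replace(match):
        found = match.group(0)  # keep the original casing in the link text
        return '<a href="%s">%s</a>' % (url_for[found.lower()], found)

    return pattern.sub(replace, content)


content = "...And this can be accomplished with the Awesome Method. Blah blah blah...."
links = [('awesome method', '/path/to/awesome-method'),
         ('awesome', '/path/to/awesome')]  # hypothetical second phrase
print(link_phrases(content, links))
```

Because the URL is looked up from the match instead of being pasted into the pattern's replacement string, inserted links are never rescanned, and "blaawesome" is left alone because there is no word boundary before "awesome".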


You can iterate over the positions in the text and build the new text with basic string operations:

text = """And this can be accomplished with the Awesome Method. Blah blah blah"""

keyphrases = [
    ('awesome method', 'http://awesome.com/'),
    ('blah', 'http://blah.com')
  ] 

new_parts = []

pos = 0
while pos < len(text):
  replaced = False
  for phrase, url in keyphrases:
    substring = text[pos:pos+len(phrase)]
    if substring.lower() == phrase.lower():
      new_parts.append('<a href="%s">%s</a>' % (url, substring))
      pos += len(substring)
      replaced = True
      break
  if not replaced:
    new_parts.append(text[pos])
    pos += 1

new_text = ''.join(new_parts)
print(new_text)
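As written, the loop also links a phrase that is glued to another word, for example "blah" at the start of "blahawesome". A minimal tweak, sketched here under the assumption that letters, digits and underscore count as word characters (roughly what regex \w means), checks the characters on both sides of a candidate match before linking it. The helper is_word_char is a hypothetical name.

```python
text = "And this can be accomplished with the Awesome Method. Blah blahawesome method"

keyphrases = [
    ('awesome method', 'http://awesome.com/'),
    ('blah', 'http://blah.com'),
]


def is_word_char(ch):
    # Assumption: letters, digits and '_' are word characters (roughly regex \w).
    return ch.isalnum() or ch == '_'


new_parts = []
pos = 0
while pos < len(text):
    replaced = False
    for phrase, url in keyphrases:
        end = pos + len(phrase)
        substring = text[pos:end]
        # The phrase must not be glued to a word on either side.
        starts_word = pos == 0 or not is_word_char(text[pos - 1])
        ends_word = end >= len(text) or not is_word_char(text[end])
        if substring.lower() == phrase.lower() and starts_word and ends_word:
            new_parts.append('<a href="%s">%s</a>' % (url, substring))
            pos = end
            replaced = True
            break
    if not replaced:
        new_parts.append(text[pos])
        pos += 1

new_text = ''.join(new_parts)
print(new_text)
```

Here "Awesome Method" and the first "Blah" are linked, while neither "blah" nor "awesome method" inside "blahawesome method" is.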


Check out the anchorman module, which turns text into hypertext:
import anchorman

text = "...And this can be accomplished with the Awesome Method. Blah blah blah...."
links = [{'awesome method': {'value': '/path/to/awesome-method'}}]
markup_format = {
    'tag': 'a',
    'value_key': 'href',
    'attributes': [
        ('class', 'anups')
    ],
    'case_sensitive': False
}

a = anchorman.add(text, links, markup_format=markup_format)
print(a)
...And this can be accomplished with the <a href="/path/to/awesome-method"
class="anups">Awesome Method</a>. Blah blah blah....

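It depends on the size of the content and on the number and length of the phrases to match. If you can give a representative example, you will probably get more useful answers.

I have been using an approach similar to the one @michael laszlo suggested. This looks much cleaner. I just need to adapt it to match complete phrases so that it doesn't replace partial words/phrases such as blaawesome.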
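Because the content is HTML, whole-word matching alone can still link text inside an attribute value (for example a title attribute) or inside an existing link. Without a DOM library, one rough option is to split the markup on tags with a regex and run the replacement only on the text between them, skipping anything inside <a>...</a>. This is a sketch, not a parser: it assumes well-formed markup with no '>' inside attribute values and no script, style or comment blocks, and link_outside_tags is a hypothetical name.

```python
import re

# Splitting on a capturing group keeps the tags: even indices of the result
# are text between tags, odd indices are the tags themselves.
TAG_RE = re.compile(r'(<[^>]*>)')
ANCHOR_OPEN_RE = re.compile(r'<a[\s>]', re.IGNORECASE)
ANCHOR_CLOSE_RE = re.compile(r'</a\s*>', re.IGNORECASE)


def link_outside_tags(html, links):
    """Link whole-word key phrases only in text that is outside tags and
    outside existing <a> elements. Assumes links is non-empty."""
    url_for = {phrase.lower(): url for phrase, url in links}
    alternatives = sorted(url_for, key=len, reverse=True)  # longest first
    pattern = re.compile(
        r'\b(?:%s)\b' % '|'.join(re.escape(p) for p in alternatives),
        re.IGNORECASE,
    )

    def replace(match):
        found = match.group(0)
        return '<a href="%s">%s</a>' % (url_for[found.lower()], found)

    parts = TAG_RE.split(html)
    inside_anchor = False
    for i, part in enumerate(parts):
        if i % 2 == 1:
            # A tag: never edited, but track whether we are inside <a>...</a>.
            if ANCHOR_OPEN_RE.match(part):
                inside_anchor = True
            elif ANCHOR_CLOSE_RE.match(part):
                inside_anchor = False
        elif not inside_anchor:
            parts[i] = pattern.sub(replace, part)
    return ''.join(parts)


html = '<p title="awesome method">Try the <a href="/old">Awesome Method</a> or this awesome method.</p>'
links = [('awesome method', '/path/to/awesome-method')]
result = link_outside_tags(html, links)
print(result)
```

The title attribute and the already-linked "Awesome Method" are left untouched; only the plain-text "awesome method" gets a new link.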