Python: Replacing HTML links with their text


python html parsing


How can I replace links with their anchor text in HTML, using Python?

For example, given the input:

 <p> Hello <a href="http://example.com">link text1</a> and <a href="http://example.com">link text2</a> ! </p>

I want the result to keep the p tag and remove only the a tags:

<p> Hello link text1 and link text2 ! </p>

This looks like a perfect case for BeautifulSoup's unwrap() method:

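from bs4 import BeautifulSoup

data = '''<p> Hello <a href="http://example.com">link text1</a> and <a href="http://example.com">link text2</a> ! </p>'''

# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(data, 'html.parser')
p_tag = soup.find('p')

# One iteration per link; each pass unwraps the first <a> still inside the <p>
for _ in p_tag.find_all('a'):
    p_tag.a.unwrap()

print(p_tag)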

This gives:

<p> Hello link text1 and link text2 ! </p>


You can do this with a simple regular expression and the re.sub function:
import re

text = '<p> Hello <a href="http://example.com">link text1</a> and <a href="http://example.com">link text2</a> ! </p>'
# Match only <a ...> and </a>; \b keeps tags such as <abbr> from matching
pattern = r'</?a\b[^>]*>'

result = re.sub(pattern, "", text)

print(result)
# <p> Hello link text1 and link text2 ! </p>


This code replaces every occurrence of the <a ...> and </a> tags with an empty string.
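To illustrate why the exact shape of the pattern matters, here is a minimal sketch comparing a loose pattern such as `<(a|/a).*?>` with a stricter one that uses a word boundary. The names `ANCHOR_TAG`, `LOOSE_TAG`, `strip_anchor_tags`, and the sample string are made up for this illustration. It also assumes attribute values never contain a literal `>`, which a regex of this kind cannot handle.

```python
import re

# Strict pattern: \b after "a" prevents matches on <abbr>, <article>, etc.
ANCHOR_TAG = re.compile(r'</?a\b[^>]*>', re.IGNORECASE)
# Loose pattern, shown only for comparison: any tag starting with "a" or "/a" matches.
LOOSE_TAG = re.compile(r'<(a|/a).*?>')


def strip_anchor_tags(html):
    """Remove <a ...> and </a> tags, keeping the text between them."""
    return ANCHOR_TAG.sub('', html)


sample = '<p>See <abbr title="HyperText">HTML</abbr> and <a href="http://example.com">a link</a>.</p>'

strict_result = strip_anchor_tags(sample)
loose_result = LOOSE_TAG.sub('', sample)

print(strict_result)  # <p>See <abbr title="HyperText">HTML</abbr> and a link.</p>
print(loose_result)   # <p>See HTML and a link.</p>
```

The loose pattern silently removes the abbr element's tags as well, so the stricter form is probably the safer default when the input may contain other tags whose names start with "a".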
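You can also use a parser library, such as BeautifulSoup. I'm not sure of the details, but you should be able to get something out of it.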
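As a rough sketch of the parser-based route without third-party packages, the standard library's html.parser module can re-emit the markup while skipping the a tags. The class `AnchorStripper` and the function `strip_links` are hypothetical names invented for this example. The sketch does not re-emit comments or doctype declarations, and it writes closing tags in lowercase.

```python
from html.parser import HTMLParser


class AnchorStripper(HTMLParser):
    """Re-emit HTML while dropping <a> start and end tags (illustrative helper)."""

    def __init__(self):
        # convert_charrefs=False so entities are passed through unchanged
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            # get_starttag_text() returns the tag as it appeared in the input
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are emitted once, as written
        if tag != 'a':
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag != 'a':
            self.parts.append('</%s>' % tag)

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append('&%s;' % name)

    def handle_charref(self, name):
        self.parts.append('&#%s;' % name)


def strip_links(html):
    """Return html with every <a> element replaced by its contents."""
    parser = AnchorStripper()
    parser.feed(html)
    parser.close()
    return ''.join(parser.parts)


html_in = '<p> Hello <a href="http://example.com">link text1</a> and <a href="http://example.com">link text2</a> ! </p>'
html_out = strip_links(html_in)
print(html_out)  # <p> Hello link text1 and link text2 ! </p>
```

Unlike a regex, this approach relies on a real tokenizer, so attribute values containing `>` and tags such as abbr are handled without special cases.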
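I don't know the answer, but I'm guessing it involves BeautifulSoup :-)

@mgilson, can't a simple regex handle the case of non-nested anchors?

You could use for a in p_tag.find_all('a'): a.unwrap()

@flyingfoxlee Yup, I realized I could use a loop :) Updated!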