Python: parsing a string out of a URL's page

python regex parsing

I want to extract a string from a URL (link). The string is inside an <h3> tag.

 link = http://www.test.com/page.html

 Content of link: <h3>Text here</h3>

What would be an elegant way to first fetch the content/source code of page.html and then pull that string out of it?

Thanks

You can use urllib2 to retrieve the content of the URL.

You can then use the HTML parser in the Python standard library to find the right content.
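As a rough illustration of the parser route, here is a minimal sketch. It is written for Python 3, where the standard-library parser lives in `html.parser` (the thread itself uses Python 2 and `urllib2`). The names `H3Collector` and `extract_h3` are made up for this example, and the HTML is passed in as a string rather than downloaded.

```python
from html.parser import HTMLParser


class H3Collector(HTMLParser):
    """Hypothetical helper: collects the text found inside every <h3> element."""

    def __init__(self):
        super().__init__()
        self.in_h3 = False   # True while the parser is inside an <h3> element
        self.parts = []      # text chunks of the <h3> currently being read
        self.headings = []   # finished heading strings

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, so "<H3>" is handled as well.
        if tag == "h3":
            self.in_h3 = True
            self.parts = []

    def handle_data(self, data):
        if self.in_h3:
            self.parts.append(data)

    def handle_endtag(self, tag):
        if tag == "h3" and self.in_h3:
            self.headings.append("".join(self.parts).strip())
            self.in_h3 = False


def extract_h3(html):
    """Return a list with the text of each <h3> element in the given HTML string."""
    parser = H3Collector()
    parser.feed(html)
    parser.close()
    return parser.headings


print(extract_h3("<html><body><h3>Text here</h3></body></html>"))
```

Unlike plain string matching, this also copes with attributes on the tag (for example `<h3 class="title">`) and with nested inline markup inside the heading.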

If the text you want is the only <h3>-wrapped text on the page, try:

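from urllib2 import urlopen
from re import search

text = search(r'(?

I recommend a dedicated HTML parsing library. It is a good parser for HTML pages (in most cases you don't have to worry about badly formatted pages).

You should use a non-greedy qualifier, otherwise it may match something like "Heading … Other Heading", that is, everything from the first heading through the last one.

The OP's task is just to get the tag; using a regex for that is perfectly fine.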
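To make the point about greedy versus non-greedy qualifiers concrete, here is a small sketch in Python 3. The sample HTML string and the patterns are my own illustration, not the expression from the answer above.

```python
import re

# Illustrative input with two headings on one line (an assumption for this demo).
html = "<h3>Heading</h3><p>body</p><h3>Other Heading</h3>"

# Greedy: .+ runs on to the LAST </h3>, swallowing everything in between.
greedy = re.search(r"<h3>(.+)</h3>", html).group(1)

# Non-greedy: .+? stops at the FIRST </h3> it can reach.
lazy = re.search(r"<h3>(.+?)</h3>", html).group(1)

# findall with the non-greedy pattern returns each heading separately;
# re.DOTALL lets a heading span several lines.
all_headings = re.findall(r"<h3>(.+?)</h3>", html, flags=re.DOTALL)

print(greedy)        # Heading</h3><p>body</p><h3>Other Heading
print(lazy)          # Heading
print(all_headings)  # ['Heading', 'Other Heading']
```

With a single <h3> on the page both patterns give the same result, which is why the greedy version can look correct until a second heading shows up.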
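# Python 2: print the text inside every <h3>...</h3> pair using plain string splits
import urllib2

url = "http://www.test.com/page.html"
page = urllib2.urlopen(url)
data = page.read()  # raw HTML of the page

# Each chunk that ends at a closing </h3> may contain one opening <h3>
for item in data.split("</h3>"):
    if "<h3>" in item:
        # The heading text is whatever follows the opening tag
        print item.split("<h3>")[1]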
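For anyone on Python 3, where `urllib2` no longer exists, the same splitting idea might look roughly like the sketch below. The function names `h3_texts` and `fetch_h3_texts` are invented for this example, and decoding the page as UTF-8 is an assumption about the target site.

```python
from urllib.request import urlopen


def h3_texts(html):
    """Return the text of each <h3>...</h3> pair using plain string splits."""
    texts = []
    for chunk in html.split("</h3>"):
        if "<h3>" in chunk:
            # Whatever follows the opening tag in this chunk is the heading text.
            texts.append(chunk.split("<h3>")[1])
    return texts


def fetch_h3_texts(url):
    """Download url and return its <h3> texts (assumes the page is UTF-8)."""
    with urlopen(url) as page:
        html = page.read().decode("utf-8", errors="replace")
    return h3_texts(html)


# Demo on an in-memory string, so no network access is needed.
print(h3_texts("<p>intro</p><h3>Text here</h3><h3>More</h3>"))
```

Note that this only matches a bare `<h3>`; a tag with attributes such as `<h3 class="title">` would be skipped, which is where a real parser is the safer choice.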