How do I extract the meta description from a URL using Python?


I want to extract the title and description from the following website:

View source:

with the following snippet of source code:

<title>Book a Virgin Australia Flight | Virgin Australia
</title>
    <meta name="keywords" content="" />
        <meta name="description" content="Search for and book Virgin Australia and partner flights to Australian and international destinations." />

and

the result comes back empty.

Please check the solution.

For the problem above, you can use the following code to extract the "description" information:

Output:

['Search for and book Virgin Australia and partner flights to Australian and international destinations.']
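The code that produced this output is not shown above. As a rough sketch only (not necessarily what the answer used), the same list can be obtained from the question's snippet with the standard library's `html.parser`. The names `DescriptionCollector`, `extract_descriptions`, and `HTML_SNIPPET` are hypothetical, and the markup is pasted in as a string rather than downloaded from the site.

```python
from html.parser import HTMLParser

# Sample markup copied from the question's source snippet (an assumption:
# in practice this string would come from downloading the page).
HTML_SNIPPET = """
<title>Book a Virgin Australia Flight | Virgin Australia
</title>
    <meta name="keywords" content="" />
        <meta name="description" content="Search for and book Virgin Australia and partner flights to Australian and international destinations." />
"""


class DescriptionCollector(HTMLParser):
    """Hypothetical helper: collects the content of every <meta name="description">."""

    def __init__(self):
        super().__init__()
        self.descriptions = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also end up here.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = (attr_map.get("name") or "").lower()
        # Skip tags that lack a name or carry an empty/missing content value.
        if name == "description" and attr_map.get("content"):
            self.descriptions.append(attr_map["content"])


def extract_descriptions(html_text):
    """Return a list with the content of each description meta tag."""
    parser = DescriptionCollector()
    parser.feed(html_text)
    parser.close()
    return parser.descriptions


print(extract_descriptions(HTML_SNIPPET))
```

Returning a list keeps the result well defined when a page has no description tag (an empty list) or more than one.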

Do you know HTML XPath? Using the lxml library with XPath is a fast way to extract HTML elements.

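import lxml.html  # a bare "import lxml" does not load the html submodule

# html_content is the page's HTML as a string
doc = lxml.html.document_fromstring(html_content)
title_element = doc.xpath("//title")
website_title = title_element[0].text_content().strip()
# The description tag is <meta name="description" content="...">, so match on
# the name attribute and read the content attribute (a <meta> tag has no text)
meta_description_element = doc.xpath("//meta[@name='description']")
website_meta_description = meta_description_element[0].get("content").strip()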

import metadata_parser

page = metadata_parser.MetadataParser(url='www.xyz.com')
metaDesc = page.metadata['og']['description']

print(metaDesc)

You can use BeautifulSoup for this.

This should help:

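from bs4 import BeautifulSoup

soup = BeautifulSoup(html_content, 'html.parser')  # html_content: the page's HTML string
metas = soup.find_all('meta')  # collect every <meta> tag
for m in metas:
    # the description tag is <meta name="description" content="...">
    if m.get('name') == 'description':
        desc = m.get('content')
        print(desc)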

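What about BeautifulSoup?

You may want to add a check for what is actually present in meta.attrs, since malformed HTML can cause an exception to be raised: for each meta in metas, test 'name' in meta.attrs and 'content' in meta.attrs and meta.attrs['name'] == 'description' before reading the value.

While this code may solve the problem, explaining how and why it solves it would really help to improve the quality of your post, and would probably lead to more upvotes. Remember that you are answering the question for future readers, not just for the person asking now. Please add an explanation to your answer and state which limitations and assumptions apply.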
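The attribute check suggested in the comment above could look roughly like the sketch below. It assumes BeautifulSoup (the `bs4` package) is installed. The markup is made up for illustration and is deliberately sloppy: one tag has no attributes and one description tag has no `content`.

```python
from bs4 import BeautifulSoup

# Made-up, deliberately malformed markup (not taken from the original page).
html_content = """
<meta>
<meta name="description">
<meta content="orphaned content">
<meta name="description" content="Search for and book flights.">
"""

soup = BeautifulSoup(html_content, "html.parser")

# Index into meta.attrs only after confirming that both keys exist, so tags
# with missing attributes are skipped instead of raising KeyError.
descriptions = [
    meta.attrs["content"]
    for meta in soup.find_all("meta")
    if "name" in meta.attrs
    and "content" in meta.attrs
    and meta.attrs["name"] == "description"
]

print(descriptions)
```

Note that `m.get('name')` already returns `None` for a missing attribute. The explicit membership checks matter mainly when the code indexes `meta.attrs[...]` or `meta[...]` directly.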
