How to extract the meta description from a URL using Python?

I want to extract the title and description from the following website:

View source:

With the following source code snippet:

<title>Book a Virgin Australia Flight | Virgin Australia
</title>
    <meta name="keywords" content="" />
        <meta name="description" content="Search for and book Virgin Australia and partner flights to Australian and international destinations." />

The result is empty.

Please suggest a solution.

For the question above, you can use the following code to extract the "description" information:

Output:

['Search for and book Virgin Australia and partner flights to Australian and international destinations.']
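The code for this answer is not preserved here. As a hedged sketch only, not the answerer's method, the standard library's `html.parser` can produce a list like the output above from the question's snippet. The class `MetaDescriptionCollector` and the function `extract_meta_descriptions` are hypothetical names.

```python
from html.parser import HTMLParser


class MetaDescriptionCollector(HTMLParser):
    """Collects the content of every <meta name="description"> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.descriptions = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag and attribute names arrive lowercased.
        # Self-closing tags like <meta ... /> are routed here by the default handle_startendtag.
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") == "description" and attrs.get("content") is not None:
            self.descriptions.append(attrs["content"])


def extract_meta_descriptions(html_content):
    """Return a list with the content of each description meta tag found."""
    collector = MetaDescriptionCollector()
    collector.feed(html_content)
    collector.close()
    return collector.descriptions


# The snippet from the question, used as sample input.
snippet = """<title>Book a Virgin Australia Flight | Virgin Australia
</title>
    <meta name="keywords" content="" />
        <meta name="description" content="Search for and book Virgin Australia and partner flights to Australian and international destinations." />"""

print(extract_meta_descriptions(snippet))
```

For a live page you would first fetch the HTML as a string and pass it to `extract_meta_descriptions`; unlike a strict XML parser, `html.parser` tolerates the loose markup that real pages usually contain.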
Do you know XPath for HTML? Using the lxml library with XPath is a fast way to extract HTML elements:

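import lxml.html  # "import lxml" alone does not load the lxml.html submodule

# html_content: the page's HTML as a string (assumed to be fetched beforehand)
doc = lxml.html.document_fromstring(html_content)
title_element = doc.xpath("//title")
website_title = title_element[0].text_content().strip()
# The page uses name="description", not property="description"
meta_description_element = doc.xpath("//meta[@name='description']")
# <meta> has no text content; the description lives in its content attribute
website_meta_description = meta_description_element[0].get("content").strip()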
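For comparison, here is a standard-library sketch of the title half of the lxml answer, with `html.parser` swapped in for lxml (the names `TitleParser` and `extract_title` are hypothetical). It also shows why `.strip()` is needed: in the question's page the title text ends with a newline before `</title>`.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Accumulates the text inside the first <title> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True  # ignore any later <title> elements

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        # strip() drops the trailing newline that precedes </title> in the question's page
        return "".join(self._parts).strip()


def extract_title(html_content):
    """Return the stripped text of the first <title>, or "" if there is none."""
    parser = TitleParser()
    parser.feed(html_content)
    parser.close()
    return parser.title


page = "<head><title>Book a Virgin Australia Flight | Virgin Australia\n</title></head>"
print(extract_title(page))
```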
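import metadata_parser

# A full URL including the scheme (http:// or https://) is safest here
page = metadata_parser.MetadataParser(url='http://www.xyz.com')
# 'og' holds Open Graph tags, i.e. <meta property="og:description" ...>
metaDesc = page.metadata['og']['description']
print(metaDesc)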
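The `metadata_parser` answer reads the Open Graph `og:description` tag, while the snippet in the question only shows a plain `<meta name="description">` tag. As an assumption (not `metadata_parser`'s behavior), a fallback between the two can be sketched with the standard library; `MetaTagParser` and `best_description` are hypothetical names, and preferring `og:description` first is an arbitrary choice.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Records <meta> content keyed by name= or property= (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.by_name = {}
        self.by_property = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        content = attrs.get("content")
        if content is None:
            return  # skip tags without a content attribute
        # Keep only the first occurrence of each key
        if attrs.get("name"):
            self.by_name.setdefault(attrs["name"], content)
        if attrs.get("property"):
            self.by_property.setdefault(attrs["property"], content)


def best_description(html_content):
    """Prefer og:description, fall back to name="description"; None if neither has text."""
    parser = MetaTagParser()
    parser.feed(html_content)
    parser.close()
    # "or" also falls back when og:description is present but empty
    return parser.by_property.get("og:description") or parser.by_name.get("description")


plain_page = '<meta name="description" content="Plain description." />'
og_page = ('<meta property="og:description" content="OG description." />'
           '<meta name="description" content="Plain description." />')
print(best_description(plain_page))
print(best_description(og_page))
```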

You can use BeautifulSoup to do this.

This should help:

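from bs4 import BeautifulSoup

# html_content: the page's HTML as a string (assumed to be fetched beforehand)
soup = BeautifulSoup(html_content, 'html.parser')
metas = soup.find_all('meta')  # get all <meta> tags
for m in metas:
    if m.get('name') == 'description':
        desc = m.get('content')
        print(desc)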

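What about BeautifulSoup?

You may want to check that the attributes exist in meta.attrs before reading them, because malformed HTML can otherwise cause an exception to be raised: only read the content when 'name' in meta.attrs and 'content' in meta.attrs and meta.attrs['name'] == 'description'.

While this code may solve the problem, including an explanation of how and why it solves the problem would really help to improve the quality of your post, and would probably result in more upvotes. Remember that you are answering the question for readers in the future, not just the person asking now. Please edit your answer to add an explanation and indicate what limitations and assumptions apply.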