Scrapy: How to get the "keywords" from HTML using Portia


I want to scrape the keywords and description meta tags from a web page like the following:

<html>
<head>
<title>test page</title>
<meta name="keywords" content="A,B,C">
<meta name="description" content="the description a page">
....

I searched Google yesterday but could not find an answer. Please give me some advice.

You don't even need Scrapy to do this. You can just use the standard library class HTMLParser.


Thanks, but I have a Portia project and want to scrape the keywords and description in the spider.
#!/usr/bin/python3
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2 fallback: import the class, not the module


class MyHTMLParse(HTMLParser):
    """Collect the content of <meta name="keywords"> and <meta name="description">."""
    TAG = "meta"
    NAMES = ['keywords', 'description']

    def __init__(self):
        HTMLParser.__init__(self)
        self.contents = {}

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag == MyHTMLParse.TAG:
            attributes = dict(attrs)
            if attributes.get("name") in MyHTMLParse.NAMES:
                # .get() avoids a KeyError when a <meta> tag has no content attribute.
                self.contents[attributes["name"]] = attributes.get("content")


parser = MyHTMLParse()

# Feed the page's HTML to the parser with parser.feed(), then read the results from
# parser.contents, a dictionary with the keys "keywords" and "description".
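
As a usage sketch, the snippet below feeds a page like the one in the question to a parser of the same shape and reads back both values. The names MetaTagParser and SAMPLE_HTML are illustrative and not part of the original answer, the sample page is completed with closing tags that the question elides, and the code assumes Python 3.

```python
from html.parser import HTMLParser

# Hypothetical sample input, modelled on the page shown in the question.
SAMPLE_HTML = """
<html>
<head>
<title>test page</title>
<meta name="keywords" content="A,B,C">
<meta name="description" content="the description a page">
</head>
<body></body>
</html>
"""


class MetaTagParser(HTMLParser):
    """Illustrative parser: records the content of selected <meta name=...> tags."""

    NAMES = ("keywords", "description")

    def __init__(self):
        super().__init__()
        self.contents = {}

    def handle_starttag(self, tag, attrs):
        # Only <meta> tags are of interest; attrs is a list of (name, value) pairs.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name in self.NAMES:
            self.contents[name] = attributes.get("content")


parser = MetaTagParser()
parser.feed(SAMPLE_HTML)

# Split the comma-separated keywords string into a list.
keywords = parser.contents.get("keywords", "").split(",")
description = parser.contents.get("description")
print(keywords)
print(description)
```

Inside a Scrapy spider callback, the same parser could presumably be fed the page body via response.text, assuming the callback receives a standard Scrapy text response; how this fits into a Portia-generated project is not covered by the original answer.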