TypeError: cannot create weak reference to 'str' object in Scrapy (Python)

I wrote the following spider with Scrapy in Python:

#!/usr/bin/python 
from twisted.internet import reactor
import scrapy
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.selector import Selector

class GivenSpider(scrapy.Spider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
    ]

    def parse(self, response):
        select = Selector(response.body)
        title = select.xpath("//a[@class=listinglink]/@href").extract()
        print title
#       for t in title:
#           title4 = MyItem()
#           title4['content'] = t
#           yield title4

#       filename = response.url.split("/")[-2] + '.html'
#       with open(filename, 'wb') as f:
#           f.write(response.body)

configure_logging({'LOG_FORMAT': '%(levelname)s: %(message)s'})
runner = CrawlerRunner()

d = runner.crawl(GivenSpider)
d.addBoth(lambda _: reactor.stop())
reactor.run()

I run it like this:

$ python runTimeSpider.py

It produces the following output:

INFO: Enabled extensions: CloseSpider, TelnetConsole, LogStats, CoreStats, SpiderState
INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
INFO: Enabled item pipelines: 
INFO: Spider opened
INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
DEBUG: Telnet console listening on 127.0.0.1:6023
DEBUG: Crawled (200) <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Books/> (referer: None)
DEBUG: Crawled (200) <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/> (referer: None)
ERROR: Spider error processing <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Books/> (referer: None)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 588, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "runTimeSpider.py", line 17, in parse
    select = Selector(str(response.body))
  File "/usr/local/lib/python2.7/dist-packages/scrapy/selector/unified.py", line 80, in __init__
    _root = LxmlDocument(response, self._parser)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/selector/lxmldocument.py", line 24, in __new__
    cache = cls.cache.setdefault(response, {})
  File "/usr/lib/python2.7/weakref.py", line 433, in setdefault
    return self.data.setdefault(ref(key, self._remove),default)
TypeError: cannot create weak reference to 'str' object
ERROR: Spider error processing <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/> (referer: None)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 588, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "runTimeSpider.py", line 17, in parse
    select = Selector(str(response.body))
  File "/usr/local/lib/python2.7/dist-packages/scrapy/selector/unified.py", line 80, in __init__
    _root = LxmlDocument(response, self._parser)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/selector/lxmldocument.py", line 24, in __new__
    cache = cls.cache.setdefault(response, {})
  File "/usr/lib/python2.7/weakref.py", line 433, in setdefault
    return self.data.setdefault(ref(key, self._remove),default)
TypeError: cannot create weak reference to 'str' object
INFO: Closing spider (finished)
INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 514,
 'downloader/request_count': 2,
 'downloader/request_method_count/GET': 2,
 'downloader/response_bytes': 16284,
 'downloader/response_count': 2,
 'downloader/response_status_count/200': 2,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2016, 1, 21, 8, 28, 26, 17960),
 'log_count/DEBUG': 3,
 'log_count/ERROR': 2,
 'log_count/INFO': 7,
 'response_received_count': 2,
 'scheduler/dequeued': 2,
 'scheduler/dequeued/memory': 2,
 'scheduler/enqueued': 2,
 'scheduler/enqueued/memory': 2,
 'spider_exceptions/TypeError': 2,
 'start_time': datetime.datetime(2016, 1, 21, 8, 28, 24, 986319)}
INFO: Spider closed (finished)

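The reason is that you are passing response.body to Selector. response.body is a string, and you cannot run XPath queries on a plain string. Selector expects the response object itself as its first argument and, as the traceback shows, uses it as a key in a weak-reference cache, which is the step that fails for a str.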
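
The failure can be reproduced without Scrapy. The following is a minimal sketch using only the standard library; FakeResponse and try_cache are hypothetical names made up for this illustration, and the WeakKeyDictionary only imitates the cache lookup visible in the traceback, it is not Scrapy's actual code.

```python
import weakref


class FakeResponse(object):
    """Hypothetical stand-in for a Scrapy response object.

    Instances of ordinary Python classes can be weakly referenced.
    """

    def __init__(self, body):
        self.body = body


def try_cache(key):
    """Use `key` in a weak-key cache, like the setdefault call in the traceback."""
    cache = weakref.WeakKeyDictionary()
    try:
        cache.setdefault(key, {})
        return "ok"
    except TypeError as exc:
        # Built-in str objects do not support weak references.
        return str(exc)


response = FakeResponse("<html></html>")
print(try_cache(response))       # ok
print(try_cache(response.body))  # cannot create weak reference to 'str' object
```

Passing the object works, while passing its string body raises the same TypeError as in the question.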

So either use

select = Selector(response)

or call the XPath query directly on the response object, since it provides xpath as a method:

title = response.xpath("//a[@class='listinglink']/@href").extract()
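
One detail separate from the TypeError: in an XPath predicate the attribute value has to be a quoted literal, as in @class='listinglink'. An unquoted listinglink is read as the name of a child element, so the expression would match nothing on this kind of markup. The sketch below illustrates the quoted form with the standard library's xml.etree.ElementTree, used here only as a stand-in for Scrapy selectors. ElementTree supports just a subset of XPath (no trailing /@href step), so the attribute is read with get() instead. The markup and the listing_links helper are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, loosely modelled on a page of listing links.
PAGE = """
<html><body>
  <a class="listinglink" href="http://example.com/book-1">Book 1</a>
  <a class="other" href="http://example.com/skip">Skip</a>
  <a class="listinglink" href="http://example.com/book-2">Book 2</a>
</body></html>
"""


def listing_links(markup):
    """Return the href of every <a> whose class attribute equals 'listinglink'."""
    root = ET.fromstring(markup)
    # The attribute value is quoted inside the predicate.
    anchors = root.findall(".//a[@class='listinglink']")
    return [a.get("href") for a in anchors]


print(listing_links(PAGE))
# ['http://example.com/book-1', 'http://example.com/book-2']
```

With a real Scrapy response, the corresponding call would be response.xpath("//a[@class='listinglink']/@href").extract().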