
Python: how to get the response body in a Scrapy downloader middleware


I need to be able to retry a request when certain XPaths cannot be found on the page, so I wrote this middleware:

from scrapy.downloadermiddlewares.retry import RetryMiddleware

class ManualRetryMiddleware(RetryMiddleware):
    def process_response(self, request, response, spider):
        if not spider.retry_if_not_found:
            return response
        if not hasattr(response, 'text') and response.status != 200:
            return super(ManualRetryMiddleware, self).process_response(request, response, spider)
        found = False
        for xpath in spider.retry_if_not_found:
            if response.xpath(xpath).extract():
                found = True
                break
        if not found:
            return self._retry(request, "Didn't find anything useful", spider)
        return response
and registered it in settings.py:

DOWNLOADER_MIDDLEWARES = {
    'myproject.middlewares.ManualRetryMiddleware': 650,
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': None,
}
When I run the spider, I get:

AttributeError: 'Response' object has no attribute 'xpath'
I tried to create a Selector manually and run the XPath on it... but the response has no text attribute, and response.body is bytes, not a string.


So how can I inspect the page content in the middleware? Some pages may not contain the details I need, and I want to be able to retry those later.

The reason the response you get in a downloader middleware has no xpath method is that the response parameter of process_response is of type Response (see the Scrapy downloader-middleware documentation). Only HtmlResponse (and XmlResponse) have an xpath method. So create an HtmlResponse object from response before calling xpath. The corresponding part of your code then becomes:

...
new_response = scrapy.http.HtmlResponse(response.url, body=response.body)
if new_response.xpath(xpath).extract():
    found = True
    break
...
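For reference, a minimal sketch of the whole middleware with this conversion applied could look as follows. It assumes, as in the question, that the spider exposes a retry_if_not_found list of XPath expressions; the simplified status check and the "or response" fallback are assumptions, not part of the original answer:

from scrapy.downloadermiddlewares.retry import RetryMiddleware
from scrapy.http import HtmlResponse

class ManualRetryMiddleware(RetryMiddleware):
    def process_response(self, request, response, spider):
        if not getattr(spider, 'retry_if_not_found', None):
            return response
        if response.status != 200:
            # Let the stock retry logic handle non-200 responses.
            return super(ManualRetryMiddleware, self).process_response(request, response, spider)
        # Wrap the plain Response so that .xpath() becomes available.
        html_response = HtmlResponse(response.url, body=response.body)
        for xpath in spider.retry_if_not_found:
            if html_response.xpath(xpath).extract():
                return response
        # _retry() returns a new Request, or None once retries are exhausted;
        # fall back to the original response in that case.
        return self._retry(request, "Didn't find anything useful", spider) or response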

Also pay attention to the position of your middleware. It has to come before scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware (i.e. use a priority number lower than its default of 590), otherwise you may be trying to decode still-compressed data, which really does not work. Check response.headers to see whether the response is compressed (Content-Encoding: gzip).
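As an illustration, the registration in settings.py could then look like the sketch below; the priority value 550 is only an example of a number below 590 (it is where the stock RetryMiddleware sits by default), not taken from the original answer:

DOWNLOADER_MIDDLEWARES = {
    # 550 < 590 (HttpCompressionMiddleware), so this middleware's
    # process_response() only sees already-decompressed bodies.
    'myproject.middlewares.ManualRetryMiddleware': 550,
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': None,
}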

There is still some gibberish in new_response, though, and specifying scrapy.http.HtmlResponse(response.url, body=response.body, encoding='utf-8') did not help. Check @mouch's answer and make sure you are not working with a compressed response!
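If the gibberish persists, the body is most likely still gzip-compressed when this middleware sees it. The small helper below is a sketch of how one might check for that before wrapping the response; to_html_response is a hypothetical name, not part of Scrapy:

from scrapy.http import HtmlResponse

def to_html_response(response, spider):
    # Hypothetical helper: return an HtmlResponse built from a plain Response,
    # or None if the body still looks compressed.
    content_encoding = response.headers.get('Content-Encoding')
    if content_encoding:
        # HttpCompressionMiddleware has not run yet - the priorities are wrong.
        spider.logger.warning(
            "Body for %s is still compressed (%s); check middleware priorities",
            response.url, content_encoding.decode())
        return None
    # An explicit encoding only matters when automatic charset detection fails.
    return HtmlResponse(response.url, body=response.body, encoding='utf-8')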