Scrapy Python: unicode link error


Link encoding

When crawling a site, Scrapy extracts links containing &amd and throws an exception:
"Do not instantiate Link objects with unicode URLs. Assuming utf-8 encoding (which might be wrong)." How can I fix this error?
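
One approach that might sidestep the error (a rough, untested sketch: it relies on the process_value hook of SgmlLinkExtractor and on safe_url_string from w3lib, a library Scrapy itself depends on) is to percent-encode every extracted href before a Link object is built from it:

from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from w3lib.url import safe_url_string


def to_safe_url(value):
    # Percent-encode characters that are not safe in a URL; on the
    # Python 2 era Scrapy used in this question the result is a byte
    # string, so Link() is no longer instantiated with a unicode URL.
    return safe_url_string(value, encoding='utf-8')

safe_extractor = SgmlLinkExtractor(allow=('/gp/offer-listing*', ),
                                   process_value=to_safe_url)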

I had the same problem with the character → being inserted into some links. I found a file on GitHub, link_extractors.py, which contains:

from scrapy.selector import HtmlXPathSelector
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.utils.response import get_base_url


class CustomLinkExtractor(SgmlLinkExtractor):
    """Need this to fix the encoding error."""

    def extract_links(self, response):
        base_url = None
        if self.restrict_xpaths:
            hxs = HtmlXPathSelector(response)
            base_url = get_base_url(response)
            # Join the restricted regions into a single unicode string...
            body = u''.join(f for x in self.restrict_xpaths
                            for f in hxs.select(x).extract())
            # ...and encode it back to bytes so that Link objects are not
            # built from unicode; fall back to utf-8 when the declared
            # response encoding cannot represent the text.
            try:
                body = body.encode(response.encoding)
            except UnicodeEncodeError:
                body = body.encode('utf-8')
        else:
            body = response.body

        links = self._extract_links(body, response.url, response.encoding, base_url)
        links = self._process_links(links)
        return links
Later I used it in my spider.py:

rules = (
    Rule(CustomLinkExtractor(allow=('/gp/offer-listing*', ),
                             restrict_xpaths=("//li[contains(@class,'a-last')]/a", )),
         callback='parse_start_url', follow=True),
)

Any examples would be very helpful!
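
For reference, here is a minimal, untested sketch of how the custom extractor could be wired into a complete CrawlSpider; the spider name, domain, start URL and the import path of CustomLinkExtractor are placeholders, not taken from the question:

from scrapy.contrib.spiders import CrawlSpider, Rule

# Assumes link_extractors.py (with the CustomLinkExtractor above) is importable
from link_extractors import CustomLinkExtractor


class OfferListingSpider(CrawlSpider):
    name = 'offer_listing'                                  # placeholder name
    allowed_domains = ['example.com']                       # placeholder domain
    start_urls = ['http://example.com/gp/offer-listing/']   # placeholder URL

    rules = (
        Rule(CustomLinkExtractor(allow=('/gp/offer-listing*', ),
                                 restrict_xpaths=("//li[contains(@class,'a-last')]/a", )),
             callback='parse_start_url', follow=True),
    )

    def parse_start_url(self, response):
        # Minimal callback: just log which page was reached.
        self.log('Visited %s' % response.url)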