Python 3.x Scrapy: handling special characters in URLs

I am scraping an XML sitemap whose URLs contain special characters such as é, which results in:

ERROR: Spider error processing <GET [URL with '%C3%A9' instead of 'é']>
I also tried my own Request subclass that does not apply safe_url_string, but that results in:

UnicodeEncodeError: 'ascii' codec can't encode character '\xf9' in position 25: ordinal not in range(128)
Full traceback:

[scrapy.core.scraper] ERROR: Error downloading <GET [URL with characters like ù]>
Traceback (most recent call last):
  File "/usr/share/anaconda3/lib/python3.5/site-packages/twisted/internet/defer.py", line 1384, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/usr/share/anaconda3/lib/python3.5/site-packages/twisted/python/failure.py", line 393, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/usr/share/anaconda3/lib/python3.5/site-packages/scrapy/core/downloader/middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request,spider=spider)))
  File "/usr/share/anaconda3/lib/python3.5/site-packages/scrapy/utils/defer.py", line 45, in mustbe_deferred
    result = f(*args, **kw)
  File "/usr/share/anaconda3/lib/python3.5/site-packages/scrapy/core/downloader/handlers/__init__.py", line 65, in download_request
    return handler.download_request(request, spider)
  File "/usr/share/anaconda3/lib/python3.5/site-packages/scrapy/core/downloader/handlers/http11.py", line 61, in download_request
    return agent.download_request(request)
  File "/usr/share/anaconda3/lib/python3.5/site-packages/scrapy/core/downloader/handlers/http11.py", line 260, in download_request
    agent = self._get_agent(request, timeout)
  File "/usr/share/anaconda3/lib/python3.5/site-packages/scrapy/core/downloader/handlers/http11.py", line 241, in _get_agent
    scheme = _parse(request.url)[0]
  File "/usr/share/anaconda3/lib/python3.5/site-packages/scrapy/core/downloader/webclient.py", line 37, in _parse
    return _parsed_url_args(parsed)
  File "/usr/share/anaconda3/lib/python3.5/site-packages/scrapy/core/downloader/webclient.py", line 19, in _parsed_url_args
    path = b(path)
  File "/usr/share/anaconda3/lib/python3.5/site-packages/scrapy/core/downloader/webclient.py", line 17, in <lambda>
    b = lambda s: to_bytes(s, encoding='ascii')
  File "/usr/share/anaconda3/lib/python3.5/site-packages/scrapy/utils/python.py", line 120, in to_bytes
    return text.encode(encoding, errors)
UnicodeEncodeError: 'ascii' codec can't encode character '\xf9' in position 25: ordinal not in range(128)

Any hints?

I don't think you can do that, because safe_url_string from the w3lib library is applied before the Request's url is stored. You would have to reverse that somehow.
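
To make the encoding step under discussion concrete, here is a minimal sketch that uses only the standard library rather than w3lib itself. The helper name percent_encode_url and the example URL are made up for illustration. The idea is that non-ASCII characters in the path and query are percent-encoded as UTF-8, so the resulting URL can be encoded as ASCII, which is the step that fails at the bottom of the traceback above. It does not handle non-ASCII host names.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def percent_encode_url(url):
    """Percent-encode non-ASCII characters in the path and query of a URL.

    Hypothetical helper for illustration; it is not part of Scrapy or w3lib,
    and it leaves the host name untouched (no IDNA handling).
    """
    parts = urlsplit(url)
    # Keep "/" and already-present "%" escapes; encode the rest as UTF-8 bytes.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


# Made-up URL containing the two characters mentioned in the question.
raw_url = "http://example.com/caf\u00e9/o\u00f9"
encoded_url = percent_encode_url(raw_url)

# The encoded URL is plain ASCII, so this call does not raise UnicodeEncodeError.
encoded_url.encode("ascii")
print(encoded_url)
```

Because "%" is treated as safe, passing an already-encoded URL through the helper again leaves it unchanged. This also shows why the first error message displays '%C3%A9' where the sitemap had 'é': that is the UTF-8 percent-encoding of the character.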

You can put the letter 'r' before the URL:
url = r'name of the url'
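
For anyone trying this suggestion, a quick check of what the r prefix does may be useful. In Python 3 a raw string literal only changes how backslashes are interpreted, so a URL containing ù is the same string with or without the prefix, and the ASCII encoding step from the traceback still fails on it. The URL below is made up for the demonstration.

```python
# Made-up URL; both literals contain the same non-ASCII character.
plain = "http://example.com/où"
raw = r"http://example.com/où"

# The r prefix only affects backslash escapes, so the two strings are equal.
same = plain == raw

# Try the same ASCII encoding that fails in the traceback.
try:
    raw.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Backslashes are where raw strings differ: r"\n" is two characters.
raw_backslash_len = len(r"\n")

print(same, ascii_ok, raw_backslash_len)
```

So the prefix on its own would not be expected to remove the UnicodeEncodeError; the non-ASCII characters still need to be percent-encoded before the URL is sent.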

Please see my solution to a similar problem. Perhaps you can apply the same technique to your use case.