
Python Scrapy - Non-ASCII character in file, but no encoding declared


I am trying to scrape some basic data from this website, both to learn more about Scrapy and as a proof of concept for a university project:

When I use the scrapy shell, I am able to get the information I want with the following XPath:

sel.xpath(‘//tbody/tr[1]/td[2]/a/text()’).extract()
It should return the game title from the first row of the table, whose structure looks like this:

<tbody>
     <tr>
          <td></td>
          <td><a>stuff I want here</a></td>
...
Instead I get the following error:

    ricks-mbp:steam_crawler someuser$ scrapy crawl steam -o items.csv -t csv
Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 5, in <module>
    pkg_resources.run_script('Scrapy==0.20.0', 'scrapy')
  File "build/bdist.macosx-10.9-intel/egg/pkg_resources.py", line 492, in run_script

  File "build/bdist.macosx-10.9-intel/egg/pkg_resources.py", line 1350, in run_script
    for name in eagers:
  File "/Library/Python/2.7/site-packages/Scrapy-0.20.0-py2.7.egg/EGG-INFO/scripts/scrapy", line 4, in <module>
    execute()
  File "/Library/Python/2.7/site-packages/Scrapy-0.20.0-py2.7.egg/scrapy/cmdline.py", line 143, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/Library/Python/2.7/site-packages/Scrapy-0.20.0-py2.7.egg/scrapy/cmdline.py", line 89, in _run_print_help
    func(*a, **kw)
  File "/Library/Python/2.7/site-packages/Scrapy-0.20.0-py2.7.egg/scrapy/cmdline.py", line 150, in _run_command
    cmd.run(args, opts)
  File "/Library/Python/2.7/site-packages/Scrapy-0.20.0-py2.7.egg/scrapy/commands/crawl.py", line 47, in run
    crawler = self.crawler_process.create_crawler()
  File "/Library/Python/2.7/site-packages/Scrapy-0.20.0-py2.7.egg/scrapy/crawler.py", line 87, in create_crawler
    self.crawlers[name] = Crawler(self.settings)
  File "/Library/Python/2.7/site-packages/Scrapy-0.20.0-py2.7.egg/scrapy/crawler.py", line 25, in __init__
    self.spiders = spman_cls.from_crawler(self)
  File "/Library/Python/2.7/site-packages/Scrapy-0.20.0-py2.7.egg/scrapy/spidermanager.py", line 35, in from_crawler
    sm = cls.from_settings(crawler.settings)
  File "/Library/Python/2.7/site-packages/Scrapy-0.20.0-py2.7.egg/scrapy/spidermanager.py", line 31, in from_settings
    return cls(settings.getlist('SPIDER_MODULES'))
  File "/Library/Python/2.7/site-packages/Scrapy-0.20.0-py2.7.egg/scrapy/spidermanager.py", line 22, in __init__
    for module in walk_modules(name):
  File "/Library/Python/2.7/site-packages/Scrapy-0.20.0-py2.7.egg/scrapy/utils/misc.py", line 68, in walk_modules
    submod = import_module(fullpath)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
  File "/xxx/scrape/steam/steam_crawler/spiders/steam.py", line 18
SyntaxError: Non-ASCII character '\xe2' in file /xxx/scrape/steam/steam_crawler/spiders/steam.py on line 18, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
I have a feeling that all I need to do is tell Scrapy that the characters follow UTF-8 rather than ASCII, since such characters appear. But from what I can gather, it should pick that up from the head of the page, which for this site is:

<meta charset="utf-8">


This has me confused! Any insight, or reading that isn't the Scrapy documentation itself, would also be of interest.

It looks like you are using typographic quotes ‘…’ (as in the XPath expression in your question) instead of regular straight quotes in your spider source; those characters are what trigger the non-ASCII SyntaxError.
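
A minimal sketch of the fix, assuming line 18 of steam.py is the XPath call quoted above (the variable name titles is just illustrative): either retype the string literal with plain ASCII quotes, or keep non-ASCII characters in the file and declare its encoding per PEP 263.

# Option 1: rewrite the string using plain ASCII quotes only
titles = sel.xpath('//tbody/tr[1]/td[2]/a/text()').extract()

# Option 2: keep non-ASCII characters in the file, but declare the encoding
# by putting this PEP 263 line at the very top of steam.py:
# -*- coding: utf-8 -*-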

By the way, a better practice for looping over all the table rows is something like:

for tr in sel.xpath("//tr"):
    item = SteamItem()
    item['title'] = tr.xpath('td[2]/a/text()').extract()
    item['price'] = tr.xpath('td[@class="price-final"]/text()').extract()
    yield item
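
For completeness, here is a sketch of how that loop could sit inside a whole spider, using the Scrapy 0.20-era API visible in the traceback. The class name, allowed domain, and start URL are placeholders (the original post does not show them), and SteamItem is assumed to live in the project's items module with title and price fields:

# -*- coding: utf-8 -*-
from scrapy.spider import BaseSpider
from scrapy.selector import Selector

from steam_crawler.items import SteamItem  # assumed project layout


class SteamSpider(BaseSpider):
    name = "steam"
    # Placeholder URL -- replace with the actual page being scraped
    start_urls = ["http://example.com/steam-price-table"]

    def parse(self, response):
        sel = Selector(response)
        # One item per table row: title from the second cell, price from the
        # cell whose class is "price-final"
        for tr in sel.xpath("//tr"):
            item = SteamItem()
            item['title'] = tr.xpath('td[2]/a/text()').extract()
            item['price'] = tr.xpath('td[@class="price-final"]/text()').extract()
            yield item

It would then be run the same way as before: scrapy crawl steam -o items.csv -t csv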

That looks much simpler and works like a dream. How did you learn Scrapy? Books/tutorials?