
Python Scrapy tutorial: KeyError: 'Spider not found'


I'm trying to write my first Scrapy spider. I've been following the tutorial, but I'm getting the error "KeyError: 'Spider not found:'".

I believe I'm running the command from the correct directory (the one with the scrapy.cfg file).

Here is the error I get:

(proscraper)#( 10/14/14@ 2:13pm )( tim@localhost ):~/Workspace/Development/hacks/prosum-scraper/scrapy
   scrapy crawl juno
/home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/twisted/internet/_sslverify.py:184: UserWarning: You do not have the service_identity module installed. Please install it from <https://pypi.python.org/pypi/service_identity>. Without the service_identity module and a recent enough pyOpenSSL to support it, Twisted can perform only rudimentary TLS client hostname verification. Many valid certificate/hostname mappings may be rejected.
  verifyHostname, VerificationError = _selectVerifyImplementation()
Traceback (most recent call last):
  File "/home/tim/.virtualenvs/proscraper/bin/scrapy", line 9, in <module>
    load_entry_point('Scrapy==0.24.4', 'console_scripts', 'scrapy')()
  File "/home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/cmdline.py", line 143, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/cmdline.py", line 89, in _run_print_help
    func(*a, **kw)
  File "/home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/cmdline.py", line 150, in _run_command
    cmd.run(args, opts)
  File "/home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/commands/crawl.py", line 58, in run
    spider = crawler.spiders.create(spname, **opts.spargs)
  File "/home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/spidermanager.py", line 44, in create
    raise KeyError("Spider not found: %s" % spider_name)
KeyError: 'Spider not found: juno'
Here is my spider code, with the name attribute filled in:

(proscraper)#( 10/14/14@ 2:14pm )( tim@localhost ):~/Workspace/Development/hacks/prosum-scraper/scrapy
   cat scrapy/spiders/juno_spider.py 
import scrapy

class JunoSpider(scrapy.Spider):
    name = "juno"
    allowed_domains = ["http://www.juno.co.uk/"]
    start_urls = [
        "http://www.juno.co.uk/dj-equipment/"
    ]

    def parse(self, response):
        filename = response.url.split("/")[-2]
        with open(filename, 'wb') as f:
            f.write(response.body)

When you start a project with scrapy as the project name, it creates the directory structure you printed:

.
├── scrapy
│   ├── __init__.py
│   ├── items.py
│   ├── pipelines.py
│   ├── settings.py
│   └── spiders
│       ├── __init__.py
│       └── juno_spider.py
└── scrapy.cfg
But using scrapy as the project name has a side effect. If you open the generated scrapy.cfg, you will see that the default settings point to the scrapy.settings module:

[settings]
default = scrapy.settings
And when we cat the scrapy.settings file, we see:

BOT_NAME = 'scrapy'

SPIDER_MODULES = ['scrapy.spiders']
NEWSPIDER_MODULE = 'scrapy.spiders'
Well, nothing strange here: the bot name, the list of modules where Scrapy will look for spiders, and the module where new spiders will be created with the genspider command. So far, so good.


Now let's inspect the scrapy library itself. It is properly installed under the isolated proscraper virtualenv, in the /home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy directory. Remember that site-packages is always added to sys.path, which holds all the paths Python will search for modules. And guess what... the scrapy library also has a settings module, /home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/settings, which imports /home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/settings/default_settings.py, holding the default values for all settings. Pay special attention to the default SPIDER_MODULES entry:

SPIDER_MODULES = []
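With SPIDER_MODULES empty, a spider lookup has nothing to register, which is exactly the failure in the traceback. The following is a simplified, hypothetical sketch of how a name-to-spider lookup of this kind works; it is not Scrapy's actual code, and Spider and find_spider here are stand-in names:

```python
import importlib
import inspect

class Spider:                      # stand-in for scrapy.Spider
    name = None

class JunoSpider(Spider):
    name = "juno"

def find_spider(spider_name, spider_modules):
    """Walk the configured modules and index Spider subclasses by name."""
    registry = {}
    for modname in spider_modules:
        mod = importlib.import_module(modname)
        for obj in vars(mod).values():
            if inspect.isclass(obj) and issubclass(obj, Spider) and obj.name:
                registry[obj.name] = obj
    if spider_name not in registry:
        raise KeyError("Spider not found: %s" % spider_name)
    return registry[spider_name]

# With an empty module list (the library default), every lookup fails:
try:
    find_spider("juno", [])
except KeyError as e:
    print(e)          # 'Spider not found: juno'

# With a module that actually contains JunoSpider, the lookup succeeds:
print(find_spider("juno", [__name__]).__name__)   # JunoSpider
```

The point of the sketch: the error message does not mean the spider file is broken; it means the configured module list never led Python to it.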
Maybe you are starting to see what is going on. Choosing scrapy as the project name also generated a scrapy.settings module that clashes with the scrapy library's own. The order in which the corresponding paths were inserted into sys.path decides which one Python imports: the first one found wins. In this case the scrapy library's settings win, hence the KeyError: 'Spider not found: juno'.
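This first-on-sys.path-wins behavior is easy to reproduce without Scrapy at all. The sketch below uses a throwaway package name, pkg, as a stand-in for the scrapy/scrapy clash: two directories each provide a package with the same name, and the one earlier on sys.path is the one that gets imported:

```python
import os
import sys
import tempfile

# Two directories, each providing a package named "pkg" -- mimicking the
# clash between the installed scrapy library and the project's own
# "scrapy" package.
library_dir = tempfile.mkdtemp()
project_dir = tempfile.mkdtemp()
for path, origin in [(library_dir, "library"), (project_dir, "project")]:
    os.mkdir(os.path.join(path, "pkg"))
    with open(os.path.join(path, "pkg", "__init__.py"), "w") as f:
        f.write("ORIGIN = %r\n" % origin)

# The directory inserted last ends up first on sys.path and wins.
sys.path.insert(0, project_dir)
sys.path.insert(0, library_dir)

import pkg
print(pkg.ORIGIN)   # library
```

Swap the two insert lines and the "project" copy wins instead, which is why the relative ordering of the project directory and site-packages decides which scrapy.settings gets imported.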

To solve this clash, you can rename the project folder to another name, for example scrap:
.
├── scrap
│   ├── __init__.py
Modify scrapy.cfg to point to the correct settings module:

[settings]
default = scrap.settings
And update your scrap.settings to point to the correct spiders module:

SPIDER_MODULES = ['scrap.spiders']
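The whole rename fix can be exercised on a throwaway copy of the project layout. This is a minimal sketch: the files created here are stand-ins containing only the two lines that matter, not a full Scrapy project.

```python
import os
import tempfile

# Build a minimal throwaway copy of the clashing layout.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "scrapy", "spiders"))
with open(os.path.join(root, "scrapy.cfg"), "w") as f:
    f.write("[settings]\ndefault = scrapy.settings\n")
with open(os.path.join(root, "scrapy", "settings.py"), "w") as f:
    f.write("SPIDER_MODULES = ['scrapy.spiders']\n")

# The fix: rename the package, then repoint cfg and settings at it.
os.rename(os.path.join(root, "scrapy"), os.path.join(root, "scrap"))
for relpath in ["scrapy.cfg", os.path.join("scrap", "settings.py")]:
    path = os.path.join(root, relpath)
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(text.replace("scrapy.settings", "scrap.settings")
                    .replace("scrapy.spiders", "scrap.spiders"))

# Show the repointed config files.
print(open(os.path.join(root, "scrapy.cfg")).read())
print(open(os.path.join(root, "scrap", "settings.py")).read())
```

After this, nothing named scrapy remains in the project directory, so the library's own scrapy package is the only candidate on sys.path.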

But as @paultrmbrth suggested, I would simply recreate the project with a different name.

Comments:

"UserWarning: You do not have the service_identity module installed" — you need to install that module.

I don't think that affects how the command finds the spider, but OK, I installed it:

pip install service_identity

I still get the same error.

Try recreating your project like this:

scrapy startproject junoproject
scrapy genspider juno.co.uk

Then edit your spider and run again.

I can run this fine. I would not name the folder that holds items.py, settings.py, etc. "scrapy". Rename it, then:

[settings]
default = scrap.settings

SPIDER_MODULES = ['scrap.spiders']