Scrapy: why won't my spider load?


I'm fairly new to scraping and have managed to put together the following code for my spider:

import os
os.environ.setdefault('SCRAPY_SETTINGS_MODULE', 'thesentientspider.settings')
from scrapy.selector import HtmlXPathSelector
from scrapy.spider import BaseSpider
from scrapy.http import Request
from scrapy.utils.response import get_base_url
from urlparse import urljoin
from thesentientspider.items import RestaurantDetails, UserReview
import urllib
from scrapy.conf import settings
import pymongo
from pymongo import MongoClient

#MONGODB Settings
MongoDBServer=settings['MONGODB_SERVER']
MongoDBPort=settings['MONGODB_PORT']

class ZomatoSpider(BaseSpider):
    name = 'zomatoSpider'
    allowed_domains = ['zomato.com']
    CITY=["hyderabad"]
    start_urls = [
        'http://www.zomato.com/%s/restaurants/'  %cityName for cityName in CITY
        ]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        BASE_URL=get_base_url(response)
However, when I try to launch it with the scrapy crawl zomatoSpider command, it throws the following error:

Traceback (most recent call last):
  File "/usr/bin/scrapy", line 4, in <module>
    execute()
  File "/usr/lib/pymodules/python2.6/scrapy/cmdline.py", line 131, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/usr/lib/pymodules/python2.6/scrapy/cmdline.py", line 76, in _run_print_help
    func(*a, **kw)
  File "/usr/lib/pymodules/python2.6/scrapy/cmdline.py", line 138, in _run_command
    cmd.run(args, opts)
  File "/usr/lib/pymodules/python2.6/scrapy/commands/crawl.py", line 43, in run
    spider = self.crawler.spiders.create(spname, **opts.spargs)
  File "/usr/lib/pymodules/python2.6/scrapy/command.py", line 33, in crawler
    self._crawler.configure()
  File "/usr/lib/pymodules/python2.6/scrapy/crawler.py", line 40, in configure
    self.spiders = spman_cls.from_crawler(self)
  File "/usr/lib/pymodules/python2.6/scrapy/spidermanager.py", line 35, in from_crawler
    sm = cls.from_settings(crawler.settings)
  File "/usr/lib/pymodules/python2.6/scrapy/spidermanager.py", line 31, in from_settings
    return cls(settings.getlist('SPIDER_MODULES'))
  File "/usr/lib/pymodules/python2.6/scrapy/spidermanager.py", line 23, in __init__
    self._load_spiders(module)
  File "/usr/lib/pymodules/python2.6/scrapy/spidermanager.py", line 26, in _load_spiders
    for spcls in iter_spider_classes(module):
  File "/usr/lib/pymodules/python2.6/scrapy/utils/spider.py", line 21, in iter_spider_classes
    issubclass(obj, BaseSpider) and \
TypeError: issubclass() arg 1 must be a class
Can anyone point out the root cause and suggest a fix with a code snippet?

    def __init__(self):
        # Read the MongoDB connection details from the project settings
        MongoDBServer = settings['MONGODB_SERVER']
        MongoDBPort = settings['MONGODB_PORT']
        database = settings['MONGODB_DB']
        rest_coll = settings['RESTAURANTS_COLLECTION']
        review_coll = settings['REVIEWS_COLLECTION']

        # Open the connection when the spider is instantiated,
        # not at module import time
        client = MongoClient(MongoDBServer, MongoDBPort)
        db = client[database]
        self.restaurantsCollection = db[rest_coll]
        self.reviewsCollection = db[review_coll]

This is the code I added to make it work. Hope it helps.
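
For context, here is a minimal sketch of how that __init__ might fit into the spider class, with the module-level MONGODB_SERVER / MONGODB_PORT lookups removed so that nothing touches the settings or MongoDB at import time. It keeps the setting names and class attributes from the question; the parse body is elided.

# Minimal sketch, assuming the setting names from the question
# (MONGODB_SERVER, MONGODB_PORT, MONGODB_DB, RESTAURANTS_COLLECTION,
# REVIEWS_COLLECTION) are defined in thesentientspider/settings.py.
from pymongo import MongoClient
from scrapy.spider import BaseSpider
from scrapy.conf import settings


class ZomatoSpider(BaseSpider):
    name = 'zomatoSpider'
    allowed_domains = ['zomato.com']
    CITY = ["hyderabad"]
    start_urls = [
        'http://www.zomato.com/%s/restaurants/' % cityName for cityName in CITY
        ]

    def __init__(self, *args, **kwargs):
        super(ZomatoSpider, self).__init__(*args, **kwargs)
        # Settings are read and the connection is opened when the spider
        # is instantiated, not when the module is imported.
        client = MongoClient(settings['MONGODB_SERVER'],
                             settings['MONGODB_PORT'])
        db = client[settings['MONGODB_DB']]
        self.restaurantsCollection = db[settings['RESTAURANTS_COLLECTION']]
        self.reviewsCollection = db[settings['REVIEWS_COLLECTION']]

    def parse(self, response):
        # ... restaurant/review extraction as in the original spider ...
        pass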

Is this the entire code of your spider? You have a syntax error in the line with the start_urls definition. Are there any other spiders in the directory?
@alecxe Do you mean the extra comma? Edited it out. It isn't in the source code; it must have crept in while I was pasting the snippet. This is the initial part of the code... the rest is mostly parse definitions. The thing is, I was able to launch the crawler before, but when I added the code that handles the MongoDB connection (MongoClient, settings and so on) it broke, and I can't work out why.
OK, the code you've provided looks fine to me. Are there any other spiders in the directory?
@alecxe No, this is the only one. Do you think it's related to the deprecated way of accessing the settings that I'm using?
I don't think so. Can you show the whole spider so that I can run it myself?
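
As a side note on the deprecated settings access mentioned in the comments: in more recent Scrapy releases the usual pattern is to receive the crawler's settings through a from_crawler classmethod instead of importing scrapy.conf.settings at module level. A rough sketch, assuming a Scrapy version that provides Spider.from_crawler and reusing the same MONGODB_* setting names:

# Rough sketch, assuming a newer Scrapy release where spiders expose a
# from_crawler() classmethod; the MONGODB_* setting names are the ones
# used in the question.
import pymongo
import scrapy


class ZomatoSpider(scrapy.Spider):
    name = 'zomatoSpider'
    allowed_domains = ['zomato.com']
    start_urls = ['http://www.zomato.com/hyderabad/restaurants/']

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        # Build the spider first, then attach the MongoDB collections
        # using the settings carried by the crawler.
        spider = super(ZomatoSpider, cls).from_crawler(crawler, *args, **kwargs)
        s = crawler.settings
        client = pymongo.MongoClient(s.get('MONGODB_SERVER'),
                                     s.getint('MONGODB_PORT'))
        db = client[s.get('MONGODB_DB')]
        spider.restaurantsCollection = db[s.get('RESTAURANTS_COLLECTION')]
        spider.reviewsCollection = db[s.get('REVIEWS_COLLECTION')]
        return spider

    def parse(self, response):
        # ... extraction logic ...
        pass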