Python Scrapy spider extension does not log database pipeline errors

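I am trying to add a signal handler to my Scrapy extension so that it emails me whenever an error occurs, using the spider_error signal. Even though there are errors in the pipeline, they do not seem to be reported through the spider signals. Or is the spider no longer responsible once an item has been scraped and handed to the pipeline? Is there a way to log these errors from an extension? Below is the code for my extension, which stores the stats for each spider in the database. I am now trying to send the errors by email as well, but the error signal never seems to fire: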

from scrapy import signals


class StatsCollectorExtension(object):
    def __init__(self, stats):
        self.stats = stats
        self.num_errors = 0
        self.errors = []

    @classmethod
    def from_crawler(cls, crawler):
        ext = cls(crawler.stats)
        crawler.signals.connect(ext.spider_error, signal=signals.spider_error)
        crawler.signals.connect(ext.spider_closed, signal=signals.spider_closed)
        return ext

    def spider_closed(self, spider):
        """
        When the spider closes then
        store the stats(start time, end time, items scraped,
        pages crawled) into the database for each scraper.
        Also send the errors through email if any.
        """
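        # Read values through the public stats API; counters that were never
        # incremented (e.g. no item made it through the pipeline) default to 0.
        start_time = self.stats.get_value('start_time')
        finish_time = self.stats.get_value('finish_time')
        items_scraped_count = self.stats.get_value('item_scraped_count', 0)
        spider_name = spider.name
        pages_crawled_count = self.stats.get_value(
            'downloader/request_method_count/GET', 0)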

        # add the scrapy stats to DB via SQL Alchemy object
        stats = ScrapyStats(scrapername=spider_name,
                            start_time=start_time,
                            finish_time=finish_time,
                            items_scraped=items_scraped_count,
                            pages_crawled=pages_crawled_count)
        db_session.add(stats)
        db_session.commit()

        if self.num_errors:
            # Mandrill mail client that sends me an email
            html = ''.join(self.errors)
            subject = '%s errors found' % self.num_errors
            send_mail(subject, from_email, from_name,
                      html, to_email, to_mail, mandrill_key)

    def spider_error(self, failure, response, spider):
        self.errors.append(failure.getTraceback())
        self.num_errors += 1

Here is the stack trace as well:

2015-01-08 13:13:20-0500 [ferc-staff-reports] ERROR: Error processing {'additional_documents': None,
     'ekwhere': 'Fed',
     'id': 'FERCaeff76181cc2bc14651c693d30300b99a7673219',
     'publishdate': datetime.datetime(2013, 1, 30, 0, 0),
     'title': 'The IV Formulation and Linear Approximations of the AC Optimal Power Flow Problem: Optimal Power Flow Paper 2',
     'type': 'FERC Staff Reports & Papers - Staff Papers',
     'url': u'http://www.ferc.gov/industries/electric/indus-act/market-planning/opf-papers/acopf-2-iv-linearization.pdf'}
    Traceback (most recent call last):
      File "/usr/local/lib/python2.7/dist-packages/scrapy/middleware.py", line 62, in _process_chain
        return process_chain(self.methods[methodname], obj, *args)
      File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/defer.py", line 65, in process_chain
        d.callback(input)
      File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 382, in callback
        self._startRunCallbacks(result)
      File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 490, in _startRunCallbacks
        self._runCallbacks()
    --- <exception caught here> ---
      File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 577, in _runCallbacks
        current.result = callback(current.result, *args, **kw)
      File "/home/kiran/workspace/EK-source-scrapers/helpers/pipelines.py", line 88, in process_item
        insert_item(item, spider.settings["table"])
      File "/home/kiran/workspace/EK-source-scrapers/helpers/db_helper.py", line 54, in insert_item
        db_session.commit()
      File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/scoping.py", line 149, in do
        return getattr(self.registry(), name)(*args, **kwargs)
      File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 765, in commit
        self.transaction.commit()
      File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 370, in commit
        self._prepare_impl()
      File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 350, in _prepare_impl
        self.session.flush()
      File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 1879, in flush
        self._flush(objects)
      File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 1997, in _flush
        transaction.rollback(_capture_exception=True)
      File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/util/langhelpers.py", line 57, in __exit__
        compat.reraise(exc_type, exc_value, exc_tb)
      File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 1961, in _flush
        flush_context.execute()
      File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/unitofwork.py", line 370, in execute
        rec.execute(self)
      File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/unitofwork.py", line 523, in execute
        uow
      File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/persistence.py", line 64, in save_obj
        mapper, table, insert)
      File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/persistence.py", line 562, in _emit_insert_statements
        execute(statement, multiparams)
      File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 717, in execute
        return meth(self, multiparams, params)
      File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/sql/elements.py", line 317, in _execute_on_connection
        return connection._execute_clauseelement(self, multiparams, params)
      File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 814, in _execute_clauseelement
        compiled_sql, distilled_params
      File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 927, in _execute_context
        context)
      File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1076, in _handle_dbapi_exception
        exc_info
      File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/util/compat.py", line 185, in raise_from_cause
        reraise(type(exception), exception, tb=exc_tb)
      File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 920, in _execute_context
        context)
      File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/default.py", line 425, in do_execute
        cursor.execute(statement, parameters)
      File "/usr/lib/python2.7/dist-packages/MySQLdb/cursors.py", line 174, in execute
        self.errorhandler(self, exc, value)
      File "/usr/lib/python2.7/dist-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
        raise errorclass, errorvalue
    sqlalchemy.exc.OperationalError: (OperationalError) (1054, "Unknown column 'additional_documents' in 'field list'") 'INSERT INTO sourceferc (id, title, url, type, publishdate, scrapedate, ekwhere, summary, docket_no, additional_documents) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s)' ('FERCaeff76181cc2bc14651c693d30300b99a7673219', 'The IV Formulation and Linear Approximations of the AC Optimal Power Flow Problem: Optimal Power Flow Paper 2', u'http://www.ferc.gov/industries/electric/indus-act/market-planning/opf-papers/acopf-2-iv-linearization.pdf', 'FERC Staff Reports & Papers - Staff Papers', datetime.datetime(2013, 1, 30, 0, 0), datetime.date(2015, 1, 8), 'Fed', None, None, None)
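
One possible explanation, offered as a sketch rather than a confirmed diagnosis: spider_error is sent when a spider callback raises, whereas the exception in the traceback above is raised inside process_item, i.e. in an item pipeline. Scrapy's documentation describes a separate item_error signal for exceptions raised by item pipelines (other than DropItem). Whether that signal is available depends on the installed Scrapy version, and it may well be missing from the release shown in the traceback. The class name PipelineErrorCollector and the _record helper below are hypothetical and only illustrate how both signals could feed one error list.

```python
import logging

logger = logging.getLogger(__name__)


class PipelineErrorCollector(object):
    """Hypothetical extension that records callback and pipeline failures."""

    def __init__(self):
        self.errors = []

    @classmethod
    def from_crawler(cls, crawler):
        # Imported here so the handlers can be exercised without Scrapy installed.
        from scrapy import signals

        ext = cls()
        # spider_error covers exceptions raised in spider callbacks.
        crawler.signals.connect(ext.spider_error, signal=signals.spider_error)
        # item_error covers exceptions raised in item pipelines.
        # Assumption: the installed Scrapy version defines signals.item_error.
        crawler.signals.connect(ext.item_error, signal=signals.item_error)
        return ext

    def spider_error(self, failure, response, spider):
        self._record("callback", failure, spider)

    def item_error(self, item, response, spider, failure):
        self._record("pipeline", failure, spider)

    def _record(self, origin, failure, spider):
        # failure is expected to be a Twisted Failure; getTraceback() returns a string.
        self.errors.append((origin, spider.name, failure.getTraceback()))
        logger.error("%s error in spider %s", origin, spider.name)
```

On a Scrapy version without item_error, a fallback would be to catch the exception inside process_item and record it somewhere the extension can read at spider_closed, such as the stats collector.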