Python Scrapy: Unhandled Error

My scraper had been running fine for about an hour. Then I started seeing these errors:

2014-01-16 21:26:06+0100 [-] Unhandled Error
        Traceback (most recent call last):
          File "/home/scraper/.fakeroot/lib/python2.7/site-packages/Scrapy-0.20.2-py2.7.egg/scrapy/crawler.py", line 93, in start
            self.start_reactor()
          File "/home/scraper/.fakeroot/lib/python2.7/site-packages/Scrapy-0.20.2-py2.7.egg/scrapy/crawler.py", line 130, in start_reactor
            reactor.run(installSignalHandlers=False)  # blocking call
          File "/home/scraper/.fakeroot/lib/python2.7/site-packages/twisted/internet/base.py", line 1192, in run
            self.mainLoop()
          File "/home/scraper/.fakeroot/lib/python2.7/site-packages/twisted/internet/base.py", line 1201, in mainLoop
            self.runUntilCurrent()
        --- <exception caught here> ---
          File "/home/scraper/.fakeroot/lib/python2.7/site-packages/twisted/internet/base.py", line 824, in runUntilCurrent
            call.func(*call.args, **call.kw)
          File "/home/scraper/.fakeroot/lib/python2.7/site-packages/Scrapy-0.20.2-py2.7.egg/scrapy/utils/reactor.py", line 41, in __call__
            return self._func(*self._a, **self._kw)
          File "/home/scraper/.fakeroot/lib/python2.7/site-packages/Scrapy-0.20.2-py2.7.egg/scrapy/core/engine.py", line 106, in _next_request
            if not self._next_request_from_scheduler(spider):
          File "/home/scraper/.fakeroot/lib/python2.7/site-packages/Scrapy-0.20.2-py2.7.egg/scrapy/core/engine.py", line 132, in _next_request_from_scheduler
            request = slot.scheduler.next_request()
          File "/home/scraper/.fakeroot/lib/python2.7/site-packages/Scrapy-0.20.2-py2.7.egg/scrapy/core/scheduler.py", line 64, in next_request
            request = self._dqpop()
          File "/home/scraper/.fakeroot/lib/python2.7/site-packages/Scrapy-0.20.2-py2.7.egg/scrapy/core/scheduler.py", line 94, in _dqpop
            d = self.dqs.pop()
          File "/home/scraper/.fakeroot/lib/python2.7/site-packages/queuelib/pqueue.py", line 43, in pop
            m = q.pop()
          File "/home/scraper/.fakeroot/lib/python2.7/site-packages/Scrapy-0.20.2-py2.7.egg/scrapy/squeue.py", line 18, in pop
            s = super(SerializableQueue, self).pop()
          File "/home/scraper/.fakeroot/lib/python2.7/site-packages/queuelib/queue.py", line 157, in pop
            self.f.seek(-size-self.SIZE_SIZE, os.SEEK_END)
        exceptions.IOError: [Errno 22] Invalid argument

What could be causing this? I'm on version 0.20.2. Once this error appears, Scrapy stops doing anything at all. Even if I stop it and run it again (using the same JOBDIR directory), it keeps producing these errors. To get rid of them I have to delete the job directory and start from scratch.
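
The last frames point into queuelib's on-disk queue: pop() reads the size of the last serialized request from the tail of the queue file and seeks backwards past it. If the queue file under the JOBDIR is truncated or corrupted (for example by two crawls sharing the same JOBDIR, as the comments below suggest), that backwards seek resolves to a negative offset and fails with EINVAL. A minimal sketch of the failing seek, using a hypothetical file name:

    import os

    # Simulate a truncated queue file: the size recorded for the last
    # item claims more bytes than the file actually holds.
    with open("requests.queue.p0", "wb+") as f:  # hypothetical file name
        f.write(b"short")
        size, SIZE_SIZE = 100, 4  # what queuelib believes it wrote
        # A negative target offset raises IOError: [Errno 22] Invalid argument
        f.seek(-size - SIZE_SIZE, os.SEEK_END)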

Try the following (a minimal Python sketch of these steps appears after the list):

  • Make sure you are running the latest Scrapy version (at the time of writing: 0.24)
  • Search within the resumed folder and back up the file requests.seen
  • After backing up, delete the Scrapy job folder
  • Start the crawl again with the JOBDIR= option to resume it
  • Stop the crawl
  • Replace the newly created requests.seen with the previously backed-up copy
  • Start the crawl again
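
A minimal sketch of those steps in Python, assuming the job state lives in a folder such as crawls/myjob (the path, backup name, and helper names are mine, not from the answer):

    import os
    import shutil

    JOBDIR = "crawls/myjob"       # hypothetical job folder
    BACKUP = "requests.seen.bak"  # hypothetical backup location

    def backup_seen():
        # Save the de-duplication fingerprints before wiping the state.
        shutil.copy(os.path.join(JOBDIR, "requests.seen"), BACKUP)

    def wipe_jobdir():
        # Remove the corrupted job folder entirely.
        shutil.rmtree(JOBDIR)

    def restore_seen():
        # After the crawl has been restarted and stopped (so Scrapy has
        # recreated a clean JOBDIR), put the fingerprints back in place.
        shutil.copy(BACKUP, os.path.join(JOBDIR, "requests.seen"))

Between wipe_jobdir() and restore_seen() you would run the spider once with the same JOBDIR (e.g. scrapy crawl myspider -s JOBDIR=crawls/myjob) and stop it cleanly, so that Scrapy recreates a fresh job folder.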

Are you starting more than one crawl with the same JOBDIR? @Rolando I might have! Do you think there is a way to fix the bad state? @AlexanderSuraphel You can try running with the --pdb option to debug the issue. Then you can wipe the JOBDIR state as Marcelo suggested. Note that you don't actually need to restart the crawl from scratch: it's enough to delete the files other than requests.seen.
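
A sketch of that lighter-weight fix from the comments, deleting everything under the job folder except requests.seen (the path is hypothetical):

    import os
    import shutil

    jobdir = "crawls/myjob"  # hypothetical job folder
    for name in os.listdir(jobdir):
        if name == "requests.seen":
            continue  # keep the de-duplication fingerprints
        path = os.path.join(jobdir, name)
        if os.path.isdir(path):
            shutil.rmtree(path)  # e.g. the requests.queue directory
        else:
            os.remove(path)      # e.g. spider.state

This keeps the crawl's memory of which requests were already seen while discarding the corrupted queue state.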