Python Scrapy: Unhandled Error
My scraper was running fine for about an hour. After a while, I started seeing these errors:
2014-01-16 21:26:06+0100 [-] Unhandled Error
Traceback (most recent call last):
File "/home/scraper/.fakeroot/lib/python2.7/site-packages/Scrapy-0.20.2-py2.7.egg/scrapy/crawler.py", line 93, in start
self.start_reactor()
File "/home/scraper/.fakeroot/lib/python2.7/site-packages/Scrapy-0.20.2-py2.7.egg/scrapy/crawler.py", line 130, in start_reactor
reactor.run(installSignalHandlers=False) # blocking call
File "/home/scraper/.fakeroot/lib/python2.7/site-packages/twisted/internet/base.py", line 1192, in run
self.mainLoop()
File "/home/scraper/.fakeroot/lib/python2.7/site-packages/twisted/internet/base.py", line 1201, in mainLoop
self.runUntilCurrent()
--- <exception caught here> ---
File "/home/scraper/.fakeroot/lib/python2.7/site-packages/twisted/internet/base.py", line 824, in runUntilCurrent
call.func(*call.args, **call.kw)
File "/home/scraper/.fakeroot/lib/python2.7/site-packages/Scrapy-0.20.2-py2.7.egg/scrapy/utils/reactor.py", line 41, in __call__
return self._func(*self._a, **self._kw)
File "/home/scraper/.fakeroot/lib/python2.7/site-packages/Scrapy-0.20.2-py2.7.egg/scrapy/core/engine.py", line 106, in _next_request
if not self._next_request_from_scheduler(spider):
File "/home/scraper/.fakeroot/lib/python2.7/site-packages/Scrapy-0.20.2-py2.7.egg/scrapy/core/engine.py", line 132, in _next_request_from_scheduler
request = slot.scheduler.next_request()
File "/home/scraper/.fakeroot/lib/python2.7/site-packages/Scrapy-0.20.2-py2.7.egg/scrapy/core/scheduler.py", line 64, in next_request
request = self._dqpop()
File "/home/scraper/.fakeroot/lib/python2.7/site-packages/Scrapy-0.20.2-py2.7.egg/scrapy/core/scheduler.py", line 94, in _dqpop
d = self.dqs.pop()
File "/home/scraper/.fakeroot/lib/python2.7/site-packages/queuelib/pqueue.py", line 43, in pop
m = q.pop()
File "/home/scraper/.fakeroot/lib/python2.7/site-packages/Scrapy-0.20.2-py2.7.egg/scrapy/squeue.py", line 18, in pop
s = super(SerializableQueue, self).pop()
File "/home/scraper/.fakeroot/lib/python2.7/site-packages/queuelib/queue.py", line 157, in pop
self.f.seek(-size-self.SIZE_SIZE, os.SEEK_END)
exceptions.IOError: [Errno 22] Invalid argument
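The final IOError comes from queuelib seeking backwards from the end of its on-disk queue file. A minimal sketch (using a hypothetical throwaway temp file, not the real queue) of how a too-short or truncated file turns that seek into [Errno 22]:

```python
# Reproduce the failure mode behind "IOError: [Errno 22] Invalid argument":
# queuelib's pop() seeks backwards from the end of its data file. If the
# file is shorter than the recorded record size (e.g. truncated by a crash),
# the seek target becomes negative and the OS rejects it with EINVAL.
import os
import tempfile

with tempfile.NamedTemporaryFile() as f:
    f.write(b"abc")  # only 3 bytes on disk
    f.flush()
    try:
        f.seek(-10, os.SEEK_END)  # same call shape as queuelib's pop()
    except (IOError, OSError) as e:
        print("seek failed with errno", e.errno)  # errno 22 (EINVAL)
```

This suggests the on-disk request queue in the JOBDIR was left in an inconsistent state, which is why the error persists across restarts.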
What could be causing this? I'm on version 0.20.2. Once I hit this error, Scrapy stops doing anything at all. Even if I stop it and run it again (using the same JOBDIR directory), it keeps giving me these errors. To get rid of them I have to delete the job directory and start over.

Try the following:
- Make sure you are running the latest Scrapy release (0.24 at the time of writing)
- Look inside the resumed job folder and back up the file requests.seen
- Once it is backed up, delete the Scrapy job folder
- Start the crawl again with the JOBDIR= option to resume it
- Stop the crawl
- Replace the newly created requests.seen with the one you backed up earlier
- Start the crawl again
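The steps above can be sketched in Python. The JOBDIR path is hypothetical, and the block fabricates a toy job directory so it is runnable on its own; in a real run, steps 1-2 happen before you restart the spider and step 3 after you stop it again:

```python
# Sketch of the backup/restore workflow above. requests.seen holds the
# dupefilter state; the rest of the job directory holds the (possibly
# corrupted) disk queue that we want to throw away.
import os
import shutil

jobdir = "crawls/myspider-1"  # hypothetical JOBDIR path
backup = "requests.seen.bak"

# Fabricate a job directory for demonstration purposes only.
os.makedirs(jobdir, exist_ok=True)
open(os.path.join(jobdir, "requests.seen"), "wb").close()

# 1. Back up requests.seen before touching anything.
shutil.copy(os.path.join(jobdir, "requests.seen"), backup)

# 2. Remove the whole job folder (drops the corrupted queue state).
shutil.rmtree(jobdir)

# ... re-run the spider with JOBDIR=crawls/myspider-1, then stop it ...

# 3. Copy the backup over the freshly created requests.seen:
# shutil.copy(backup, os.path.join(jobdir, "requests.seen"))
```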
Run the crawl with the --pdb option to debug the problem. Then you can clear the JOBDIR state as Marcelo suggested. Note that you don't actually need to restart the crawl from scratch: it's enough to delete the files other than requests.seen.
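This lighter cleanup can be sketched as follows. The JOBDIR path and the file names created for demonstration are illustrative, and the block fabricates a toy job directory so it is runnable on its own:

```python
# Instead of deleting the whole JOBDIR, remove everything in it except
# requests.seen (the dupefilter state), so already-seen URLs stay
# deduplicated when the crawl resumes.
import os
import shutil

jobdir = "crawls/myspider-1"  # hypothetical JOBDIR path

# Fabricate a job directory for demonstration purposes only.
os.makedirs(os.path.join(jobdir, "requests.queue"), exist_ok=True)
open(os.path.join(jobdir, "requests.seen"), "wb").close()
open(os.path.join(jobdir, "spider.state"), "wb").close()

# Delete every entry except requests.seen.
for name in os.listdir(jobdir):
    if name == "requests.seen":
        continue
    path = os.path.join(jobdir, name)
    if os.path.isdir(path):
        shutil.rmtree(path)
    else:
        os.remove(path)

print(sorted(os.listdir(jobdir)))  # ['requests.seen']
```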