Python scrapy script fails with CannotListenError
Hi all. I have a simple parser based on the scrapy framework. Here is the core code:
#!/usr/bin/python
#-*-coding:utf-8-*-
import sys, os, logging
from utils import append_project_to_python_path, load_spiders
from scrapy.utils.log import configure_logging
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
PATH = os.path.dirname(os.path.realpath(sys.argv[0]))
append_project_to_python_path()
os.environ['DJANGO_SETTINGS_MODULE'] = 'delta_parser.settings' #add django settings to the project
os.environ.setdefault('SCRAPY_SETTINGS_MODULE', 'scrapy_parser.settings') #add path to scrapy settings to the project
# Settings for logging
configure_logging(install_root_handler=False)
logging.basicConfig(
filename = PATH + '/output/delta_scraper.log',
filemode = 'w+b',
format = '%(asctime)s [%(name)s] %(levelname)s: %(message)s',
)
# Scrapy settings and spiders
settings = get_project_settings()
process = CrawlerProcess(settings)
spiders = load_spiders()
map(process.crawl, spiders) # attaches each spider class to the crawling process
# Commented block for testing chosen spiders from this script
#from scrapy_parser.spiders.company_revsite_spider import CompanyRevsiteSpider
#process.crawl(CompanyRevsiteSpider)
process.start() # the script will block here until all crawling jobs are finished
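For reference, each spider crawled inside the same `CrawlerProcess` starts its own telnet console, and each console binds one port from the `TELNETCONSOLE_PORT` range (`[6023, 6073]` by default in Scrapy 1.x). With ~30 spiders running at once that range can matter. The fragment below is a sketch of what could go in `scrapy_parser/settings.py`, not the asker's actual settings; the widened upper bound `6200` is an arbitrary illustrative value:

```python
# scrapy_parser/settings.py fragment (sketch): every spider crawled in this
# process gets its own telnet console, each binding one port from
# TELNETCONSOLE_PORT (default [6023, 6073]). Either widen the range so all
# concurrent consoles fit, or disable the extension entirely.
TELNETCONSOLE_PORT = [6023, 6200]   # room for many concurrent consoles
# TELNETCONSOLE_ENABLED = False     # alternative: no telnet console at all
```

Both names are real Scrapy settings; whether the console is worth keeping for a batch run like this is a judgment call.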
It loads ~30 spider classes from the spiders folder, attaches them to the crawling process, and writes items to the database via a middleware configured in the scrapy settings. This simple scheme worked well for quite a while, but now I am getting errors such as CannotListenError: Couldn't listen on 127.0.0.1:6073: [Errno 98] Address already in use and AttributeError: TelnetConsole instance has no attribute 'port'.

I haven't added spiders or changed the settings recently, and no other program was using 127.0.0.1:6073 before the script ran. Any help would be appreciated.
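For context on what [Errno 98] means here, this minimal stdlib sketch (not Scrapy code) reproduces it: a second socket cannot bind a port that another listener still holds, which is exactly the situation when two telnet consoles race for 127.0.0.1:6073:

```python
import errno
import socket

def second_bind_errno(host='127.0.0.1'):
    """Bind one listener, then try the same port again; return the errno."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    first.bind((host, 0))              # port 0: let the OS pick a free port
    first.listen(1)
    port = first.getsockname()[1]
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        second.bind((host, port))      # same port while `first` still holds it
        return None                    # bind unexpectedly succeeded
    except OSError as exc:
        return exc.errno               # EADDRINUSE: value 98 on Linux
    finally:
        second.close()
        first.close()

print(second_bind_errno() == errno.EADDRINUSE)
```

So the error itself only says the port was taken at bind time; it does not say by whom, and a leftover console from this same process is one candidate.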
EDIT:
2016-04-11 06:20:04,539 [scrapy.telnet] DEBUG: Telnet console listening on 127.0.0.1:6072
2016-04-11 06:20:04,556 [scrapy.middleware] INFO: Enabled extensions: CloseSpider, TelnetConsole, LogStats, CoreStats, SpiderState, AutoThrottle
2016-04-11 06:20:04,559 [scrapy.middleware] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, ChunkedTransferMiddleware, DownloaderStats
2016-04-11 06:20:04,560 [scrapy.middleware] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2016-04-11 06:20:04,561 [scrapy.middleware] INFO: Enabled item pipelines: ProcessItemFields, CsvExportPipeline, DBExportPipeline
2016-04-11 06:20:04,562 [scrapy.core.engine] INFO: Spider opened
2016-04-11 06:20:04,563 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-04-11 06:20:04,574 [scrapy.telnet] DEBUG: Telnet console listening on 127.0.0.1:6073
2016-04-11 06:20:04,623 [scrapy.utils.signal] ERROR: Error caught on signal handler: >
Traceback (most recent call last):
  File "/home/vagrant/.virtualenvs/big brother/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 150, in maybeDeferred
    result = f(*args, **kw)
  File "/home/vagrant/.virtualenvs/big brother/local/lib/python2.7/site-packages/scrapy/xlib/pydispatch/robustapply.py", line 57, in robustApply
    return receiver(*arguments, **named)
  File "/home/vagrant/.virtualenvs/big brother/local/lib/python2.7/site-packages/scrapy/telnet.py", line 56, in start_listening
    self.port = listen_tcp(self.portrange, self.host, self)
  File "/home/vagrant/.virtualenvs/big brother/local/lib/python2.7/site-packages/scrapy/utils/reactor.py", line 14, in listen_tcp
    return reactor.listenTCP(x, factory, interface=host)
  File "/home/vagrant/.virtualenvs/big brother/local/lib/python2.7/site-packages/twisted/internet/posixbase.py", line 478, in listenTCP
    p.startListening()
  File "/home/vagrant/.virtualenvs/big brother/local/lib/python2.7/site-packages/twisted/internet/tcp.py", line 984, in startListening
    raise CannotListenError(self.interface, self.port, le)
CannotListenError: Couldn't listen on 127.0.0.1:6073: [Errno 98] Address already in use.
2016-04-11 06:20:04,642 [scrapy.middleware] INFO: Enabled extensions: CloseSpider, TelnetConsole, LogStats, CoreStats, SpiderState, AutoThrottle
2016-04-11 06:20:04,645 [scrapy.middleware] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, ChunkedTransferMiddleware, DownloaderStats
2016-04-11 06:20:04,647 [scrapy.middleware] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2016-04-11 06:20:04,648 [scrapy.middleware] INFO: Enabled item pipelines: ProcessItemFields, CsvExportPipeline, DBExportPipeline
2016-04-11 06:20:04,649 [scrapy.core.engine] INFO: Spider opened
2016-04-11 06:20:04,650 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-04-11 06:20:04,665 [scrapy.utils.signal] ERROR: Error caught on signal handler: >
Traceback (most recent call last):
  File "/home/vagrant/.virtualenvs/big brother/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 150, in maybeDeferred
    result = f(*args, **kw)
  File "/home/vagrant/.virtualenvs/big brother/local/lib/python2.7/site-packages/scrapy/xlib/pydispatch/robustapply.py", line 57, in robustApply
    return receiver(*arguments, **named)
  File "/home/vagrant/.virtualenvs/big brother/local/lib/python2.7/site-packages/scrapy/telnet.py", line 56, in start_listening
    self.port = listen_tcp(self.portrange, self.host, self)
  File "/home/vagrant/.virtualenvs/big brother/local/lib/python2.7/site-packages/scrapy/utils/reactor.py", line 14, in listen_tcp
    return reactor.listenTCP(x, factory, interface=host)
  File "/home/vagrant/.virtualenvs/big brother/local/lib/python2.7/site-packages/twisted/internet/posixbase.py", line 478, in listenTCP
    p.startListening()
  File "/home/vagrant/.virtualenvs/big brother/local/lib/python2.7/site-packages/twisted/internet/tcp.py", line 984, in startListening
    raise CannotListenError(self.interface, self.port, le)
CannotListenError: Couldn't listen on 127.0.0.1:6073: [Errno 98] Address already in use.
... (more errors like the one above)
... (the spiders then start crawling pages and returning items, i.e. parsing