Scrapy beginner getting an exception

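I need help. I want to build a crawler for a specific website (The Undermine Journal). I want to fetch data from the site and print it to the console, since I mostly work in the console and don't want to switch back and forth all the time. I also want to push the data into a database (SQL etc. is not a problem). But when I try to run the crawler, all I get is the output below, and the tutorials haven't really helped: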

2016-10-05 10:55:23 [scrapy] INFO: Scrapy 1.0.3 started (bot: undermine)
2016-10-05 10:55:23 [scrapy] INFO: Optional features available: ssl, http11, boto
2016-10-05 10:55:23 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'undermine.spiders', 'SPIDER_MODULES': ['undermine.spiders'], 'BOT_NAME': 'undermine'}
2016-10-05 10:55:23 [scrapy] INFO: Enabled extensions: CloseSpider, TelnetConsole, LogStats, CoreStats, SpiderState
2016-10-05 10:55:23 [boto] DEBUG: Retrieving credentials from metadata server.
2016-10-05 10:55:24 [boto] ERROR: Caught exception reading instance data
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/boto/utils.py", line 210, in retry_url
    r = opener.open(req, timeout=timeout)
  File "/usr/lib/python2.7/urllib2.py", line 429, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 447, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1228, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1198, in do_open
    raise URLError(err)
URLError: <urlopen error timed out>
2016-10-05 10:55:24 [boto] ERROR: Unable to read instance data, giving up
2016-10-05 10:55:24 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2016-10-05 10:55:24 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2016-10-05 10:55:24 [scrapy] INFO: Enabled item pipelines: 
2016-10-05 10:55:24 [scrapy] INFO: Spider opened
2016-10-05 10:55:24 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-10-05 10:55:24 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-10-05 10:55:24 [scrapy] ERROR: Error while obtaining start requests
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/scrapy/core/engine.py", line 110, in _next_request
    request = next(slot.start_requests)
  File "/usr/lib/python2.7/dist-packages/scrapy/spiders/__init__.py", line 70, in start_requests
    yield self.make_requests_from_url(url)
  File "/usr/lib/python2.7/dist-packages/scrapy/spiders/__init__.py", line 73, in make_requests_from_url
    return Request(url, dont_filter=True)
  File "/usr/lib/python2.7/dist-packages/scrapy/http/request/__init__.py", line 24, in __init__
    self._set_url(url)
  File "/usr/lib/python2.7/dist-packages/scrapy/http/request/__init__.py", line 59, in _set_url
    raise ValueError('Missing scheme in request url: %s' % self._url)
ValueError: Missing scheme in request url: theunderminejournal.com/#eu/eredar/item/124442
2016-10-05 10:55:24 [scrapy] INFO: Closing spider (finished)
2016-10-05 10:55:24 [scrapy] INFO: Dumping Scrapy stats:
{'finish_reason': 'finished',
 'finish_time': datetime.datetime(2016, 10, 5, 8, 55, 24, 710944),
 'log_count/DEBUG': 2,
 'log_count/ERROR': 3,
 'log_count/INFO': 7,
 'start_time': datetime.datetime(2016, 10, 5, 8, 55, 24, 704378)}
2016-10-05 10:55:24 [scrapy] INFO: Spider closed (finished)
Does anyone have a hint?

Edit: result after the change

2016-10-05 11:21:35 [scrapy] INFO: Scrapy 1.0.3 started (bot: undermine)
2016-10-05 11:21:35 [scrapy] INFO: Optional features available: ssl, http11, boto
2016-10-05 11:21:35 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'undermine.spiders', 'SPIDER_MODULES': ['undermine.spiders'], 'BOT_NAME': 'undermine'}
2016-10-05 11:21:35 [scrapy] INFO: Enabled extensions: CloseSpider, TelnetConsole, LogStats, CoreStats, SpiderState
2016-10-05 11:21:35 [boto] DEBUG: Retrieving credentials from metadata server.
2016-10-05 11:21:36 [boto] ERROR: Caught exception reading instance data
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/boto/utils.py", line 210, in retry_url
    r = opener.open(req, timeout=timeout)
  File "/usr/lib/python2.7/urllib2.py", line 429, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 447, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1228, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1198, in do_open
    raise URLError(err)
URLError: <urlopen error timed out>
2016-10-05 11:21:36 [boto] ERROR: Unable to read instance data, giving up

ValueError: Missing scheme in request url: theunderminejournal.com/#eu/eredar/item/124442

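Your URLs should always start with http:// or https://. Without a scheme, Scrapy cannot build the Request and raises the ValueError shown above.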

start_urls = (
    # Wrong: no scheme, raises "Missing scheme in request url"
    # 'theunderminejournal.com/#eu/eredar/item/124442',
    # Right: the URL starts with a scheme
    'http://theunderminejournal.com/#eu/eredar/item/124442',
)
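If the start URLs come from a file or from user input, it can help to normalize them before handing them to Scrapy. The following is a minimal sketch, not part of Scrapy itself: `ensure_scheme` is a hypothetical helper name, and defaulting to `http` is an assumption. It is written for Python 3; on the Python 2.7 setup shown in the traceback, the import would be `from urlparse import urlparse` instead.

```python
from urllib.parse import urlparse


def ensure_scheme(url, default_scheme="http"):
    """Return url unchanged if it starts with http:// or https://,
    otherwise prefix it with default_scheme (hypothetical helper)."""
    if urlparse(url).scheme in ("http", "https"):
        return url
    return default_scheme + "://" + url


# The URL from the question, without a scheme
raw_urls = ["theunderminejournal.com/#eu/eredar/item/124442"]

# This tuple could then be assigned to the spider's start_urls
start_urls = tuple(ensure_scheme(u) for u in raw_urls)
```

Only `http` and `https` are treated as valid schemes here, which keeps the check simple for a web crawler; other schemes would need their own handling.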

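The error in your edit is completely unrelated. It comes from the boto package, which fails to reach the metadata server while looking up credentials. You can most likely ignore it. Does the spider itself work?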
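If the boto traceback is distracting, a commonly reported workaround for Scrapy 1.0.x is to disable the S3 download handler in the project settings. This is a sketch under the assumption that the project never requests `s3://` URLs; whether it silences the message may depend on the Scrapy and boto versions installed.

```python
# settings.py of the Scrapy project

# Assumption: this project never downloads from s3:// URLs.
# Mapping the "s3" scheme to None disables Scrapy's S3 download handler,
# so it is not set up when the crawler starts.
DOWNLOAD_HANDLERS = {
    's3': None,
}
```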