Web scraping Q: scrapy-redis doesn't scrape any page and finishes in a second


My spider doesn't scrape any pages and finishes in under a second, but no error is thrown.

I've checked the code and compared it with a similar project that ran successfully a few weeks ago, but I still can't figure out what's wrong.

I'm using Scrapy 1.0.1 and scrapy-redis 0.6.
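(For reproducibility, those versions can be pinned at install time with something like the following, assuming the usual PyPI package names scrapy and scrapy-redis:

    pip install scrapy==1.0.1 scrapy-redis==0.6
)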

Here is the log:

 2015-07-21 11:33:20 [scrapy] INFO: Scrapy 1.0.1 started (bot: demo)
2015-07-21 11:33:20 [scrapy] INFO: Optional features available: ssl, http11
2015-07-21 11:33:20 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'demo.spiders', 'LOG_LEVEL': 'INFO', 'SPIDER_MODULES': ['demo.spiders'], 'RETRY_HTTP_CODES': [500, 502, 503, 504, 400, 408, 404, 302, 403], 'BOT_NAME': 'demo', 'SCHEDULER': 'scrapy_redis.scheduler.Scheduler', 'DEFAULT_ITEM_CLASS': 'demo.items.DemoItem', 'REDIRECT_ENABLED': False}
2015-07-21 11:33:20 [scrapy] INFO: Enabled extensions: CloseSpider, TelnetConsole, LogStats, CoreStats, SpiderState
2015-07-21 11:33:20 [scrapy] INFO: Enabled downloader middlewares: CustomUserAgentMiddleware, CustomHttpProxyMiddleware, HttpAuthMiddleware, DownloadTimeoutMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2015-07-21 11:33:20 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2015-07-21 11:33:20 [scrapy] INFO: Enabled item pipelines: RedisPipeline, DemoPipeline
2015-07-21 11:33:20 [scrapy] INFO: Spider opened
2015-07-21 11:33:20 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2015-07-21 11:33:20 [scrapy] INFO: Closing spider (finished)
2015-07-21 11:33:20 [scrapy] INFO: Dumping Scrapy stats:
{'finish_reason': 'finished',
 'finish_time': datetime.datetime(2015, 7, 21, 3, 33, 20, 301371),
 'log_count/INFO': 7,
 'start_time': datetime.datetime(2015, 7, 21, 3, 33, 20, 296941)}
2015-07-21 11:33:20 [scrapy] INFO: Spider closed (finished)
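For context, the scrapy-redis pieces of those overridden settings usually live in settings.py. A minimal sketch; the Redis host/port, the pipeline priorities, the DemoPipeline path, and SCHEDULER_PERSIST below are assumptions, not values taken from the log:

    # settings.py (sketch)
    SCHEDULER = 'scrapy_redis.scheduler.Scheduler'    # as shown in the log above
    SCHEDULER_PERSIST = True                          # assumption: keep the queue between runs
    ITEM_PIPELINES = {
        'scrapy_redis.pipelines.RedisPipeline': 400,  # priorities are assumptions
        'demo.pipelines.DemoPipeline': 500,           # path guessed from the project name
    }
    REDIS_HOST = 'localhost'                          # assumption
    REDIS_PORT = 6379                                 # assumption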
Here is the spider:

# -*- coding: utf-8 -*-
import scrapy
from demo.items import DemoItem
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from scrapy_redis.spiders import RedisMixin
from pip._vendor.requests.models import Request  # unused, and an odd source for Request


class DemoCrawler(RedisMixin, CrawlSpider):
    name = "demo"
    redis_key = "demoCrawler:start_urls"

    rules = (
        # shop detail pages: parse them with parse_demo
        Rule(LinkExtractor(allow='/shop/\d+?/$',
                           restrict_xpaths=u"//ul/li/div[@class='txt']/div[@class='tit']/a"),
             callback='parse_demo'),
        # pagination: keep following "next" links
        Rule(LinkExtractor(restrict_xpaths=u"//div[@class='shop-wrap']/div[@class='page']/a[@class='next']"),
             follow=True),
    )

    def parse_demo(self, response):
        item = DemoItem()

        # the shop id is embedded in the action link's href
        temp = response.xpath(u"//div[@id='basic-info']/div[@class='action']/a/@href").re("\d.+\d")
        item['id'] = temp[0] if temp else ''

        temp = response.xpath(u"//div[@class='page-header']/div[@class='container']/a[@class='city J-city']/text()").extract()
        item['city'] = temp[0] if temp else ''

        temp = response.xpath(u"//div[@class='breadcrumb']/span/text()").extract()
        item['name'] = temp[0] if temp else ''

        temp = response.xpath(u"//div[@class='main']/div[@id='sales']/text()").extract()
        item['deals'] = temp[0] if temp else ''

        temp = response.xpath(u"//div[@class='main-nav']/div[@class='container']/a[1]/text()").extract()
        item['category'] = temp[0] if temp else ''

        temp = response.xpath(u"//div[@class='main']/div[@id='basic-info']/div[@class='expand-info address']/a/span/text()").extract()
        item['region'] = temp[0] if temp else ''

        temp = response.xpath(u"//div[@class='main']/div[@id='basic-info']/div[@class='expand-info address']/span/text()").extract()
        item['address'] = temp[0] if temp else ''

        yield item
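For what it's worth, scrapy-redis feeds a spider like this by popping URLs off the redis_key list; if that list is empty, or the key the spider reads doesn't match the key that was pushed to, there is nothing to schedule and the crawl finishes immediately. A rough standalone illustration of that consumption, assuming a local Redis and the redis-py package (a sketch, not scrapy-redis's actual code):

    import redis

    # Roughly what scrapy-redis does to obtain start requests:
    # pop one URL at a time; an empty list produces no requests at all.
    server = redis.StrictRedis(host='localhost', port=6379)
    while True:
        url = server.lpop('demoCrawler:start_urls')
        if url is None:
            break  # nothing queued under this key -> the spider closes
        print('would schedule a request for %s' % url)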
To start the spider, I run two commands in a shell:

redis-cli lpush demoCrawler:start_urls url

scrapy crawl demo


Here url is the specific URL I want to scrape.
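One quick check before launching the crawl is whether the push actually landed under the exact key the spider reads (demoCrawler:start_urls); llen and lrange are standard redis-cli commands:

    redis-cli llen demoCrawler:start_urls
    redis-cli lrange demoCrawler:start_urls 0 -1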

How did you start crawling? What spider are you running (show its code)? Thanks.
@alecxe I've added the spider and the commands used to start crawling. Thank you.