Does a scrapy spider download from multiple domains simultaneously?


I am trying to scrape 2 domains at the same time. I created a spider like this:

from scrapy import log
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors import LinkExtractor

class TestSpider(CrawlSpider):

    name = 'test-spider'
    allowed_domains = [ 'domain-a.com', 'domain-b.com' ]
    start_urls = [ 'http://www.domain-a.com/index.html',
                   'http://www.domain-b.com/index.html' ]
    rules = (
        Rule(LinkExtractor(), follow=True, callback='parse_item'),
    )

    def parse_item(self, response):
        log.msg('parsing ' + response.url, level=log.DEBUG)
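For background, Scrapy does download from all allowed domains in parallel by default; overall and per-domain parallelism are bounded by settings. A minimal, illustrative settings.py fragment (the values shown are Scrapy's documented defaults):

```python
# Hypothetical settings.py fragment: Scrapy fetches from every allowed
# domain concurrently; these settings bound that concurrency.
CONCURRENT_REQUESTS = 16            # total concurrent requests across all domains
CONCURRENT_REQUESTS_PER_DOMAIN = 8  # cap on concurrent requests to any one domain
DOWNLOAD_DELAY = 0                  # no artificial delay between requests
```

So with two domains in allowed_domains, requests to both can be in flight at once, as long as the scheduler actually has URLs from both domains queued.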
I expected to see a mix of domain-a.com and domain-b.com entries in the output, but I only see domain-a mentioned in the log. However, if I run a separate crawler per spider, both domains do get scraped at the same time (this is not the actual code, but it illustrates the point):

    from twisted.internet import reactor
    from scrapy import log, signals
    from scrapy.crawler import Crawler
    from scrapy.utils.project import get_project_settings

    def setup_crawler(url):
        spider = TestSpider(start_urls=[url])
        crawler = Crawler(get_project_settings())
        crawler.configure()
        # Pass the callable itself; calling reactor.stop() here would
        # stop the reactor immediately rather than on spider_closed.
        crawler.signals.connect(reactor.stop, signal=signals.spider_closed)
        crawler.crawl(spider)
        crawler.start()

    setup_crawler('http://www.domain-a.com/index.html')
    setup_crawler('http://www.domain-b.com/index.html')
    log.start(loglevel=log.DEBUG)
    reactor.run()

Thanks

Answer: It might be worth checking the crawl order - depth-first (the default) may favour domain a, see:

Thanks shane, I'll look into this
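Following up on the crawl-order point: Scrapy's scheduler uses LIFO queues by default, which gives depth-first order and can keep the crawl inside whichever domain's links were discovered first. A sketch of the settings.py fragment that switches to breadth-first (FIFO) order, assuming a Scrapy version where the FIFO queue classes live under scrapy.squeues (older releases used a different module path):

```python
# Hypothetical settings.py fragment: switch the scheduler from LIFO
# (depth-first) to FIFO (breadth-first) queues, so the start URLs of
# both domains are dequeued before either domain's deep links.
DEPTH_PRIORITY = 1
SCHEDULER_DISK_QUEUE = 'scrapy.squeues.PickleFifoDiskQueue'
SCHEDULER_MEMORY_QUEUE = 'scrapy.squeues.FifoMemoryQueue'
```

With breadth-first order, shallow pages from both domains should appear interleaved in the log instead of one domain dominating early output.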