How to yield several requests in order in Scrapy?
I need to send my requests in order with Scrapy:
def n1(self, response):
    # self.input = [elem1, elem2, elem3, elem4, elem5, ..., elem100000]
    for (elem,) in self.input:
        link = urljoin(path, elem)
        yield Request(link)
My problem is that the requests do not go out in order.
I have read about this, but found no correct answer.
How should I change my code so that the requests are sent in order?
Update 1
I used priorities and changed the code to:
def n1(self, response):
    # self.input = [elem1, elem2, elem3, elem4, elem5, ..., elem100000]
    self.prio = len(self.input)
    for (elem,) in self.input:
        self.prio -= 1
        link = urljoin(path, elem)
        yield Request(link, priority=self.prio)
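Scrapy's scheduler pops higher-priority requests first, so counting the priority down is meant to preserve the array order. The idea can be sketched without Scrapy using a plain priority queue (a simplified model, not Scrapy's actual scheduler implementation):

```python
import heapq

# Model of a priority scheduler: the highest priority is popped first.
# heapq is a min-heap, so we push the negated priority; the tie-breaking
# counter preserves insertion order for equal priorities.
class PriorityScheduler:
    def __init__(self):
        self._heap = []
        self._counter = 0

    def push(self, url, priority=0):
        heapq.heappush(self._heap, (-priority, self._counter, url))
        self._counter += 1

    def pop(self):
        return heapq.heappop(self._heap)[2]

sched = PriorityScheduler()
elems = ['elem1', 'elem2', 'elem3']
prio = len(elems)
for elem in elems:
    prio -= 1
    sched.push(elem, priority=prio)  # elem1 gets the highest priority

order = [sched.pop() for _ in range(len(elems))]
print(order)  # ['elem1', 'elem2', 'elem3']
```

Scheduling order alone is not enough, though: with more than one concurrent request, several downloads are in flight at once, so responses can still come back out of order.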
The settings I gave this spider are:
custom_settings = {
    'DOWNLOAD_DELAY': 0,
    'COOKIES_ENABLED': True,
    'CONCURRENT_REQUESTS': 1,
    'AUTOTHROTTLE_ENABLED': False,
}
Now the order changes, but it still does not match the order of the elements in the array.

I think concurrent requests play a role here. You can try setting

custom_settings = {
    'CONCURRENT_REQUESTS': 1
}

The default is 8. That would explain why priority is not respected when other workers are free to take requests.

You can send the next request only after the previous one has been received:
class MainSpider(Spider):
    urls = [
        'https://www.url1...',
        'https://www.url2...',
        'https://www.url3...',
    ]

    def start_requests(self):
        yield Request(
            url=self.urls[0],
            callback=self.parse,
            meta={'next_index': 1},
        )

    def parse(self, response):
        next_index = response.meta['next_index']
        # do something with response...

        # Process next url
        if next_index < len(self.urls):
            yield Request(
                url=self.urls[next_index],
                callback=self.parse,
                meta={'next_index': next_index + 1},
            )
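This chaining pattern guarantees order because there is never more than one request in flight. Stripped of Scrapy, the control flow looks like this (the fetch function and driver loop are toy stand-ins for the real downloader and engine):

```python
# Toy model of the callback chain: the "engine" fetches one URL, hands the
# response to parse(), and parse() emits at most one follow-up request.
urls = ['https://www.url1...', 'https://www.url2...', 'https://www.url3...']

def fetch(url):
    # Stand-in for Scrapy's downloader
    return {'url': url}

def parse(response, next_index):
    # do something with response...
    if next_index < len(urls):
        return urls[next_index], next_index + 1
    return None

crawled = []
pending = (urls[0], 1)            # mirrors start_requests()
while pending:
    url, next_index = pending
    response = fetch(url)
    crawled.append(response['url'])
    pending = parse(response, next_index)

print(crawled)  # the three URLs, strictly in list order
```

The trade-off is throughput: the crawl is fully serialized, so it runs only as fast as one request at a time allows.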
Use a return statement instead of yield and you do not even need to touch any settings:
from scrapy.spiders import Spider, Request

class MySpider(Spider):
    name = 'toscrape.com'
    start_urls = ['http://books.toscrape.com/catalogue/page-1.html']
    # A generator kept on the class: each call to parse() resumes it
    # and consumes exactly one URL.
    urls = (
        'http://books.toscrape.com/catalogue/page-{}.html'.format(i + 1)
        for i in range(50)
    )

    def parse(self, response):
        for url in self.urls:
            return Request(url)
Output:
2018-11-20 03:35:43 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://books.toscrape.com/catalogue/page-1.html> (referer: http://books.toscrape.com/catalogue/page-1.html)
2018-11-20 03:35:43 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://books.toscrape.com/catalogue/page-2.html> (referer: http://books.toscrape.com/catalogue/page-1.html)
2018-11-20 03:35:44 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://books.toscrape.com/catalogue/page-3.html> (referer: http://books.toscrape.com/catalogue/page-2.html)
2018-11-20 03:35:44 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://books.toscrape.com/catalogue/page-4.html> (referer: http://books.toscrape.com/catalogue/page-3.html)
2018-11-20 03:35:45 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://books.toscrape.com/catalogue/page-5.html> (referer: http://books.toscrape.com/catalogue/page-4.html)
2018-11-20 03:35:45 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://books.toscrape.com/catalogue/page-6.html> (referer: http://books.toscrape.com/catalogue/page-5.html)
2018-11-20 03:35:45 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://books.toscrape.com/catalogue/page-7.html> (referer: http://books.toscrape.com/catalogue/page-6.html)
2018-11-20 03:35:46 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://books.toscrape.com/catalogue/page-8.html> (referer: http://books.toscrape.com/catalogue/page-7.html)
2018-11-20 03:35:46 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://books.toscrape.com/catalogue/page-9.html> (referer: http://books.toscrape.com/catalogue/page-8.html)
2018-11-20 03:35:47 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://books.toscrape.com/catalogue/page-10.html> (referer: http://books.toscrape.com/catalogue/page-9.html)
2018-11-20 03:35:47 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://books.toscrape.com/catalogue/page-11.html> (referer: http://books.toscrape.com/catalogue/page-10.html)
2018-11-20 03:35:47 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://books.toscrape.com/catalogue/page-12.html> (referer: http://books.toscrape.com/catalogue/page-11.html)
2018-11-20 03:35:48 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://books.toscrape.com/catalogue/page-13.html> (referer: http://books.toscrape.com/catalogue/page-12.html)
2018-11-20 03:35:48 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://books.toscrape.com/catalogue/page-14.html> (referer: http://books.toscrape.com/catalogue/page-13.html)
2018-11-20 03:35:49 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://books.toscrape.com/catalogue/page-15.html> (referer: http://books.toscrape.com/catalogue/page-14.html)
2018-11-20 03:35:49 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://books.toscrape.com/catalogue/page-16.html> (referer: http://books.toscrape.com/catalogue/page-15.html)
2018-11-20 03:35:50 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://books.toscrape.com/catalogue/page-17.html> (referer: http://books.toscrape.com/catalogue/page-16.html)
2018-11-20 03:35:50 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://books.toscrape.com/catalogue/page-18.html> (referer: http://books.toscrape.com/catalogue/page-17.html)
2018-11-20 03:35:50 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://books.toscrape.com/catalogue/page-19.html> (referer: http://books.toscrape.com/catalogue/page-18.html)
2018-11-20 03:35:51 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://books.toscrape.com/catalogue/page-20.html> (referer: http://books.toscrape.com/catalogue/page-19.html)
2018-11-20 03:35:51 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://books.toscrape.com/catalogue/page-21.html> (referer: http://books.toscrape.com/catalogue/page-20.html)
2018-11-20 03:35:52 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://books.toscrape.com/catalogue/page-22.html> (referer: http://books.toscrape.com/catalogue/page-21.html)
2018-11-20 03:35:52 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://books.toscrape.com/catalogue/page-23.html> (referer: http://books.toscrape.com/catalogue/page-22.html)
2018-11-20 03:35:53 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://books.toscrape.com/catalogue/page-24.html> (referer: http://books.toscrape.com/catalogue/page-23.html)
2018-11-20 03:35:53 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://books.toscrape.com/catalogue/page-25.html> (referer: http://books.toscrape.com/catalogue/page-24.html)
With the yield statement, the engine takes all the responses from the generator and executes them in an arbitrary order (I suspect they may be stored in some kind of set to remove duplicates).
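The trick in the return-based spider is that urls is a generator stored on the class: every call to parse() resumes it and takes exactly one item, so each response triggers exactly the next URL. The mechanism in plain Python, with a hypothetical Pager class standing in for the spider (no Scrapy needed):

```python
class Pager:
    # A generator expression kept as a class attribute: it is created once,
    # and its state is shared across every call that iterates it.
    urls = ('page-{}.html'.format(i + 1) for i in range(5))

    def next_request(self):
        for url in self.urls:   # resumes the generator...
            return url          # ...and stops after taking one item
        return None             # generator exhausted

p = Pager()
results = [p.next_request() for _ in range(6)]
print(results)
# ['page-1.html', 'page-2.html', 'page-3.html', 'page-4.html', 'page-5.html', None]
```

One caveat of this style: a generator can be consumed only once, so the spider cannot be restarted or run twice in the same process without recreating the class attribute.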