Scrapy: stop when a 404 is reached while paginating with an unknown total


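I'm new to this, so please forgive the question.

I have a URL, and I want Scrapy to check every count value (1, 2, 3, 4, 5, ...) until it reaches an empty page (no HTML) or a 404 page.

My problem is that the total count is unknown, so I'm not sure how to tell Scrapy to work like this: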

http://example.com/news?count=1 ===> found data, save it
http://example.com/news?count=2 ===> found data, save it
http://example.com/news?count=3 ===> found data, save it
....
....
....
http://example.com/news?count=X ===> no data found, stop here.

Just write your spider code:

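import scrapy


class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ["example.com"]
    start_urls = ["http://example.com/news?count=1"]
    count = 1  # page number of the request currently being parsed

    def parse(self, response):
        # ... make your magic! (extract and save the data here) ...
        self.count = self.count + 1
        # Swap the last character of the URL for the new count;
        # this only works while the current count is a single digit.
        next_url = response.url[:-1] + str(self.count)
        yield scrapy.Request(next_url, callback=self.parse)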

Obviously, you will have to improve the logic in next_url if you want count > 9.
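One way to make the next-URL step independent of the number of digits is to parse the query string instead of slicing the last character. The following is only a minimal sketch using the standard library; the helper name `next_page_url` and its `param` argument are hypothetical and not part of the original answer.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qs, urlencode


def next_page_url(url, param="count"):
    """Return `url` with the integer query parameter `param` increased by one."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    # Treat a missing parameter as 0 (an assumption of this sketch).
    current = int(query.get(param, ["0"])[0])
    query[param] = [str(current + 1)]
    # doseq=True turns the lists produced by parse_qs back into plain values.
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


print(next_page_url("http://example.com/news?count=9"))
print(next_page_url("http://example.com/news?count=10"))
```

Inside `parse`, a call such as `next_page_url(response.url)` could then stand in for `response.url[:-1] + str(self.count)`, and the `count` attribute on the spider would no longer be needed.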

There is no direct way to tell Scrapy to work this way; each URL needs its own request.

I was hoping for a different answer :) Thanks for the feedback.