Python Scrapy: allowed_domains confusion

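I am learning to write a simple spider that collects post information from Stack Overflow questions.

I set

allowed_domains = ["http://stackoverflow.com/questions/"]

on a spider derived from the base scrapy.Spider class. Its parse() method only yields requests whose URLs have the format

"http://stackoverflow.com/questions/%d/" % no

I thought this would work, but maybe I misunderstand allowed_domains. All the requests returned by parse() seem to be filtered out by allowed_domains. The spider only works when I remove allowed_domains. Could you explain why? Sorry if the question is trivial.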

class StackOverFlowPost(scrapy.Spider):
    startNo = 26200877
    endNo = 26200880
    curNo = 26200877
    name = "stackOverFlowPost"
    start_urls = ["http://stackoverflow.com/questions/%d/" % startNo ]
    allowed_domains = ["http://stackoverflow.com/questions"]
    baseUrl = "http://stackoverflow.com/questions/%d/"

    def parse(self, response):
        itemObj = items.StackOverFlowItem()

        # getting items information from the page
        ...
        yield itemObj

        StackOverFlowPost.curNo += 1
        nextPost = StackOverFlowPost.baseUrl % StackOverFlowPost.curNo  

        yield scrapy.Request(nextPost, callback = self.parse)

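In the spider, allowed_domains should be a list of domains (not URLs):

allowed_domains = ["stackoverflow.com"]

Note that you can also set start_urls with a list of URLs:

start_urls = ["http://stackoverflow.com/questions/%d/" % i for i in range(startNo, endNo+1)]

That makes parse() easier to write, because it no longer has to build and yield the next request itself.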

What do you mean? allowed_domains is already a list. But as you said, if I change it to "stackoverflow.com", it works. Should I remove "http" and "/question"? Why? Sorry, could you explain more?
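
On the "why": the entries in allowed_domains are compared against the host name of each request, not against the full URL. Here is a minimal sketch of that idea. It is a simplified illustration, not Scrapy's actual code; host_allowed is a hypothetical helper written only for this example, and it assumes a request is accepted when its host equals an allowed domain or is a subdomain of one.

```python
from urllib.parse import urlparse


def host_allowed(url, allowed_domains):
    """Hypothetical helper: True if the URL's host equals an allowed
    domain or is a subdomain of it (simplified, not Scrapy's code)."""
    host = urlparse(url).hostname or ""
    return any(
        host == domain or host.endswith("." + domain)
        for domain in allowed_domains
    )


url = "http://stackoverflow.com/questions/26200878/"

# A bare domain matches the host part of the request URL.
print(host_allowed(url, ["stackoverflow.com"]))

# A full URL can never equal a host name, so the request is rejected.
print(host_allowed(url, ["http://stackoverflow.com/questions"]))
```

The host of every question URL is just stackoverflow.com, so a value that still carries the scheme ("http://") and the path ("/questions") matches nothing, and every request yielded from parse() is treated as off-site. That is why dropping both parts makes the spider work.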