Scrapy spider does not call itself recursively after setting a callback


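The goal of my project is to search websites for company phone numbers.

I am trying to parse a web page, match phone numbers with a regular expression (I have this part working), and then look for links on the page. Those links are what I want to follow recursively: call the same function on each of them and repeat. However, the function only runs once. See the code below: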

def parse(self, response):
    # The main method of the spider. It scrapes the URL(s) specified in the
    # 'start_url' argument above. The content of the scraped URL is passed on
    # as the 'response' object.

    hxs = HtmlXPathSelector(response)

    #print(phone_detail)
    print('here')
    for phone_num in response.xpath('//body').re(r'\d{3}.\d{3}.\d{4}'):
        item = PhoneNumItem()
        item['label'] = "a"
        item['phone_num'] = phone_num
        yield item

    for url in hxs.xpath('//a/@href').extract():
        # This loops through all the URLs found 
        # Constructs an absolute URL by combining the responses URL with a possible relative URL:
        next_page = response.urljoin(url)
        print("Found URL: " + next_page)

        #yield response.follow(next_page, self.parse_page)
        yield scrapy.Request(next_page, callback=self.parse)

Please let me know what you think. To me, this code looks like it should work, but it does not.

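There are two things I would double-check first:

1. For your start URL, are you "return"ing or "yield"ing the first request you make? If you are using the default "start_urls" attribute in your spider class, you can ignore this.
2. Are you absolutely sure that the second half of your parse() method is finding other links on the page to crawl? If it is not, your method will only run once.

Please post your entire spider class so I can help you more.

Your method looks fine to me. However, Scrapy is a framework; other settings in your spider class may be causing this problem.

We need more information: what URL are you crawling? Can you post the crawl log? You can capture it with

scrapy crawl spider --logfile output.log

or

scrapy crawl spider 2>&1 | tee output.log

(the latter sends the output to both the screen and the file).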
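
To check the second point outside of Scrapy, a small standard-library sketch like the one below may help. It reuses the regular expression from the question and resolves a list of href values the same way response.urljoin would, keeping only http(s) links. The helper names find_phone_numbers and followable_links and the example.com URLs are hypothetical and are not part of Scrapy or the original spider. Filtering out links such as mailto: or javascript: is an assumption about what could go wrong, not a diagnosis confirmed by the post.

```python
import re
from urllib.parse import urljoin, urlparse

# Same pattern as in the question's parse() method.
PHONE_PATTERN = re.compile(r'\d{3}.\d{3}.\d{4}')


def find_phone_numbers(text):
    """Return every substring of `text` that matches PHONE_PATTERN."""
    return PHONE_PATTERN.findall(text)


def followable_links(base_url, hrefs):
    """Resolve hrefs against base_url and keep only http(s) URLs.

    Hypothetical helper for debugging; not a Scrapy API.
    """
    links = []
    for href in hrefs:
        absolute = urljoin(base_url, href.strip())
        # Skip schemes that a crawler cannot download (mailto:, javascript:, ...).
        if urlparse(absolute).scheme in ("http", "https"):
            links.append(absolute)
    return links


if __name__ == "__main__":
    # Made-up href values standing in for response.xpath('//a/@href').extract().
    sample_hrefs = ["/contact", "about.html", "mailto:info@example.com"]
    for link in followable_links("https://example.com/dir/index.html", sample_hrefs):
        print("Would follow:", link)
    print(find_phone_numbers("Call 555-123-4567 or 555.987.6543"))
```

If the list of links printed for a real page is empty, parse() has nothing to yield after the first response, which would match the "runs only once" symptom. Inside the spider, the same filter could be applied before each scrapy.Request(next_page, callback=self.parse) is yielded.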