How to crawl multi-level pages using Scrapy? (two levels)


On the first page it scrapes the text "test1" from the title tag just fine, but on the second page, "test2.html", it gets nothing. My script:

from scrapy.spider import Spider
from scrapy.selector import Selector
from testscrapy1.items import Website

class DmozSpider(Spider):
    name = "bill"
    allowed_domains = ["http://www.mywebsite.com"]
    start_urls = [
        "http://www.mywebsite.com/test.html"]

    def parse(self, response):
        for site in response.xpath('//head'):
            item = Website()
            item['title'] = site.xpath('//title/text()').extract()
            yield item

        yield scrapy.Request(url="www.mywebsite.com/test1.html", callback=self.other_function)

    def other_function(self, response):
        for other_thing in response.xpath('//head'):
            item = Website()
            item['title'] = other_thing.xpath('//title/text()').extract()
            yield item

Thanks in advance.

Try

yield scrapy.Request(url="www.mywebsite.com", callback=self.other_function)

instead of

yield scrapy.Request(url="www.mywebsite.com/test1.html", callback=self.other_function)

Possible duplicate of

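scrapy.Request(url="www.mywebsite.com/test1.html", callback=self.other_function)

looks odd. Shouldn't you set

url

to something like

urlparse.urljoin('www.mywebsite.com', site.url)

(Not working code, just an example.) Request tells Scrapy which sites to visit next. If you set it to a fixed string, it will scrape the same site over and over.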
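The URL-joining idea in that comment can be tried out with the standard library alone. The sketch below uses Python 3, where `urljoin` lives in `urllib.parse` (the `urlparse` module named in the comment is its Python 2 home). `PAGE_URL`, `next_page_url` and `has_scheme` are hypothetical names introduced only for this example. It shows that joining works as intended when the base is the full URL of the page being parsed, and that a base without a scheme does not produce an absolute URL.

```python
from urllib.parse import urljoin, urlparse

# Hypothetical URL of the page currently being parsed
# (in a Scrapy callback this would be response.url).
PAGE_URL = "http://www.mywebsite.com/test.html"


def next_page_url(page_url: str, link: str) -> str:
    """Resolve a link found on a page against that page's URL."""
    return urljoin(page_url, link)


def has_scheme(url: str) -> bool:
    """Return True if the URL carries an explicit scheme such as http."""
    return urlparse(url).scheme != ""


if __name__ == "__main__":
    # Relative link resolved against the full page URL.
    print(next_page_url(PAGE_URL, "test1.html"))
    # The string used in the question has no scheme.
    print(has_scheme("www.mywebsite.com/test1.html"))
    # A scheme-less base does not yield an absolute URL either.
    print(has_scheme(next_page_url("www.mywebsite.com", "test1.html")))
```

Inside a spider callback, `response.urljoin(link)` performs the same resolution against the URL of the current response.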