Scrapy: a filtered list of links from the parse results
Here is my current code:
    # scrape all the cafe links from example.com
    import re

    import scrapy


    class DengaSpider(scrapy.Spider):
        name = 'cafes'
        allowed_domains = ['example.com']
        start_urls = [
            'http://example.com/archives/8136.html',
        ]
        cafeOnlyLink = []

        def parse(self, response):
            cafelink = response.xpath('//li/a[contains(@href, "archives")]/@href').extract()
            twoHourRegex = re.compile(r'^http://example\.com/archives/\d+\.html$')
            cafeOnlyLink = [s for s in cafelink if twoHourRegex.match(s)]
So, how should I go on to parse the content of each URL contained in the [cafeOnlyLink] list? I want to save all the results from every page in a single CSV file.

You can use the following:
            for url in cafeOnlyLink:
                yield scrapy.Request(url=url, callback=self.parse_save_to_csv)

        def parse_save_to_csv(self, response):
            # The content is in response.body; select the information
            # you want to send to the CSV file here.
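As a rough sketch of what the body of `parse_save_to_csv` could do (the field names `url` and `title`, the selector, and the helper function below are all hypothetical, not part of the original answer), you can append one row per page to a CSV file with Python's standard `csv` module:

    import csv
    import os

    def append_rows_to_csv(path, rows, fieldnames):
        """Append dict rows to a CSV file, writing the header only once."""
        write_header = not os.path.exists(path)
        with open(path, "a", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            if write_header:
                writer.writeheader()
            writer.writerows(rows)

    # Inside parse_save_to_csv you would build rows from the response, e.g.:
    #     rows = [{"url": response.url, "title": response.css("h1::text").get()}]
    #     append_rows_to_csv("cafes.csv", rows, ["url", "title"])

Note that the more idiomatic Scrapy approach is to `yield` item dicts from the callback and let the built-in feed exporter write the file, e.g. `scrapy crawl cafes -o cafes.csv`.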
Thank you very much. I was able to create a JSON file containing all the information.