Scrapy: a filtered list of links from the parse results
Here is my current code:
    # scrape all the cafe links from example.com
    import re

    import scrapy


    class DengaSpider(scrapy.Spider):
        name = 'cafes'
        allowed_domains = ['example.com']
        start_urls = [
            'http://example.com/archives/8136.html',
        ]
        cafeOnlyLink = []

        def parse(self, response):
            cafelink = response.xpath('//li/a[contains(@href, "archives")]/@href').extract()
            twoHourRegex = re.compile(r'^http://example\.com/archives/\d+\.html$')
            cafeOnlyLink = [s for s in cafelink if twoHourRegex.match(s)]
So, how should I go on to parse the content of each URL contained in the [cafeOnlyLink] list? I want to save all the results from every page in a single CSV file.

You can use the following:
            for url in cafeOnlyLink:
                yield scrapy.Request(url=url, callback=self.parse_save_to_csv)

        def parse_save_to_csv(self, response):
            # The content is in response.body; select the information
            # you want to send to the CSV file here.
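As a rough sketch of what the body of `parse_save_to_csv` could do (the field names `url` and `title`, the selector, and the helper function below are all hypothetical, not part of the original answer), you can append one row per page to a CSV file with Python's standard `csv` module:

    import csv
    import os

    def append_rows_to_csv(path, rows, fieldnames):
        """Append dict rows to a CSV file, writing the header only once."""
        write_header = not os.path.exists(path)
        with open(path, "a", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            if write_header:
                writer.writeheader()
            writer.writerows(rows)

    # Inside parse_save_to_csv you would build rows from the response, e.g.:
    #     rows = [{"url": response.url, "title": response.css("h1::text").get()}]
    #     append_rows_to_csv("cafes.csv", rows, ["url", "title"])

Note that the more idiomatic Scrapy approach is to `yield` item dicts from the callback and let the built-in feed exporter write the file, e.g. `scrapy crawl cafes -o cafes.csv`.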
Thank you very much. I was able to create a JSON file containing all the information.