Why is the index variable in parse() always 10013?

scrapy, python-3.5

Can anyone tell me why the index variable inside parse() always ends up being 10013?
import scrapy
# DynamicdesktopItem is defined in this project's items module

class GetsourcesSpider(scrapy.Spider):
    name = 'getSources'
    allowed_domains = ['bizhi.feihuo.com']
    base_url = 'http://bizhi.feihuo.com/wallpaper/share?rsid={index}/'

    def start_requests(self):
        for index in range(10010, 10014):  # 11886
            yield scrapy.Request(url=self.base_url.format(index=index),
                                 callback=lambda response: self.parse(response, index))

    def parse(self, response, index):
        video_label = response.xpath('//video')[0]
        item = DynamicdesktopItem()
        item['index'] = index  # response.url[-6:-1]
        item['video'] = video_label.attrib['src']
        item['image'] = video_label.attrib['poster']
        yield item
This happens because the lambda captures a reference to the index variable, not its value, which is why every callback sees the last value. Pass the value along in the request's meta dict instead. See the updated code below:
class GetsourcesSpider(scrapy.Spider):
    name = 'getSources'
    allowed_domains = ['bizhi.feihuo.com']
    base_url = 'http://bizhi.feihuo.com/wallpaper/share?rsid={index}/'

    def start_requests(self):
        for index in range(10010, 10014):  # 11886
            yield scrapy.Request(url=self.base_url.format(index=index),
                                 callback=self.parse,
                                 meta={'index': index})

    def parse(self, response):
        index = response.meta['index']
        video_label = response.xpath('//video')[0]
        item = DynamicdesktopItem()
        item['index'] = index  # response.url[-6:-1]
        item['video'] = video_label.attrib['src']
        item['image'] = video_label.attrib['poster']
        yield item
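Besides meta, a common fix (not shown in the original answer) is to bind the current loop value as a default argument of the lambda, since defaults are evaluated at definition time. A minimal sketch, with Scrapy's request/callback cycle simulated by plain function calls so it runs without Scrapy:

```python
# Plain-Python simulation: default-argument binding freezes the current
# loop value inside each lambda, instead of sharing one `index` variable.
def schedule(base_url):
    callbacks = []
    for index in range(10010, 10014):
        url = base_url.format(index=index)
        # index=index is evaluated *now*, so each lambda keeps its own copy
        callbacks.append(lambda response, index=index: (response, index))
    return callbacks

cbs = schedule('http://bizhi.feihuo.com/wallpaper/share?rsid={index}/')
seen = [cb('<fake response>')[1] for cb in cbs]
print(seen)  # [10010, 10011, 10012, 10013]
```

With the original closure-based lambda, this list would instead contain 10013 four times.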
Because the index variable referenced by all the lambdas is not copied into their local scope, it gets overwritten on the next loop iteration. Consider this snippet:
lambdas = []
for i in range(3):
    lambdas.append(lambda: print(i))

for fn in lambdas:
    fn()
This prints 2 three times, i.e. the last value of i.
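The same default-argument trick fixes the snippet above; this sketch returns the values instead of printing them so the effect is easy to check:

```python
lambdas = []
for i in range(3):
    lambdas.append(lambda i=i: i)  # i=i snapshots the current value of i

results = [fn() for fn in lambdas]
print(results)  # [0, 1, 2]
```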
Instead of a lambda callback, you should take advantage of the meta= keyword of the Request class.
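For completeness (not part of the original answers): Scrapy 1.7+ also offers a cb_kwargs dict on Request, whose entries are passed into the callback as keyword arguments. A minimal sketch that simulates that hand-off in plain Python, so it runs without Scrapy installed:

```python
# Plain-Python simulation of how a cb_kwargs-style dict travels with a
# request and is later expanded into the callback (no Scrapy required).
queued = []

def request(url, callback, cb_kwargs):
    queued.append((url, callback, cb_kwargs))

def parse(response, index):
    return {'index': index, 'url': response}

for index in range(10010, 10014):
    url = 'http://bizhi.feihuo.com/wallpaper/share?rsid={}/'.format(index)
    request(url, parse, {'index': index})

# The "downloader" later invokes each callback with its own kwargs:
items = [cb(url, **kw) for url, cb, kw in queued]
print([item['index'] for item in items])  # [10010, 10011, 10012, 10013]
```

Because each request carries its own dict, no value is shared between iterations, just as with meta.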