Python 3.x: Scrapy does not move to the second page when extracting data

The spider saves the data from one page only, never moves on to the second page, and shows no error. Here is the spider:
import scrapy
from ..items import QoutetutorialItem


class QouteSpider(scrapy.Spider):
    name = 'qoute'
    page_num = 2
    allowed_domains = ['http://quotes.toscrape.com']
    start_urls = ['http://quotes.toscrape.com/page/1/']

    def parse(self, response):
        all_div_quote = response.css("div.quote")
        items = QoutetutorialItem()
        for x in all_div_quote:
            title = x.css("span.text::text").extract()
            author = x.css(".author::text").extract()
            tag = x.css(".tag::text").extract()
            items['title'] = title
            items['author'] = author
            items['tag'] = tag
            yield items

        next_page = 'http://quotes.toscrape.com/page/' + str(QouteSpider.page_num) + '/'
        # if next_page is not None:
        if QouteSpider.page_num < 11:
            QouteSpider.page_num += 1
            yield response.follow(next_page, callback=self.parse)
Answer: Simply do this. First, grab the next page URL from the page source, since it is there, and then make a request to that page. This is how it looks:
# .get() returns the href as a string (or None on the last page),
# which is what response.follow expects
next_page = response.css('.next a::attr(href)').get()
if next_page is not None:
    yield response.follow(next_page, callback=self.parse)
This will solve your problem, and you no longer need to compute the next page URL by hand.

Comment from the asker: allowed_domains = ['http://quotes.toscrape.com'] was actually the problem. When I removed the http:// scheme from allowed_domains, the problem was solved. Thank you for your support, sir.
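To see why the scheme in allowed_domains breaks pagination, here is a minimal sketch of the hostname check that Scrapy's offsite filtering performs (a simplified stdlib-only re-implementation for illustration, not Scrapy's actual code): the request's hostname must equal an allowed domain or be a subdomain of it, and a string containing "http://" can never match a bare hostname, so every follow-up request is silently dropped.

```python
from urllib.parse import urlparse


def is_offsite(url, allowed_domains):
    # Simplified version of the check done by Scrapy's OffsiteMiddleware:
    # the hostname must equal an allowed domain or be a subdomain of it.
    host = urlparse(url).netloc.lower()
    return not any(host == d or host.endswith('.' + d) for d in allowed_domains)


url = 'http://quotes.toscrape.com/page/2/'

# With the scheme included, the hostname never matches -> request filtered
print(is_offsite(url, ['http://quotes.toscrape.com']))  # True

# With just the domain, the request is allowed through
print(is_offsite(url, ['quotes.toscrape.com']))  # False
```

This is why the spider showed no error: filtered requests are dropped quietly (Scrapy only logs them at DEBUG level), so the crawl simply stops after page one.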