Python 3.x: Scrapy does not move on to the second page when extracting data


It saves the data from one page, does not move on to the second page, and does not show any error.

    import scrapy
    from ..items import QoutetutorialItem


    class QouteSpider(scrapy.Spider):
        name = 'qoute'
        page_num = 2
        allowed_domains = ['http://quotes.toscrape.com']
        start_urls = ['http://quotes.toscrape.com/page/1/']

        def parse(self, response):
            all_div_quote = response.css("div.quote")
            items = QoutetutorialItem()
            for x in all_div_quote:
                title = x.css("span.text::text").extract()
                author = x.css(".author::text").extract()
                tag = x.css(".tag::text").extract()
                items['title'] = title
                items['author'] = author
                items['tag'] = tag
                yield items
            next_page = 'http://quotes.toscrape.com/page/' + str(QouteSpider.page_num) + '/'

            # if next_page is not None:
            if QouteSpider.page_num < 11:
                QouteSpider.page_num += 1
                yield response.follow(next_page, callback=self.parse)

Just do this: first, get the next page URL from the page source, since it is already there, and then send a request to that page. This is what it looks like:

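    # .get() returns the first matching href as a string, or None on the last page
    next_page = response.css('li.next a::attr(href)').get()

    if next_page is not None:
        yield response.follow(next_page, callback=self.parse)
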
This will solve your problem, and you no longer need to build the next page URL yourself.

allowed_domains: actually, the problem was right there. When I removed http from allowed_domains, the problem was solved. Thank you for your help, sir.
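
To illustrate why the http prefix matters: as far as I understand, Scrapy compares the host name of each followed request against the entries in allowed_domains, so an entry that contains a scheme can never match a host. The sketch below is a simplified stand-in written only with the standard library. The function is_allowed is hypothetical and is not Scrapy's actual code.

```python
import re
from urllib.parse import urlparse


def is_allowed(url, allowed_domains):
    """Simplified, hypothetical host check; not Scrapy's real implementation."""
    host = urlparse(url).hostname or ""
    # Accept the domain itself or any of its subdomains.
    pattern = r"^(.*\.)?(%s)$" % "|".join(re.escape(d) for d in allowed_domains)
    return re.match(pattern, host) is not None


next_url = "http://quotes.toscrape.com/page/2/"

# Entry with a scheme: the host "quotes.toscrape.com" never matches it.
print(is_allowed(next_url, ["http://quotes.toscrape.com"]))  # False

# Bare domain: the host matches, so the request would go through.
print(is_allowed(next_url, ["quotes.toscrape.com"]))  # True
```

That would also explain why the first page was still saved while page 2 was dropped without any error.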