Python pagination not working on basic webscraper
The first page works fine, but the spider won't read the following pages. I've tried many code variations from other tutorials, with no success. Scrapy doesn't report any error; it just says it has finished.

Tags: python, pagination, scrapy
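Answer: Your `allowed_domains` does look like it could be why the pagination isn't working. `allowed_domains` expects bare domain names such as `www.goodreads.com`, but `allowed_domains = ['www.goodreads.com/list/show/1.Best_Books_Ever?page=1']` contains a path and a query string. That entry should restrict your scraper to the first page only, so go ahead and remove this line (or replace the entry with the bare domain) and try your spider again.

Comment: What does the log say? Your `allowed_domains` is wrong to begin with…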
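The answer's diagnosis can be illustrated without Scrapy. Offsite filtering compares the host name of each followed request against `allowed_domains`, so an entry that includes a path and query string can never match `www.goodreads.com`. This fits the symptom: the start URL is fetched, while the followed pages are dropped as offsite requests, which Scrapy logs at DEBUG level rather than reporting as errors. Below is a simplified sketch of that host check; `host_allowed` is a hypothetical stand-in, not Scrapy's actual middleware code.

```python
from urllib.parse import urlparse


def host_allowed(url, allowed_domains):
    """Simplified stand-in for Scrapy's offsite check (hypothetical helper).

    A URL passes if its host name equals one of the allowed domains or is a
    subdomain of one; an empty list means no restriction.
    """
    if not allowed_domains:
        return True
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


page_2 = "https://www.goodreads.com/list/show/1.Best_Books_Ever?page=2"

# The entry from the question contains a path and query, so it never equals a host name.
print(host_allowed(page_2, ["www.goodreads.com/list/show/1.Best_Books_Ever?page=1"]))  # False
# Bare domains match the host of every list page.
print(host_allowed(page_2, ["www.goodreads.com"]))  # True
print(host_allowed(page_2, ["goodreads.com"]))  # True
```

Either removing the line (Scrapy applies no offsite filtering when `allowed_domains` is missing or empty) or listing only `www.goodreads.com` lets the followed pages through.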
The spider from the question:

```python
import scrapy


class BestBooksSpider(scrapy.Spider):
    name = 'best_books'
    page_num = 2
    allowed_domains = [
        'www.goodreads.com/list/show/1.Best_Books_Ever?page=1']
    start_urls = [
        'https://www.goodreads.com/list/show/1.Best_Books_Ever?page=1']

    def parse(self, response):
        page_num = 2
        for books in response.xpath('//tr'):
            yield {
                'Title': books.css('a.bookTitle span::text').get(),
                'Author': books.css('a.authorName *::text').get(),
                'Rating': books.css('span.minirating::text').get(),
            }
        # this part is not working, won't read past page 1
        next_page = 'https://www.goodreads.com/list/show/1.Best_Books_Ever?page=' + \
            str(BestBooksSpider.page_num)
        if BestBooksSpider.page_num < 3:
            BestBooksSpider.page_num += 1
            yield response.follow(next_page, callback=self.parse)
```
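Once the offsite filtering described in the answer is fixed, the pagination in the question's code still depends on a class attribute that is mutated from inside `parse`. A sketch of an alternative, assuming the list keeps using a `?page=N` query parameter: compute the next page from the URL of the response being parsed. `next_page_url` and its `last_page` parameter are hypothetical names, not part of Scrapy.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def next_page_url(url: str, last_page: int) -> Optional[str]:
    """Return the URL of the page after `url`, or None once `last_page` is reached.

    Hypothetical helper: assumes the page number is carried in a ``page`` query
    parameter, as in the Goodreads list URLs; a missing parameter means page 1.
    """
    parts = urlparse(url)
    query = parse_qs(parts.query)
    current = int(query.get("page", ["1"])[0])
    if current >= last_page:
        return None
    # Bump only the page number and keep the rest of the URL unchanged.
    query["page"] = [str(current + 1)]
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))


base = "https://www.goodreads.com/list/show/1.Best_Books_Ever"
print(next_page_url(base + "?page=1", last_page=3))
# https://www.goodreads.com/list/show/1.Best_Books_Ever?page=2
print(next_page_url(base + "?page=3", last_page=3))  # None
```

Inside the spider (with `allowed_domains = ['www.goodreads.com']`), the end of `parse` could then read `next_url = next_page_url(response.url, last_page=3)` followed by `if next_url: yield response.follow(next_url, callback=self.parse)`, with `last_page` set to the last page you want. Each request carries its own page number in its URL, so there is no shared counter to get out of step.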