Python 3.x: Extracting all pagination links with Scrapy from https://www.blablacar.in/ride-sharing/new-delhi/chandigarh/
Tags: python-3.x, web-scraping, scrapy, web-crawler
Can someone help me extract all the pagination links with Scrapy from the page at https://www.blablacar.in/ride-sharing/new-delhi/chandigarh/ ?
I tried it in Python as shown below, but I don't understand the details.
My code is as follows:
    allowed_domains = ['blablacar.in']
    start_urls = ['https://www.blablacar.in/ride-sharing/new-delhi/chandigarh/']

    def parse(self, response):
        products = response.css('.trip-search-results li')
        for p in products:
            brand = p.css('.ProfileCard-info--name::text').extract_first().strip()
            price = p.css('.description .time::attr(content)').extract_first()
            item = ProductItem()
            item['brand'] = brand
            item['price'] = price
            yield item
        nextPageLinkSelector = response.css('.js-trip-search-pagination::attr(href)').extract_first()
        if nextPageLinkSelector:
            nextPageLink = nextPageLinkSelector
            yield scrapy.Request(url=response.urljoin(nextPageLink))
You just need to find the link to the next page and follow it:
    def parse(self, response):
        products = response.css('.trip-search-results li')
        for p in products:
            brand = p.css('.ProfileCard-info--name::text').extract_first().strip()
            price = p.css('.description .time::attr(content)').extract_first()
            item = ProductItem()
            item['brand'] = brand
            item['price'] = price
            yield item
        # Here is the pagination following: take the "next" link unless it is
        # disabled (as on the last page) and parse that page with this same method.
        for a_tag in response.css('.pagination .next:not(.disabled) a'):
            yield response.follow(a_tag, self.parse)
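To make the selection rule in that selector concrete, here is a minimal standard-library sketch of the same idea: collect the href of any link nested inside an element that has the class next but not the class disabled. It is a stand-in for illustration, not Scrapy's selector engine; the sample markup, the NextLinkFinder class, and find_next_href are hypothetical, and the real page's HTML may differ. For brevity it ignores the .pagination ancestor and assumes well-formed markup with no void tags inside the "next" element.

```python
from html.parser import HTMLParser

# Hypothetical pagination markup, invented for this sketch.
SAMPLE_HTML = """
<ul class="pagination">
  <li class="prev disabled"><a href="?page=1">Prev</a></li>
  <li class="active"><a href="?page=1">1</a></li>
  <li><a href="?page=2">2</a></li>
  <li class="next"><a href="?page=2">Next</a></li>
</ul>
"""


class NextLinkFinder(HTMLParser):
    """Collect hrefs of <a> tags nested in a 'next' element that is not 'disabled'."""

    def __init__(self):
        super().__init__()
        self.depth_in_next = 0  # > 0 while inside a qualifying "next" element
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self.depth_in_next:
            # Already inside a qualifying element: track nesting, record links.
            self.depth_in_next += 1
            if tag == "a" and attrs.get("href"):
                self.hrefs.append(attrs["href"])
        elif "next" in classes and "disabled" not in classes:
            self.depth_in_next = 1

    def handle_endtag(self, tag):
        if self.depth_in_next:
            self.depth_in_next -= 1


def find_next_href(html):
    """Return the first qualifying next-page href, or None if there is none."""
    finder = NextLinkFinder()
    finder.feed(html)
    return finder.hrefs[0] if finder.hrefs else None


print(find_next_href(SAMPLE_HTML))
```

When the "next" element also carries the disabled class, nothing is collected, so the spider would stop issuing requests at that point.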
Try the following to get the next-page link:
    nextPageLink = response.xpath("//*[@class='pagination']//*[@class='next' and not(contains(@class,'disabled'))]/a/@href").extract_first()
    if nextPageLink:
        yield response.follow(nextPageLink, callback=self.parse)
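response.follow accepts a relative URL, whereas scrapy.Request needs an absolute one, which is why the code in the question wraps the link in response.urljoin. As a rough illustration of that resolution step, the sketch below uses urllib.parse.urljoin from the standard library. The href values and the absolute_next_url helper are hypothetical; the actual links on the page may look different.

```python
from urllib.parse import urljoin

PAGE_URL = "https://www.blablacar.in/ride-sharing/new-delhi/chandigarh/"


def absolute_next_url(page_url, href):
    """Resolve a possibly relative href against the URL of the page it came from."""
    return urljoin(page_url, href)


# Hypothetical href values, as they might appear in a pagination link.
print(absolute_next_url(PAGE_URL, "?page=2"))
print(absolute_next_url(PAGE_URL, "/ride-sharing/new-delhi/chandigarh/?page=3"))
```

A query-only href such as "?page=2" keeps the path of the current page and replaces only the query string, so either form ends up pointing at the same listing.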
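When I run the code, I get this error: trip in response.css('.trip-search-results li'): ^ TabError: inconsistent use of tabs and spaces in indentation
@rijin.p Make sure the indentation is consistent: it should consist of only spaces or only tabs. You seem to have mixed them.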
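One way to track down the offending lines is to scan the spider file for tabs in the leading whitespace and then convert those lines to spaces. The helper below is a minimal sketch of that check; find_tab_indented_lines is a hypothetical name, and the sample source is invented for illustration.

```python
import re


def find_tab_indented_lines(source):
    """Return 1-based numbers of lines whose leading whitespace contains a tab."""
    flagged = []
    for number, line in enumerate(source.splitlines(), start=1):
        indent = re.match(r"[ \t]*", line).group()
        if "\t" in indent:
            flagged.append(number)
    return flagged


# Invented sample: line 2 is indented with spaces, line 3 with a tab.
sample = "def f():\n    x = 1\n\ty = 2\n"
print(find_tab_indented_lines(sample))
```

To check a real file, read it with open(path).read() and pass the text to the function; most editors can also display whitespace characters or convert tabs to spaces directly.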