Cannot retrieve the next-page link with Scrapy when the pagination uses JavaScript

I am having a problem getting the next-page link in Python. Here is my code:
import scrapy
from scrapy.http import Request
from gharbheti.items import GharbhetiItem
from scrapy.contrib.loader import ItemLoader
from scrapy.contrib.loader.processor import TakeFirst, Identity, MapCompose, Join, Compose
from urllib.parse import urljoin

class ListSpider(scrapy.Spider):
    name = 'list'
    allowed_domains = ['gharbheti.com']
    start_urls = ['https://www.gharbheti.com/sale', 'https://www.gharbheti.com/rent']

    def parse(self, response):
        properties = response.xpath('//li[@class="col-md-6 Search_building"]/descendant::a')
        for property in properties:
            link = property.xpath('./@href').extract_first()
            urls = response.urljoin(link)
            yield Request(urls, callback=self.parse_property, meta={'URL': urls})

    def parse_property(self, response):
        l = ItemLoader(item=GharbhetiItem(), response=response)
        URL = response.meta.get('URL')
        l.add_value('URL', response.url)
        l.add_xpath('Title', '//div[@class="product-page-meta"]/h4/em/text()', MapCompose(str.strip, str.title))
        l.add_xpath('Offering', '//figcaption[contains(text(), "For Sale")]/text()|//figcaption[contains(text(),"For Rent")]/text()', MapCompose(lambda i: i.replace('For', ''), str.strip))
        l.add_xpath('Price', '//div[@class="deal-pricebox"]/descendant::h3/text()', MapCompose(str.strip))
        l.add_xpath('Type', '//ul[@class="suitable-for"]/li/text()', MapCompose(str.strip))
        bike_parking = response.xpath('//i[@class="fa fa-motorcycle"]/following-sibling::em/text()').extract_first()
        car_parking = response.xpath('//i[@class="fa fa-car"]/following-sibling::em/text()').extract_first()
        parking = "Bike Parking: {} Car Parking: {}".format(bike_parking, car_parking)
        l.add_value('Parking', parking)
        l.add_xpath('Description', '//div[@class="comment more"]/text()', MapCompose(str.strip))
        l.add_xpath('Bedroom', '//i[@class="fa fa-bed"]/following-sibling::text()', MapCompose(lambda i: i.replace('Total Bed Room:', ''), str.strip, int))
        l.add_xpath('Livingroom', '//i[@class="fa fa-inbox"]/following-sibling::text()', MapCompose(lambda i: i.replace('Total Living Room:', ''), str.strip, int))
        l.add_xpath('Kitchen', '//i[@class="fa fa-cutlery"]/following-sibling::text()', MapCompose(lambda i: i.replace('Total kitchen Room:', ''), str.strip, int))
        l.add_xpath('Bathroom', '//i[@class="fa fa-puzzle-piece"]/following-sibling::text()', MapCompose(lambda i: i.replace('Total Toilet/Bathroom:', ''), str.strip, int))
        l.add_xpath('Address', '//b[contains(text(), "Map")]/text()', MapCompose(lambda i: i.replace('Map Loaction :-', ''), str.strip))
        l.add_xpath('Features', '//div[@class="list main-list"]/ul/li/text()', MapCompose(str.strip))
        images = response.xpath('//div[@class="carousel-inner dtl-carousel-inner text-center"]/descendant::img').extract()
        images = [s.replace('<img src="', '') for s in images]
        images = [i.split('?')[0] for i in images]
        Image = ["http://www.gharbheti.com" + im for im in images]
        l.add_value('Images', Image)
        return l.load_item()
Since the pagination is done with JavaScript, there is no next-page link in the page source. To see what is going on, watch the network traffic while clicking "Load More": each click sends a POST request to

https://www.gharbheti.com/RoomRentHome/GetPropertiesForRent

and the form data has two values:

RentTypeId: 0 {not sure what this is, but if you need to know, I am sure you can find out}
Page: 1 {increments every time you click "Load More"}

So you can paginate with something like:

for i in range(1, 101):
    <send a form request with i as the page value>
I assume the data returned by that POST is in a different format than the site's main pages, so you will probably need to define another callback to parse it.

Comments:

Welcome to StackOverflow. Your question does not quite meet the standards StackOverflow expects, and as currently written it may not be well received; I strongly recommend editing it to follow the posting guidelines. Thanks.

So I understand there should be another parse function that applies XPath to the form request's response. Any best practices I should follow would be great.

@shovanrai Yes, once you successfully get a response from the form request, you will know what kind of parsing you have to do. For those steps I recommend using scrapy shell.