Scrapy POST request for an Ajax "load more" button


ajax post pagination scrapy


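I'm trying to scrape product names and prices.

There is a "load more" button at the bottom of the page. Using Postman, I tried modifying the form data; 'productBeginIndex' and 'resultsPerPage' seem to change the number of products displayed.

However, I'm not sure what is wrong with my code: no matter how I adjust the values, it still returns 24 products. I also tried FormRequest.from_response(), but it still returns only 24 products.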

import scrapy


class PriceSpider(scrapy.Spider):
    name = "products"
    def parse(self, response):
        return [scrapy.FormRequest(url="https://www.fairprice.com.sg/baby-child",
                                   method='POST',
                                   formdata= {'productBeginIndex': '1', 'resultsPerPage': '1', },
                                   callback=self.logged_in)]

    def logged_in(self, response):
        # here you would extract links to follow and return Requests for
        # each of them, with another callback
        name = response.css("img::attr(title)").extract()
        price = response.css(".pdt_C_price::text").extract()

        for item in zip(name, price):
            scraped_info = {
                "title": item[0],
                "value": item[1],
            }
            yield scraped_info

Can anyone tell me what I'm missing? And how can I implement a loop to extract all the items in the category?

Thank you very much.

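You should POST to /ProductListingView (a GET request also works) instead of /baby-child.

To scrape all the items, change the beginIndex parameter in a loop and yield a new request for each value. (By the way, changing productBeginIndex will not work.)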

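We don't know the total number of products, so the safe approach is to crawl one batch of products at a time. By changing custom_settings, you can easily control where to start and how many products to scrape.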
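When the total is unknown, another option is to keep requesting batches until one comes back empty. The sketch below shows only the paging logic in plain Python. `paginate_until_empty`, `fetch_page`, `fake_fetch` and `FAKE_CATALOGUE` are hypothetical names made up for illustration, and it assumes the endpoint returns an empty list once beginIndex passes the last product, which has not been verified against the real site.

```python
from typing import Callable, Dict, Iterator, List


def paginate_until_empty(
    fetch_page: Callable[[Dict[str, str]], List[str]],
    results_per_page: int = 24,
    max_pages: int = 1000,
) -> Iterator[str]:
    """Yield products batch by batch until a batch comes back empty."""
    # Hard cap on the number of batches so a misbehaving server cannot loop forever.
    for page in range(max_pages):
        formdata = {
            "beginIndex": str(page * results_per_page),
            "resultsPerPage": str(results_per_page),
        }
        products = fetch_page(formdata)
        if not products:
            break  # assumption: an empty batch means we are past the last product
        yield from products


# Stand-in for the real endpoint: 50 fake products served in slices.
FAKE_CATALOGUE = [f"product-{n}" for n in range(50)]


def fake_fetch(formdata: Dict[str, str]) -> List[str]:
    start = int(formdata["beginIndex"])
    size = int(formdata["resultsPerPage"])
    return FAKE_CATALOGUE[start:start + size]


all_products = list(paginate_until_empty(fake_fetch, results_per_page=24))
```

In a spider, the same idea would probably mean yielding the next FormRequest from the callback only when the current response contained products, instead of generating every request up front.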

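For how to output to a CSV file, see https://stackoverflow.com/questions/29943075/scrapy-pipeline-to-export-csv-file-in-the-right-format

For convenience, I added the PriceItem class below; you can move it to items.py. With the command

scrapy crawl PriceSpider -t csv -o test.csv

you will get a test.csv file. Or you can try ...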
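To check what the exported file should roughly look like without running a crawl, here is a minimal sketch that writes the same two fields with Python's standard-library csv module. It is not Scrapy's own exporter. `items_to_csv` and `sample_items` are hypothetical names, and the sample row reuses a title and price from the log output in this answer.

```python
import csv
import io


def items_to_csv(items, fieldnames=("title", "value")):
    """Serialize dict-like items to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(fieldnames), lineterminator="\n")
    writer.writeheader()
    for item in items:
        writer.writerow(dict(item))
    return buffer.getvalue()


# Hypothetical sample item, shaped like the PriceItem fields.
sample_items = [
    {"title": "Friso Gold Growing Up Milk Formula - Stage 3", "value": "$79.00"},
]

csv_text = items_to_csv(sample_items)
print(csv_text)
```

If a real export contains only the header row or nothing at all, the callback most likely yielded no items, so the selectors and the request URL are the first things to check.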

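Thanks a lot, @Vic! I'm just wondering:
1. So I have to use start_requests first, rather than FormRequest?
2. For formdata, do I have to fill in every field as you did, or is it enough to fill in only beginIndex and resultsPerPage?
3. How did you find out that it is beginIndex rather than productBeginIndex? I looked at the network response (form data) and it is listed as productBeginIndex...
4. How do I output the data as a CSV file? I added a yield on the last line of my code, and the CSV file came out empty.
Again, thank you very much; I really appreciate your help.

@奇夫
1. Depending on your needs, you should either implement start_requests or define the start_urls attribute. I prefer the former because it makes it easy to control which pages get scraped.
2. Filling in only beginIndex and resultsPerPage may work. (I just tried it; sometimes it works and sometimes it doesn't.)
3. I noticed that beginIndex has the same value as productBeginIndex, so I tried changing beginIndex, and it worked.
4. I have updated my answer.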

# OUTPUTS
# 2018-08-15 16:00:08 [PriceSpider] INFO: ['Nestle Nan Optipro Gro Growing Up Milk Formula -Stage 3', 'Friso Gold Growing Up Milk Formula - Stage 3']
# 2018-08-15 16:00:08 [PriceSpider] INFO: ['\n\t\t\t\t\t$199.50\n\t\t\t\t', '\n\t\t\t\t\t$79.00\n\t\t\t\t']
# 2018-08-15 16:00:08 [PriceSpider] INFO: ['Aptamil Gold+ Toddler Growing Up Milk Formula - Stage 3', 'Aptamil Gold+ Junior Growing Up Milk Formula - Stage 4']
# 2018-08-15 16:00:08 [PriceSpider] INFO: ['\n\t\t\t\t\t$207.00\n\t\t\t\t', '\n\t\t\t\t\t$180.00\n\t\t\t\t']
#
# The \n and \t characters are not a big deal; just strip() them

import scrapy

class PriceItem(scrapy.Item):
  title = scrapy.Field()
  value = scrapy.Field()

class PriceSpider(scrapy.Spider):
  name = "PriceSpider"

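  # Paging knobs read directly from this dict in start_requests; they are
  # spider-specific keys, not built-in Scrapy settings.
  custom_settings = {
    "BEGIN_PAGE" : 0,
    "END_PAGE" : 2,          # exclusive: pages 0 and 1 are requested
    "RESULTS_PER_PAGE" : 2,
  }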

  def start_requests(self): 

    formdata = {
      "sType" : "SimpleSearch",
      "ddkey" : "ProductListingView_6_-2011_3074457345618269512",
      "ajaxStoreImageDir" : "%2Fwcsstore%2FFairpriceStorefrontAssetStore%2F",
      "categoryId" : "3074457345616686371",
      "emsName" : "Widget_CatalogEntryList_701_3074457345618269512",
      "beginIndex" : "0",
      "resultsPerPage" : str(self.custom_settings["RESULTS_PER_PAGE"]),
      "disableProductCompare" : "false",
      "catalogId" : "10201",
      "langId" : "-1",
      "enableSKUListView" : "false",
      "storeId" : "10151",
    }

    # loop to scrape different pages
    for i in range(self.custom_settings["BEGIN_PAGE"], self.custom_settings["END_PAGE"]):
      formdata["beginIndex"] = str(self.custom_settings["RESULTS_PER_PAGE"] * i)

      yield scrapy.FormRequest(
        url="https://www.fairprice.com.sg/ProductListingView",
        formdata = formdata,
        callback=self.logged_in
      )

  def logged_in(self, response):
    name = response.css("img::attr(title)").extract()
    price = response.css(".pdt_C_price::text").extract()

    self.logger.info(name)
    self.logger.info(price)

    # Output to CSV: refer to https://stackoverflow.com/questions/29943075/scrapy-pipeline-to-export-csv-file-in-the-right-format
    for item in zip(name, price):
      yield PriceItem(
        title=item[0].strip(),
        value=item[1].strip(),
      )