Python: How to scrape data from an infinite-scroll page with Scrapy?

python ajax web-scraping scrapy

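When scrolling down, the response URL is:

The response data from the AJAX call has the following format:

{"page_var":"<div id=\"page_variables................

How can I scrape the data that is loaded after scrolling down the page? Also, the data is in AJAX format, not JSON format. Thanks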
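The truncated sample above looks like a JSON object whose "page_var" value is an HTML fragment. Assuming that is the case, the fragment can be decoded with json.loads and then parsed like any other HTML. The sketch below uses only the standard library; the helper names (ClassTextCollector, extract_texts) are hypothetical, and the class name "lcname" is taken from the spider in this thread. Inside Scrapy, the decoded fragment could instead be passed to scrapy.Selector(text=fragment) and queried with the usual XPath expressions.

```python
import json
from html.parser import HTMLParser


class ClassTextCollector(HTMLParser):
    """Collect the text that directly follows a start tag with a given class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.capturing = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        # Capture only when the class attribute matches exactly.
        self.capturing = dict(attrs).get("class") == self.target_class

    def handle_endtag(self, tag):
        self.capturing = False

    def handle_data(self, data):
        if self.capturing and data.strip():
            self.texts.append(data.strip())


def extract_texts(body, key="page_var", css_class="lcname"):
    """Decode the JSON body, then pull matching text out of the HTML fragment.

    Assumption: the response is a JSON object and `key` holds an HTML string.
    """
    fragment = json.loads(body)[key]
    collector = ClassTextCollector(css_class)
    collector.feed(fragment)
    collector.close()
    return collector.texts
```

In a Scrapy callback, response.text would be passed as the body argument.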

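You can approach this in two ways:

1. Use a headless browser such as Selenium, or, if you are working in Scrapy, try Splash, which lets you run JavaScript through Scrapy.
2. Simply scroll the page down to the point up to which you want to scrape, save that page as HTML, and then run your normal code on it.

The second approach involves a little manual work, but if you only want to scrape a few pages, I would recommend it.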
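For the browser-based approach, a common pattern is to keep scrolling until the page height stops growing. The following is a minimal sketch, not a definitive implementation: scroll_to_bottom is a hypothetical helper, and the pause and max_rounds values are assumptions that would need tuning for a real site. It relies only on the execute_script method that Selenium WebDriver objects provide.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=50):
    """Scroll until document height stops changing; return the scroll count.

    `driver` is any object with an execute_script(script) method,
    for example a Selenium WebDriver instance.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        # Jump to the bottom so the page fires its "load more" request.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the AJAX content time to arrive
        rounds += 1
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was appended, so assume the end was reached
        last_height = new_height
    return rounds
```

After the loop finishes, driver.page_source holds the fully loaded HTML, which can then be parsed with the same XPath expressions used in the spider.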

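Find out what action is triggered when you scroll (probably an XHR request) and simulate it…?

Use a headless browser (such as Selenium and)

Sorry, I'm a beginner. Do you know of any easy-to-follow documentation or blog posts?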
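Simulating the XHR request usually means requesting the same endpoint repeatedly with an increasing page counter. The thread does not show the actual URL, so the sketch below assumes a hypothetical query parameter named "page"; the real parameter name has to be read from the browser's network tab.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def next_page_url(url, param="page"):
    """Return `url` with its page counter incremented by one.

    Assumption: pagination is driven by a numeric query parameter.
    A URL without the parameter is treated as page 1.
    """
    parts = urlparse(url)
    query = parse_qs(parts.query)
    current = int(query.get(param, ["1"])[0])
    query[param] = [str(current + 1)]
    # doseq=True writes each list value back as a plain key=value pair.
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))
```

In a Scrapy callback this could be used as yield scrapy.Request(next_page_url(response.url), callback=self.parse), stopping once a response comes back with no items.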

import scrapy


class DummymartSpider(scrapy.Spider):
    name = 'dummymart'
    allowed_domains = ['dir.dummymart.com']
    start_urls = ['https://dir.dummymart.com/impcat/industrial-machinery.html']

    def parse(self, response):
        # Each XPath returns one list per field for the whole page.
        companies = response.xpath('//*[@class="lcname"]/text()').getall()
        products = response.xpath('//*[@class="pnm ldf cur"]/text()').getall()
        addresses = response.xpath('//*[@class="clg"]/text()').getall()
        phones = response.xpath('//*[@class="ls_co phn bo"]/text()').getall()

        # zip() pairs the lists by position and stops at the shortest one,
        # so a listing with a missing field shifts the rows that follow it.
        for company, product, address, phone in zip(companies, products, addresses, phones):
            yield {
                'Company': company,
                'Product': product,
                'Address': address,
                'phone': phone,
            }