
Python Scrapy Splash can't get data from a React site


I need to scrape a website. It appears to be built with React, so I tried extracting the data with scrapy-splash. For example, I need the "a" elements with the class shelf-product-name, but the response is an empty array. I set the wait argument to about 5 seconds, yet I still only get an empty array:

def start_requests(self):
        yield SplashRequest(
            url='https://www.jumbo.cl/lacteos-y-bebidas-vegetales/leches-blancas?page=6',
            callback=self.parse,
            args={'wait':5}
        )

def parse(self, response):
        print(response.css("a.shelf-product-name"))

Actually, there is no need to use Scrapy Splash, because all the required data is already present as JSON inside a script tag of the raw HTML response:

import scrapy
from scrapy.crawler import CrawlerProcess
import json

class JumboCLSpider(scrapy.Spider):
    name = "JumboCl"
    start_urls = ["https://www.jumbo.cl/lacteos-y-bebidas-vegetales/leches-blancas?page=6"]

    def parse(self, response):
        # keep only the script block that embeds the render data
        scripts = [s for s in response.css("script::text") if "window.__renderData" in s.extract()]
        if not scripts:
            return
        # everything after the assignment is the JSON payload; drop the
        # trailing statement terminator before parsing
        data = scripts[0].extract().split("window.__renderData = ")[-1]
        json_data = json.loads(data[:-1])
        for plp in json_data["plp"]["plp_products"]:
            for product in plp["data"]:
                #yield {"productName":product["productName"]} # data from css:  a.shelf-product-name
                yield product

if __name__ == "__main__":
    c = CrawlerProcess({'USER_AGENT':'Mozilla/5.0'})
    c.crawl(JumboCLSpider)
    c.start()
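To see why the answer strips the last character before parsing, here is a minimal, self-contained sketch of the same split-and-parse step, run against a hypothetical script body (the sample string and product name below are invented for illustration; the real payload on the page is much larger but has the same shape):

```python
import json

# Invented sample mimicking how the page embeds its data in a script tag.
script_body = 'window.__renderData = {"plp": {"plp_products": [{"data": [{"productName": "Leche entera 1L"}]}]}};'

# Everything after the assignment is JSON, plus one trailing ";" to strip.
data = script_body.split("window.__renderData = ")[-1]
json_data = json.loads(data[:-1])

names = []
for plp in json_data["plp"]["plp_products"]:
    for product in plp["data"]:
        names.append(product["productName"])

print(names)  # ['Leche entera 1L']
```

The same nested loop then works unchanged against the real response once the JSON is loaded.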
