Python ScrapyJS: how to properly wait for a page to load?


I am using ScrapyJS and Splash to simulate a click on a form's submit button:

# Methods of a scrapy.Spider subclass (requires "import scrapy").
def start_requests(self):
    script = """
    function main(splash)
        assert(splash:autoload("https://ajax.googleapis.com/ajax/libs/jquery/2.1.3/jquery.min.js"))
        assert(splash:go(splash.args.url))

        local js = [[
            var $j = jQuery.noConflict();
            $j('#USER').val('frankcastle');
            $j('#password').val('punisher');
            $j('.button-oblong-orange.button-orange a').click();
        ]]

        assert(splash:runjs(js))

        local resumeJs = [[
            function main(splash) {
                var $j = jQuery.noConflict();
                $j(document).ready(function(){
                    splash.resume();
                })
            }
        ]]

        assert(splash:wait_for_resume(resumeJs))

        return {
            html = splash:html()
        }
    end
    """
    splash_meta = {'splash': {'endpoint': 'execute', 'args': {'wait': 0.5, 'lua_source': script}}}

    for url in self.start_urls:
        yield scrapy.Request(url, self.after_login, meta=splash_meta)

def after_login(self, response):
    print(response.body)

After calling splash:runjs(js), I fall back on splash:wait(5) (I also tried splash:wait_for_resume) to get the result. This may not always work because of network latency, so is there a better way?

It turns out that the only way is to use splash:wait(), but to call it in a loop and check whether some element (such as the footer) is available.
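The polling loop described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the answerer's actual code: the helper name splash_meta_waiting_for, the selector and max_wait arguments, and the 0.2-second polling interval are all made up for the example. The Lua part relies on splash:jsfunc, splash:wait and splash:html from the Splash scripting API.

```python
# Lua script for Splash's "execute" endpoint. It polls the DOM until a CSS
# selector matches, instead of sleeping for a fixed number of seconds.
# "selector" and "max_wait" are hypothetical argument names for this sketch.
WAIT_FOR_ELEMENT_LUA = """
function main(splash)
    assert(splash:go(splash.args.url))

    -- Wrap a JavaScript predicate so it can be called from Lua.
    local exists = splash:jsfunc([[
        function (selector) {
            return document.querySelector(selector) !== null;
        }
    ]])

    -- Poll every 0.2 s until the element appears or max_wait is exceeded.
    local waited = 0
    while not exists(splash.args.selector) do
        if waited >= splash.args.max_wait then
            error("timed out waiting for " .. splash.args.selector)
        end
        splash:wait(0.2)
        waited = waited + 0.2
    end

    return {html = splash:html()}
end
"""


def splash_meta_waiting_for(selector, max_wait=10.0):
    """Build the request meta that asks Splash to wait for `selector`."""
    return {
        "splash": {
            "endpoint": "execute",
            "args": {
                "lua_source": WAIT_FOR_ELEMENT_LUA,
                "selector": selector,
                "max_wait": max_wait,
            },
        }
    }
```

The returned dict would be passed as meta= to scrapy.Request, in the same way as splash_meta in the question. max_wait should stay below the Splash request timeout, otherwise Splash aborts the script first.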

I haven't tried this myself yet (today was the first time I got anything working with Lua and Splash).

You could do something like this:

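function main(splash)
    assert(splash:go(splash.args.url))

    -- Take two snapshots 0.5 s apart and keep waiting
    -- until two consecutive snapshots are identical.
    local html = splash:html()
    splash:wait(0.5)
    local html2 = splash:html()
    while html ~= html2 do
        html = html2
        splash:wait(0.5)
        html2 = splash:html()
    end

    return {
        html = html2,
    }
end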

A similar approach could be used for infinite-scroll pages, which populate list items in response to scrolling (or Page Down).
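For the infinite-scroll case, one possible shape is to scroll, wait, and stop once the number of list items no longer grows. The sketch below is only an assumption about how that could be written: the helper name splash_meta_for_infinite_scroll and the item_selector, pause and max_rounds arguments are hypothetical, and some pages may also need a larger viewport before scrolling triggers any loading.

```python
# Lua script for Splash's "execute" endpoint. It scrolls to the bottom until
# the number of matching items stops increasing or max_rounds is reached.
# "item_selector", "pause" and "max_rounds" are hypothetical argument names.
INFINITE_SCROLL_LUA = """
function main(splash)
    assert(splash:go(splash.args.url))

    local count_items = splash:jsfunc([[
        function (selector) {
            return document.querySelectorAll(selector).length;
        }
    ]])
    local scroll_down = splash:jsfunc([[
        function () {
            window.scrollTo(0, document.body.scrollHeight);
        }
    ]])

    local previous = -1
    local current = count_items(splash.args.item_selector)
    local rounds = 0
    -- Stop when a scroll adds no new items, or after max_rounds scrolls.
    while current ~= previous and rounds < splash.args.max_rounds do
        previous = current
        scroll_down()
        splash:wait(splash.args.pause)
        current = count_items(splash.args.item_selector)
        rounds = rounds + 1
    end

    return {html = splash:html(), items = current}
end
"""


def splash_meta_for_infinite_scroll(item_selector, pause=0.5, max_rounds=20):
    """Build the request meta that asks Splash to scroll until items stop loading."""
    return {
        "splash": {
            "endpoint": "execute",
            "args": {
                "lua_source": INFINITE_SCROLL_LUA,
                "item_selector": item_selector,
                "pause": pause,
                "max_rounds": max_rounds,
            },
        }
    }
```

The max_rounds cap keeps the script from running until the Splash timeout on feeds that never end; pause multiplied by max_rounds should stay below that timeout.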

Apologies for my unfamiliarity with Lua/Splash syntax.

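There is a better way to check this, but you still need a loop with a wait. The idea is to register a callback with splash:on_response, which fires whenever the page receives a response. Note that the response callback is called asynchronously, so the main loop has to wait for all page modifications to finish. That is why a "wait" loop (such as the one given by @Krishnaraj) is still needed.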

The example below presses the button with id button_id 10 times, each press downloading additional content:

function main(splash)
    assert(splash:go(splash.args.url))

    function wait_for(splash, condition)
        while not condition() do
            splash:wait(0.2)
        end
    end

    local clicks = 0

    splash:on_response(function(res)
        clicks = clicks + 1

        if clicks < 10 then
            assert(splash:runjs("document.getElementById(\"button_id\").click();"))
        end
    end)

    assert(splash:runjs("document.getElementById(\"button_id\").click();"))

    wait_for(splash, function()
        return clicks >= 10
    end)

    return splash:html()
end


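Splash lets you wait for an element to become visible.

@PadraicCunningham Please check my edit. I tried splash:wait_for_resume (not sure whether I used it correctly), but did not find anything similar.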
