HtmlUnit: text inputs are not updated after a lengthy JavaScript bandwidth check


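On the page that the code below loads, I am trying to implement a crawler with HtmlUnit (version 2.31) to check the bandwidth offered by different providers.

If you fill in the address fields on the page by hand, a popup shows the progress of the bandwidth check, and afterwards the same page shows the Internet bandwidth available at the requested address. The requested address is then displayed in labels, in the place where the text input fields were at first.

When I try to do the same in my HtmlUnit crawler, I get the same page back: even after a longer wait, the input fields are not replaced by the labels in the fieldset (id="tko-vcheck-done-wrapper") that shows the address.

Here is my code: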

public Map<String, Integer> checkProviderBandWidthsByAddress(String zip, String city, String street, String hno) {
    WebClient webClient = null;
    try {
        webClient = getWebClient();
        HtmlPage page = webClient.getPage("https://www.check24.de/dsl/vergleich/");

        HtmlTextInput inputZipCity = (HtmlTextInput) page.getElementById("c24api_ac_widget_zipcity");
        HtmlHiddenInput inputZip = (HtmlHiddenInput) page.getElementById("c24api_ac_widget_zipcode");
        HtmlHiddenInput inputCity = (HtmlHiddenInput) page.getElementById("c24api_ac_widget_city");
        HtmlTextInput inputStreet = (HtmlTextInput) page.getElementById("c24api_ac_widget_street");
        HtmlTextInput inputStreetNumber = (HtmlTextInput) page.getElementById("c24api_ac_widget_streetnumber");
        HtmlButton buttonCheck = (HtmlButton) page.getElementById("tko-filter-vcheck-submit");

        inputZipCity.setValueAttribute(zip + " " + city);
        inputZipCity.fireEvent(Event.TYPE_INPUT);
        page.getWebClient().waitForBackgroundJavaScriptStartingBefore(1000);
        inputZip.setValueAttribute(zip);
        inputCity.setValueAttribute(city);
        inputStreet.setValueAttribute(street);
        inputStreetNumber.setValueAttribute(hno);

        page = buttonCheck.click();
        page.getWebClient().waitForBackgroundJavaScriptStartingBefore(30000);
        DomElement done = page.getElementById("tko-vcheck-done-wrapper"); // <-- problem here: this is null

        List<DomElement> providers = page.getByXPath("//div[contains(@class, 'tko-result-row tko-clearfix')]");

        // Reading the download bandwidth of the general tariffs works fine,
        // but the result does not contain the address-specific bandwidth.
        Map<String, Integer> bandWidths = findMaxSpeed(providers);
        return bandWidths;
    } catch (Exception e) {
        e.printStackTrace();
        return Collections.emptyMap();
    } finally {
        // webClient is still null if creating the client failed
        if (webClient != null) {
            webClient.close();
        }
    }
}

public static WebClient getWebClient() {
    WebClient webClient = new WebClient(BrowserVersion.FIREFOX_52); // also tried with other browser versions
    webClient.setRefreshHandler(new WaitingRefreshHandler());
    webClient.getOptions().setJavaScriptEnabled(true);
    webClient.getOptions().setCssEnabled(false);
    webClient.setCssErrorHandler(new SilentCssErrorHandler());
    webClient.setAjaxController(new NicelyResynchronizingAjaxController());
    webClient.getOptions().setUseInsecureSSL(true);
    webClient.getOptions().setRedirectEnabled(true);
    webClient.getCookieManager().setCookiesEnabled(true);
    webClient.getOptions().setPopupBlockerEnabled(false);
    return webClient;
}

Pages as awful as this one are a challenge for HtmlUnit. But with a bit of patience it works. (I am using HtmlUnit 2.32.)

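I added some comments to the sample code; I hope they help. Please treat the code as a proof of concept; there was not enough time to write good code.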

public static void main(String[] args) throws Exception {
    String url = "https://www.check24.de/dsl/vergleich/";

    try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_60)) {
        HtmlPage page = webClient.getPage(url);

        // this page starts a lot of JavaScript;
        // we have to wait until it has finished to get a page
        // that can respond to our typing
        wait(webClient, 60);

        HtmlTextInput inputZipCity = (HtmlTextInput) page.getElementById("c24api_ac_widget_zipcity");
        inputZipCity.type("50126");
        wait(webClient, 30);

        // System.out.println(page.getElementById("tko-result-filter-form-acsuggest").asXml());

        HtmlTextInput inputStreet = (HtmlTextInput) page.getElementById("c24api_ac_widget_street");
        HtmlTextInput inputStreetNumber = (HtmlTextInput) page.getElementById("c24api_ac_widget_streetnumber");

        inputStreet.type("Hauptstr.");
        wait(webClient, 10);

        inputStreetNumber.type("10");
        wait(webClient, 10);

        HtmlButton buttonCheck = (HtmlButton) page.getElementById("tko-filter-vcheck-submit");
        buttonCheck.click();
        wait(webClient, 4 * 60);

        HtmlPage refreshedPage = ((HtmlPage) page.getEnclosingWindow().getEnclosedPage());
        // System.out.println("----------------");
        // System.out.println(refreshedPage.asText());
        System.out.println(refreshedPage.getElementById("tko-result-sorting-text").getTextContent());
    }
}

private static void wait(WebClient webClient, int seconds) {
    long timeLimit = System.currentTimeMillis() + seconds * 1000L;
    // waitForBackgroundJavaScript returns the number of background jobs still pending
    int scriptCount = webClient.waitForBackgroundJavaScript(1000);
    while (scriptCount > 1 && timeLimit > System.currentTimeMillis()) {
        scriptCount = webClient.waitForBackgroundJavaScript(1000);
    }

    // there always seems to be one job left in the queue (maybe some kind of heartbeat)
    if (scriptCount > 1) {
        System.out.println("Some JavaScript is still running: " + scriptCount);
    }
}
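
The wait helper stops once the number of pending background jobs drops to one, which relies on the observed "heartbeat" job. A possible alternative, shown here only as a minimal sketch, is to poll for the condition you actually care about until a deadline passes. `Polling` and `waitUntil` are hypothetical names and not part of HtmlUnit; the helper uses only the JDK.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper (not part of HtmlUnit): poll a condition until it holds or a timeout expires. */
final class Polling {
    private Polling() {
    }

    /**
     * Checks the condition immediately and then every intervalMillis.
     * Returns true as soon as the condition holds, false if the timeout
     * expires first or the calling thread is interrupted.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long intervalMillis) {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // restore the interrupt flag and give up waiting
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

In the proof of concept this could replace the final long wait, for example with a condition such as `() -> ((HtmlPage) page.getEnclosingWindow().getEnclosedPage()).getElementById("tko-vcheck-done-wrapper") != null` and a timeout of four minutes. Whether that element actually appears on the site has not been verified here; the id is taken from the question.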
At least, running the main method above produces output like this:

68 Tarife verfügbar 12,91 € bis 107,47 € (Durchschnitt pro Monat)

When you open the website in a real browser, it shows the same text.
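
If the numbers in that line are needed as values rather than as text, they can be pulled out with a regular expression. The following is only a sketch under the assumption that the text always contains the tariff count first, followed by a lowest and a highest price written with a decimal comma and no thousands separator, as in the sample output. `TariffSummary` is a hypothetical name and not something the site or HtmlUnit provides.

```java
import java.math.BigDecimal;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical value holder for the summary line printed by the proof of concept. */
final class TariffSummary {
    // an integer, then two numbers with a decimal comma, separated by arbitrary non-digit text
    private static final Pattern SUMMARY =
            Pattern.compile("(\\d+)\\D+?(\\d+,\\d{2})\\D+?(\\d+,\\d{2})");

    final int tariffCount;
    final BigDecimal minPrice;
    final BigDecimal maxPrice;

    private TariffSummary(int tariffCount, BigDecimal minPrice, BigDecimal maxPrice) {
        this.tariffCount = tariffCount;
        this.minPrice = minPrice;
        this.maxPrice = maxPrice;
    }

    /** Parses the summary text; throws IllegalArgumentException if the assumed format is not found. */
    static TariffSummary parse(String text) {
        Matcher m = SUMMARY.matcher(text);
        if (!m.find()) {
            throw new IllegalArgumentException("Unexpected summary format: " + text);
        }
        return new TariffSummary(
                Integer.parseInt(m.group(1)),
                toDecimal(m.group(2)),
                toDecimal(m.group(3)));
    }

    // converts a decimal-comma number such as "12,91" to a BigDecimal
    private static BigDecimal toDecimal(String value) {
        return new BigDecimal(value.replace(',', '.'));
    }
}
```

In the proof of concept, the string passed to `TariffSummary.parse` would be the result of `refreshedPage.getElementById("tko-result-sorting-text").getTextContent()`. If the site changes the wording or the number format, the pattern has to be adapted.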