HtmlUnit: text inputs are not updated after a long-running JavaScript bandwidth check
On the page loaded in the code below, I am trying to implement a crawler with HtmlUnit (version 2.31) that checks the bandwidth offered by different providers.
If you fill in the address fields on the page manually, a popup shows the progress of the bandwidth check, and afterwards the same page displays the Internet bandwidth available at the requested address.
The requested address then appears in a tag where the first text input field used to be.
In my attempt to write the crawler with HtmlUnit, I get back the same page: even after a longer wait, the input fields have not been replaced by a tag inside the fieldset (id="tko-vcheck-done-wrapper") that shows the address.
Here is my code:
public Map<String, Integer> checkProviderBandWidthsByAddress(String zip, String city, String street, String hno) {
    WebClient webClient = null;
    try {
        webClient = getWebClient();
        HtmlPage page = webClient.getPage("https://www.check24.de/dsl/vergleich/");

        HtmlTextInput inputZipCity = (HtmlTextInput) page.getElementById("c24api_ac_widget_zipcity");
        HtmlHiddenInput inputZip = (HtmlHiddenInput) page.getElementById("c24api_ac_widget_zipcode");
        HtmlHiddenInput inputCity = (HtmlHiddenInput) page.getElementById("c24api_ac_widget_city");
        HtmlTextInput inputStreet = (HtmlTextInput) page.getElementById("c24api_ac_widget_street");
        HtmlTextInput inputStreetNumber = (HtmlTextInput) page.getElementById("c24api_ac_widget_streetnumber");
        HtmlButton buttonCheck = (HtmlButton) page.getElementById("tko-filter-vcheck-submit");

        inputZipCity.setValueAttribute(zip + " " + city);
        inputZipCity.fireEvent(Event.TYPE_INPUT);
        page.getWebClient().waitForBackgroundJavaScriptStartingBefore(1000);

        inputZip.setValueAttribute(zip);
        inputCity.setValueAttribute(city);
        inputStreet.setValueAttribute(street);
        inputStreetNumber.setValueAttribute(hno);

        page = buttonCheck.click();
        page.getWebClient().waitForBackgroundJavaScriptStartingBefore(30000);

        DomElement done = page.getElementById("tko-vcheck-done-wrapper"); // <-- problem here: null
        List<DomElement> providers = page.getByXPath("//div[contains(@class, 'tko-result-row tko-clearfix')]");
        // Reading the download bandwidth of the general tariffs works fine,
        // but the result does not contain the address-specific bandwidth.
        Map<String, Integer> bandWidths = findMaxSpeed(providers);
        return bandWidths;
    } catch (Exception e) {
        e.printStackTrace();
        return Collections.emptyMap();
    } finally {
        // getWebClient() itself may have failed, so guard against null
        if (webClient != null) {
            webClient.close();
        }
    }
}

public static WebClient getWebClient() {
    WebClient webClient = new WebClient(BrowserVersion.FIREFOX_52); // also tried other browser versions
    webClient.setRefreshHandler(new WaitingRefreshHandler());
    webClient.getOptions().setJavaScriptEnabled(true);
    webClient.getOptions().setCssEnabled(false);
    webClient.setCssErrorHandler(new SilentCssErrorHandler());
    webClient.setAjaxController(new NicelyResynchronizingAjaxController());
    webClient.getOptions().setUseInsecureSSL(true);
    webClient.getOptions().setRedirectEnabled(true);
    webClient.getCookieManager().setCookiesEnabled(true);
    webClient.getOptions().setPopupBlockerEnabled(false);
    return webClient;
}
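Horrible pages like this one are a challenge for HtmlUnit.
But with a bit of patience it works.
(I am using HtmlUnit version 2.32.)
I added some comments to the sample code; I hope they help.
Please treat the code as a proof of concept; there was not enough time to write good code.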
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlButton;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;

public static void main(String[] args) throws Exception {
    String url = "https://www.check24.de/dsl/vergleich/";

    try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_60)) {
        HtmlPage page = webClient.getPage(url);
        // this page starts a lot of JavaScript;
        // we have to wait until it has finished to get a page
        // that can respond to our typing
        wait(webClient, 60);

        HtmlTextInput inputZipCity = (HtmlTextInput) page.getElementById("c24api_ac_widget_zipcity");
        inputZipCity.type("50126");
        wait(webClient, 30);
        // System.out.println(page.getElementById("tko-result-filter-form-acsuggest").asXml());

        HtmlTextInput inputStreet = (HtmlTextInput) page.getElementById("c24api_ac_widget_street");
        HtmlTextInput inputStreetNumber = (HtmlTextInput) page.getElementById("c24api_ac_widget_streetnumber");
        inputStreet.type("Hauptstr.");
        wait(webClient, 10);
        inputStreetNumber.type("10");
        wait(webClient, 10);

        HtmlButton buttonCheck = (HtmlButton) page.getElementById("tko-filter-vcheck-submit");
        buttonCheck.click();
        wait(webClient, 4 * 60);

        // re-read the page currently loaded in the window (it may have been replaced)
        HtmlPage refreshedPage = ((HtmlPage) page.getEnclosingWindow().getEnclosedPage());
        // System.out.println("----------------");
        // System.out.println(refreshedPage.asText());
        System.out.println(refreshedPage.getElementById("tko-result-sorting-text").getTextContent());
    }
}

// Waits until at most one background JavaScript job is left or the time limit is reached.
private static void wait(WebClient webClient, int seconds) {
    long timeLimit = System.currentTimeMillis() + seconds * 1000L;

    int scriptCount = webClient.waitForBackgroundJavaScript(1000);
    while (scriptCount > 1 && timeLimit > System.currentTimeMillis()) {
        scriptCount = webClient.waitForBackgroundJavaScript(1000);
    }

    // seems like there is always one job in the queue (maybe some kind of heartbeat)
    if (scriptCount > 1) {
        System.out.println("Still some JS is running: " + scriptCount);
    }
}
At least this produces output like
68 Tarife verfügbar 12,91 € bis 107,47 € (Durchschnitt pro Monat)
The same text is shown on the website when it is opened in a real browser.
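Counting background jobs, as the wait helper above does, is one way to decide when the page is ready; another is to poll for the element the question is after. The following is a minimal sketch that uses only the Java standard library and is not code from the answer: `Poller` and `pollUntilPresent` are hypothetical names, and the HtmlUnit wiring in the trailing comment is an untested assumption that reuses the element id from the question.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical helper (not part of HtmlUnit): polls a lookup until it yields a value. */
final class Poller {

    private Poller() {
    }

    /**
     * Calls lookup repeatedly until it returns a non-null value or the timeout
     * has elapsed. Between two attempts it runs pause, e.g. a short wait.
     */
    static <T> Optional<T> pollUntilPresent(Supplier<T> lookup, Duration timeout, Runnable pause) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            T value = lookup.get();
            if (value != null) {
                return Optional.of(value);
            }
            if (System.nanoTime() - deadline >= 0) {
                return Optional.empty(); // gave up: nothing appeared in time
            }
            pause.run();
        }
    }
}

// Possible wiring with HtmlUnit (assumption, untested against the site):
//
// Optional<DomElement> done = Poller.pollUntilPresent(
//         () -> ((HtmlPage) webClient.getCurrentWindow().getEnclosedPage())
//                 .getElementById("tko-vcheck-done-wrapper"),
//         Duration.ofMinutes(4),
//         () -> webClient.waitForBackgroundJavaScript(1000));
```

Re-reading the enclosed page on every attempt mirrors what the answer does after the click, and the pause reuses `waitForBackgroundJavaScript` so that the page's scripts get time to run between lookups.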