HtmlUnit: Loading elements on an AJAX page
I'm new to Java and HtmlUnit, and I'm trying to scrape news updates from a page that loads them via AJAX calls. Whatever I do, the updates don't load. What am I missing?
I've tried several ways of waiting for the JS scripts to finish, but none of them worked. Clicking the button that loads more news, or triggering its events, doesn't seem to help either.
I've been working under the assumption that I don't need to reassign my page instance after the JS scripts have finished. Is that correct?
I've also read that HtmlUnit's JS engine doesn't work well with some websites. Is that the case here, or am I just missing something?
Thanks for your help!
Here is my code:
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlButton;
import com.gargoylesoftware.htmlunit.html.HtmlElement;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlInput;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import java.io.IOException;
import java.util.List;
import org.junit.Assert;

public class ProblemDemo {

    public static void main(String[] args) throws IOException, InterruptedException {
        WebClient webClient = new WebClient(BrowserVersion.FIREFOX_38);
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        webClient.setAjaxController(new NicelyResynchronizingAjaxController());
        webClient.getOptions().setTimeout(10000);
        webClient.setJavaScriptTimeout(10000);
        webClient.getOptions().setJavaScriptEnabled(true);

        // Login procedure
        HtmlPage page = webClient.getPage("https://login.xing.com/login");
        final HtmlForm form = (HtmlForm) page.getElementById("login-form");
        final HtmlInput userID = form.getInputByName("login_form[username]");
        final HtmlInput password = form.getInputByName("login_form[password]");
        final HtmlButton submit = form.getButtonByName("button");
        final HtmlInput remember = form.getInputByName("login_form[perm]");
        userID.setValueAttribute("user");
        password.setValueAttribute("pass");
        remember.setChecked(true);
        page = submit.click();
        Assert.assertEquals("Start | XING", page.getTitleText());

        // Navigate to page to be scraped
        page = webClient.getPage(
                "https://www.xing.com/companies/deutschepostag/updates");
        webClient.waitForBackgroundJavaScript(10 * 1000);
        System.out.println(page.getUrl().toString());
        System.out.println(page.asXml());

        // Print number of employees (works, not dynamic)
        HtmlElement result = page.getFirstByXPath("//div[@id='profile-nav-tabs']"
                + "/ul/li[@id='employees-tab']/a");
        System.out.println("Employees: " + result.getTextContent());

        // Print news (doesn't work)
        String news;
        List<HtmlElement> results = (List<HtmlElement>) page.getByXPath("//div"
                + "[@id='company-updates']/ul[@id='news-feed']/li/div"
                + "[@class='activity-content']");
        System.out.println("News found: " + results.size());
        for (HtmlElement item : results) {
            news = "";
            System.out.println(" NEW ITEM");
            System.out.println(item.getTextContent());
        }
    }
}
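One generic way to wait for AJAX-loaded content is to poll for the element you actually need instead of waiting a fixed time. The sketch below is only an illustration in plain Java: the class name PollingWait and the method waitUntil are hypothetical helpers, not part of HtmlUnit, and the counter in main merely stands in for "the content has arrived".

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

public class PollingWait {

    /**
     * Re-checks the condition every intervalMillis until it holds or
     * timeoutMillis has elapsed. Returns true if the condition was met,
     * false on timeout or if the waiting thread was interrupted.
     */
    public static boolean waitUntil(BooleanSupplier condition,
                                    long timeoutMillis,
                                    long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            // Overflow-safe comparison of nanoTime values
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in condition: becomes true on the third check
        AtomicInteger checks = new AtomicInteger();
        boolean loaded = waitUntil(() -> checks.incrementAndGet() >= 3, 2_000, 50);
        System.out.println("Loaded: " + loaded + " after " + checks.get() + " checks");
    }
}
```

With HtmlUnit, the condition could be a check that the news XPath query from the code above returns a non-empty list. This only helps if the page's scripts run successfully; if a script error stops the AJAX call from being made, no amount of waiting will load the items.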
Setting setThrowExceptionOnScriptError to false prevents you from seeing the errors.
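Edit: The latest snapshot contains a fix for performance.navigation.redirectCount.
Please try it and report back.
Hi, did you have a chance to test the latest snapshot now that the answer has been updated?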
WARNING: Obsolete content type encountered: 'text/javascript'.