Java: maintaining login credentials across pages in an HtmlUnit WebClient
My problem is very similar to a previous question, except that I have no access to the remote server and don't know how it authenticates. I am trying to stay logged in across pages that I request with webClient.getPage(). The site I visit uses a standard login form with a username/password pair. What I did so far is write a small helper function that logs in for me:
public static HtmlPage logIn(HtmlPage page) {
    HtmlPage nextpage = null;
    // locate the login form and its fields
    final HtmlForm form = page.getFormByName("login_form");
    final HtmlSubmitInput button = form.getInputByValue("Login");
    final HtmlTextInput username = form.getInputByName("username");
    final HtmlPasswordInput password = form.getInputByName("password");
    username.setValueAttribute("user_foo");
    password.setValueAttribute("pwd_bar");
    // hit the submit button and return the resulting page
    try {
        nextpage = button.click();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return nextpage;
}
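For context on why a helper like this can keep a session alive at all: a form login typically works because the server answers the submit with a Set-Cookie header carrying a session ID, and any client that stores and replays that cookie stays logged in. A minimal stdlib sketch (the cookie value is hypothetical; PHPSESSID is only the PHP default name) of what such a cookie looks like:

```java
import java.net.HttpCookie;
import java.util.List;

public class SessionCookieDemo {
    public static void main(String[] args) {
        // A typical Set-Cookie header value returned by a PHP login form
        // (hypothetical session value for illustration).
        String header = "PHPSESSID=abc123def456; Path=/; HttpOnly";
        List<HttpCookie> cookies = HttpCookie.parse(header);
        HttpCookie session = cookies.get(0);
        System.out.println(session.getName());   // PHPSESSID
        System.out.println(session.getValue());  // abc123def456
        // To stay logged in, a client must send this cookie back
        // on every later request to the same site.
        System.out.println("Cookie: " + session);
    }
}
```

HtmlUnit's WebClient does this bookkeeping automatically through its CookieManager, which is why reusing one WebClient instance for all requests matters more than any explicit credential plumbing.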
The problem is that I have to search the page this function returns by hand to find the link to the page I actually want. Worse, this only works for the page reached right after login, not for other pages.
Instead, I would like the login to be kept in the browser simulator (the WebClient), so that I can seamlessly visit any protected page on the site. Besides trying the solution from the previous question (linked above), I also tried the following, without success:
private static void setCredentials(WebClient webClient) {
    String username = "user_foo";
    String password = "pwd_bar";
    DefaultCredentialsProvider creds =
            (DefaultCredentialsProvider) webClient.getCredentialsProvider();
    try {
        creds.addCredentials(username, password);
        webClient.setCredentialsProvider(creds);
    } catch (Exception e) {
        System.out.println("!!! Problem logging in");
        e.printStackTrace();
    }
}
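One likely reason the attempt above has no visible effect: a credentials provider like DefaultCredentialsProvider serves HTTP-level authentication (a 401 challenge for Basic/Digest/NTLM), not HTML form logins. With Basic auth, the stored username/password would end up on the wire as an Authorization header; a stdlib sketch of that encoding (using the question's placeholder credentials):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthHeaderDemo {
    public static void main(String[] args) {
        String username = "user_foo";
        String password = "pwd_bar";
        // RFC 7617: Authorization: Basic base64(username ":" password)
        String token = Base64.getEncoder().encodeToString(
                (username + ":" + password).getBytes(StandardCharsets.UTF_8));
        System.out.println("Authorization: Basic " + token);
        // A form-based site never sends the 401 challenge that would
        // trigger this header, so the credentials provider is never used.
    }
}
```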
Edit: here is the main function showing how I use the webClient:
public static void main(String[] args) throws Exception {
    // Create and initialize the WebClient object
    WebClient webClient = new WebClient(/*BrowserVersion.CHROME_16*/);
    webClient.setThrowExceptionOnScriptError(false);
    webClient.setJavaScriptEnabled(false);
    webClient.setCssEnabled(false);
    webClient.getCookieManager().setCookiesEnabled(true);
    setCredentials(webClient);
    HtmlPage subj_page = null;

    // visit the login page and fetch it
    String url = "http://www.website.com/index.php";
    HtmlPage page = (HtmlPage) webClient.getPage(url);
    HtmlAnchor anchor = null;
    page = logIn(page);
    // search for content
    page = searchPage(page, "recent articles");
    // click on the paper link
    anchor = (HtmlAnchor) page.getAnchorByText("recent articles");
    page = (HtmlPage) anchor.click();

    // loop through the found articles
    //{{{page
    int curr_pg = 1;
    int last_pg = 5;
    page = webClient.getPage(<starting URL of the first article>); // such URLs look like: "www.website.com/view_articles.php?publication_id=17&page=1"
    do {
        // find the sections on this page
        List<HtmlDivision> sections = (List<HtmlDivision>) page.getByXPath("//div[@class='article_section']");
        List<HtmlDivision> artdivs  = (List<HtmlDivision>) page.getByXPath("//div[@class='article_head']");
        List<HtmlDivision> tagdivs  = (List<HtmlDivision>) page.getByXPath("//div[@class='article_tag']");
        int num_ques = sections.size();
        HtmlDivision section, artdiv, tagdiv;
        // for every section, get its sub-articles
        for (int i = 0; i < num_ques; i++) {
            section = sections.get(i);
            artdiv = artdivs.get(i);
            tagdiv = tagdivs.get(i);
            // find the sub-article details and print them as XML
            String xml = getXMLArticle(artdiv, section.asText(), tagdiv);
            System.out.println(xml);
            System.out.println("-----------------------------");
        }
        // synchronize to avoid IllegalMonitorStateException
        synchronized (webClient) {
            webClient.wait(2000); // wait for 2 seconds
        }
        String href = "?publication_id=17&page=" + curr_pg;
        anchor = page.getAnchorByHref(href);
        page = anchor.click();
        System.out.println("anchor val: " + anchor.getHrefAttribute());
        curr_pg++;
    } while (curr_pg < last_pg);
    //}}}page
    webClient.closeAllWindows();
}
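The seamless access being asked for hinges on one thing: every getPage() call above goes through the same WebClient, whose cookie store survives between requests. HtmlUnit's CookieManager plays the same role that java.net.CookieManager plays for the JDK's own HTTP client. A stdlib sketch of that store-and-replay behavior, using the question's URLs and a hypothetical session cookie:

```java
import java.net.CookieManager;
import java.net.URI;
import java.util.List;
import java.util.Map;

public class CookieStoreDemo {
    public static void main(String[] args) throws Exception {
        CookieManager manager = new CookieManager();
        URI loginPage = URI.create("http://www.website.com/index.php");
        // Simulate the login response: the server sets a session cookie.
        manager.put(loginPage, Map.of("Set-Cookie",
                List.of("PHPSESSID=abc123; Path=/")));
        // A later request to a different page on the same host gets the
        // stored cookie back in its request headers automatically.
        URI articles = URI.create(
                "http://www.website.com/view_articles.php?publication_id=17&page=1");
        Map<String, List<String>> headers = manager.get(articles, Map.of());
        System.out.println(headers.get("Cookie")); // the stored session cookie
    }
}
```

If the main function above still loses the session, the thing to verify first is that the server really does set a cookie on the login response, e.g. by dumping webClient.getCookieManager().getCookies() right after logIn() returns.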
Additional info: I have no details about the remote server's authentication mechanism, since I cannot access it, but any help would be great. Thanks, everyone!

Comment: After logging in, how do you access the other pages, and what happens when you log in? Show us the code.

Reply: @JBNizet, I just posted the main code above. The implementation of getXMLArticle() doesn't matter, since it doesn't need the webClient. Let me know if you need any other information :)