How to download a file from a JavaScript link in HtmlUnit
As the title says, I am trying to download a file from a JavaScript link with HtmlUnit.

The page I start from is the one opened by webClient.getPage in the code below. When I click the "Authenticate with Java Web Start (new method)" link in a browser, a .jnlp file is downloaded and then run, which opens a Java program window that asks for authentication information. Once authentication succeeds, the original browser window loads the page containing the information I want to scrape.

The source of the link in the starting page is:
<tr>
<!-- onClick="return launchWebStart('authenticate');" -->
<td><a href="javascript:void(0)" id="webstart-authenticate" ><font size="5">Authenticate with Java Web Start (new method)</font></a>
</tr>
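What my first attempt printed was the HTML of the starting page rather than the expected .jnlp file. The console also prints a status update from the JavaScript WebConsole every 3 seconds (at least if I let the code wait long enough), so I know the JavaScript is doing something (the functions launchWebStart and followMediator live in a separate JavaScript file, WebStart.js). The console output is shown further below.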
I also tried using a CollectingAttachmentHandler:
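import java.io.IOException;
import java.net.MalformedURLException;
import java.util.List;
import com.gargoylesoftware.htmlunit.*;
import com.gargoylesoftware.htmlunit.attachment.Attachment;
import com.gargoylesoftware.htmlunit.attachment.CollectingAttachmentHandler;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
public class Test2 {
    public static void main(String[] args) throws FailingHttpStatusCodeException, MalformedURLException, IOException {
        WebClient webClient = new WebClient(BrowserVersion.FIREFOX_45);
        // open starting webpage
        HtmlPage page = webClient.getPage("https://ppair.uspto.gov/TruePassWebStart/AuthenticationChooser.html");
        // id of the element where the link is
        String linkID = "webstart-authenticate";
        // identify the appropriate anchor
        HtmlAnchor anchor = (HtmlAnchor) page.getElementById(linkID);
        CollectingAttachmentHandler attachmentHandler = new CollectingAttachmentHandler();
        webClient.setAttachmentHandler(attachmentHandler);
        attachmentHandler.handleAttachment(anchor.click());
        List<Attachment> attachments = attachmentHandler.getCollectedAttachments();
        int i = 0;
        while (i < attachments.size()) {
            Attachment attachment = attachments.get(i);
            Page attachedPage = attachment.getPage();
            WebResponse attachmentResponse = attachedPage.getWebResponse();
            String content = attachmentResponse.getContentAsString();
            System.out.println(content);
            i++;
        }
        webClient.close();
    }
}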
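This code also prints the contents of the starting page, so none of the other solutions seem to work for me. I don't know what I'm doing wrong, and I've run out of ideas for getting this to work (I thought this would be easy!). Any suggestions are much appreciated.

Here is a working version based on Test2: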
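import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.CountDownLatch;
import org.apache.commons.io.IOUtils;
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebResponse;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.webstart.WebStartHandler;
public class Test2 {
    public static void main(String[] args) throws Exception {
        WebClient webClient = new WebClient(BrowserVersion.FIREFOX_45);
        // open starting webpage
        HtmlPage page = webClient.getPage("https://ppair.uspto.gov/TruePassWebStart/AuthenticationChooser.html");
        // id of the element where the link is
        String linkID = "webstart-authenticate";
        // identify the appropriate anchor
        HtmlAnchor anchor = (HtmlAnchor) page.getElementById(linkID);
        // released by the handler once the .jnlp file has been written
        CountDownLatch latch = new CountDownLatch(1);
        webClient.setWebStartHandler(new WebStartHandler() {
            @Override
            public void handleJnlpResponse(WebResponse webResponse) {
                System.out.println("downloading...");
                // copy the response body to disk; both streams are closed automatically
                try (InputStream in = webResponse.getContentAsStream();
                     FileOutputStream fos = new FileOutputStream("/Users/Franklyn/Downloads/uspto-auth.authenticate2.jnlp")) {
                    IOUtils.copy(in, fos);
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
                System.out.println("downloaded");
                latch.countDown();
            }
        });
        anchor.click();
        latch.await(); // wait for the download to finish
        webClient.close();
    }
}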
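One caveat about the working version: latch.await() with no arguments blocks forever if handleJnlpResponse is never called, for example if the page's JavaScript changes or the request fails. The following is a minimal sketch of the same wait with a timeout, using only java.util.concurrent. The class name LatchTimeoutSketch, the helper triggerAndWait, and the timeout values are hypothetical and chosen for illustration; the background thread merely stands in for the HtmlUnit callback.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of HtmlUnit: waits for a callback with a deadline.
final class LatchTimeoutSketch {

    // Runs the trigger (e.g. anchor.click()), then waits up to timeoutMillis
    // for the latch to reach zero. Returns true only if the callback fired in time.
    static boolean triggerAndWait(Runnable trigger, CountDownLatch latch, long timeoutMillis) {
        trigger.run();
        try {
            return latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            return false;
        }
    }

    public static void main(String[] args) {
        // Simulated success: a background thread plays the role of the handler.
        CountDownLatch done = new CountDownLatch(1);
        boolean fired = triggerAndWait(() -> new Thread(done::countDown).start(), done, 2000);
        System.out.println("callback fired: " + fired);

        // Simulated failure: nothing ever counts the latch down.
        CountDownLatch never = new CountDownLatch(1);
        boolean firedLate = triggerAndWait(() -> { }, never, 100);
        System.out.println("callback fired: " + firedLate);
    }
}
```

In the working version this would amount to replacing latch.await() with something like latch.await(30, TimeUnit.SECONDS) and checking the returned boolean before assuming the file exists; the 30-second figure is only a guess at a reasonable limit.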
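So why doesn't your Test2 work? Because the Content-Type of the response for the downloaded file is application/x-java-jnlp-file, so you need to use a WebStartHandler. If the response headers contained a header named Content-Disposition whose value starts with "attachment", Test2 would probably work.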
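The rule described above can be sketched as a small standalone classifier. This is an illustration of the decision only, not HtmlUnit's actual source: the class DownloadDispatchSketch, the enum Handler, and the method choose are hypothetical names, and checking the JNLP content type before the Content-Disposition header is an assumption.

```java
import java.util.Locale;

// Hypothetical illustration of which handler a response would be routed to.
final class DownloadDispatchSketch {

    enum Handler { WEB_START, ATTACHMENT, NORMAL_PAGE }

    // contentType and contentDisposition are the raw header values, or null if absent.
    static Handler choose(String contentType, String contentDisposition) {
        if (contentType != null) {
            // Drop parameters such as "; charset=UTF-8" before comparing.
            String mime = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
            if (mime.equals("application/x-java-jnlp-file")) {
                return Handler.WEB_START; // a WebStartHandler is needed for this response
            }
        }
        if (contentDisposition != null
                && contentDisposition.trim().toLowerCase(Locale.ROOT).startsWith("attachment")) {
            return Handler.ATTACHMENT; // an attachment handler would see this response
        }
        return Handler.NORMAL_PAGE; // loaded into the window like any other page
    }

    public static void main(String[] args) {
        System.out.println(choose("application/x-java-jnlp-file", null));                  // WEB_START
        System.out.println(choose("application/pdf", "attachment; filename=\"a.pdf\""));   // ATTACHMENT
        System.out.println(choose("text/html", null));                                     // NORMAL_PAGE
    }
}
```

Under this reading, the .jnlp response in the question falls into the first branch and never reaches an attachment handler. Note also that Test2 calls attachmentHandler.handleAttachment(anchor.click()) itself, which hands the handler whatever page the click returned; that would explain why the starting page is what gets printed.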
Console output mentioned in the question:
Nov 21, 2016 2:53:25 PM com.gargoylesoftware.htmlunit.WebConsole info
INFO: launchWebStart
Nov 21, 2016 2:53:25 PM com.gargoylesoftware.htmlunit.WebConsole info
INFO: followMediator
Nov 21, 2016 2:53:25 PM com.gargoylesoftware.htmlunit.WebConsole info
INFO: responseReceived:200
WAIT
Nov 21, 2016 2:53:25 PM com.gargoylesoftware.htmlunit.WebConsole info
INFO: mediatorCallback: next wait