Unable to read URL from Java code
Tags: java, http, url, single-sign-on
I have been trying hard to get the content of this page. No authentication is required when I open the page in a web browser, but when I try to fetch the content from my web application, I get an sso file as the response. The code I use is as follows:
HttpClient httpClient = new DefaultHttpClient();
HttpGet httpGet = new HttpGet("http://search.lib.monash.edu/primo_library/libweb/action/search.do?dscnt=1&frbg=&tab=default_tab&srt=rank&ct=search&mode=Basic&dum=true&tb=&indx=1&vl%28freeText0%29=java&fn=search&vid=MON");
HttpResponse httpResponse = httpClient.execute(httpGet);
HttpEntity responseEntity = httpResponse.getEntity();
BufferedReader in = new BufferedReader(
        new InputStreamReader(responseEntity.getContent()));
String inputLine;
StringBuffer response = new StringBuffer();
while ((inputLine = in.readLine()) != null) {
    response.append(inputLine);
}
in.close();
System.out.println(response.toString());
The sso file I get as the response looks like this:
<!-- filename: sso --> <html> <head> <title>Login </title> <!-- START filename: meta-tags.pds --> <META HTTP-EQUIV="Cache-Control" CONTENT="no-cache"> <META HTTP-EQUIV="Pragma" CONTENT="no-cache"> <META HTTP-EQUIV="Expires" CONTENT="Sun, 06 Nov 1994 08:49:37 GMT"> <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8"> <!-- END filename: meta-tags.pds --> <link rel="stylesheet" href="http://monash-dc05.hosted.exlibrisgroup.com:8991/PDSMExlibris.css" TYPE="text/css"> </head> <body onload = "location = '/goto/http://search.lib.monash.edu:80/primo_library/libweb/action/login.do?afterPDS=true&vid=MON&vid=MON&dscnt=2&targetURL=http%3A%2F%2Fsearch.lib.monash.edu%2Fprimo_library%2Flibweb%2Faction%2Fsearch.do%3Fdscnt%3D0&frbg=&tab=default%5Ftab&dstmp=1394940513823&srt=rank&ct=search&mode=Basic&dum=true&indx=1&tb=&vl%28freeText0%29=java&fn=search&pds_handle=GUEST';"> <noscript> <div id="header"> <div> <img src="http://monash-dc05.hosted.exlibrisgroup.com:8991//exlibris/primo/p4_1/pds/html_form/icon/exlibrislogo.jpg" alt="Exlibris Logo"><p> </p> </div> </div> <div id="connect"> <a href="/goto/http://search.lib.monash.edu:80/primo_library/libweb/action/login.do?afterPDS=true&vid=MON&vid=MON&dscnt=2&targetURL=http%3A%2F%2Fsearch.lib.monash.edu%2Fprimo_library%2Flibweb%2Faction%2Fsearch.do%3Fdscnt%3D0&frbg=&tab=default%5Ftab&dstmp=1394940513823&srt=rank&ct=search&mode=Basic&dum=true&indx=1&tb=&vl%28freeText0%29=java&fn=search&pds_handle=GUEST">Return from Check SSO </a></noscript> </div> </body> </html></body></html>
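Please help.
This is not caused by any authentication issue.
The returned page has an onload event attached to its body. When you open the URL in a browser client, the browser first receives the content you see in your response string and then tries to render and display it. In the meantime, however, the onload event fires and loads the URL defined by location = '/goto/..
Before the current page is ever displayed, the new page is received and shown in the browser instead.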
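Given the response you received, note the following:
In your Java code, you are only reading the content from the specified URL.
You do not pass it to any content parser that would render and display it. Until you do, it is treated as static text.
That is why your Java code does not see the same response as the web browser.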
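Since the Java code sees the page only as static text, the redirect target has to be pulled out of that text by hand. The following is a minimal sketch of that step, not a full solution: the class name OnloadRedirect, the method extractOnloadTarget, and the shortened sample HTML are hypothetical, and the regular expression assumes the onload attribute has the same shape as in the response above (onload = "location = '...';"). It is not a general HTML or JavaScript parser.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OnloadRedirect {
    // Assumed shape: onload = "location = '<target>';" with flexible whitespace.
    private static final Pattern ONLOAD_LOCATION = Pattern.compile(
            "onload\\s*=\\s*\"\\s*location\\s*=\\s*'([^']*)'",
            Pattern.CASE_INSENSITIVE);

    /** Returns the value assigned to location in an onload attribute, if one is found. */
    public static Optional<String> extractOnloadTarget(String html) {
        Matcher m = ONLOAD_LOCATION.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical, shortened stand-in for the sso page returned by the server.
        String html = "<html><body onload = \"location = '/goto/next.do?a=1&b=2';\">"
                + "<noscript><a href=\"/goto/next.do?a=1&b=2\">Return</a></noscript>"
                + "</body></html>";
        System.out.println(extractOnloadTarget(html).orElse("(no onload redirect found)"));
    }
}
```

The extracted value is a path relative to the host that served the page, so it would still have to be resolved against that host and requested in a second call, which this sketch does not attempt.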
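Another suggestion:
When you read a line and append it to the buffer, it is better to append a CRLF after it as well.
Change:
response.append(inputLine);
To, for example:
response.append(inputLine).append("\r\n");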
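This makes the response text multi-line and more readable.
Thanks for the explanation, I really did not understand what was going on! In my application I just want to process the search results as static text. If I go to the "go to" URL, I do reach the correct page, but it contains only java code and not the rendered results. I am now wondering how to get the search results. Thanks a lot for your help!
For search results, you are better off relying on an RSS feed service provided by the site, if one is available. Otherwise you will need a third-party tool.
The system does not have any kind of API. Can you recommend a tool for this? Thanks.
At the moment I do not have any information about such tools at hand.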