Java: Jsoup can't get the complete content of a web page?

Tags: java, web-scraping, jsoup

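I am trying to use Jsoup to fetch the content of the following page:

But it does not fetch the whole page; it only returns the content up to the point where a tag is closed. The content it returns looks like this: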

<!DOCTYPE html>
<!--[if lt IE 7]> <html class="no-js lt-ie9 lt-ie8 lt-ie7"  lang="en" dir="ltr"> <![endif]-->
<!--[if IE 7]>    <html class="no-js lt-ie9 lt-ie8"  lang="en" dir="ltr"> <![endif]-->
<!--[if IE 8]>    <html class="no-js lt-ie9"  lang="en" dir="ltr"> <![endif]-->
<!--[if IE 9]>    <html class="no-js ie9"  lang="en" dir="ltr"> <![endif]-->
<!--[if gt IE 9]><html class="no-js"  lang="en" dir="ltr"> <![endif]-->
<html>
<head>
    <meta charset="utf-8" />
    <script type="text/javascript">        var _prum = { id: "5227f1fbabe53ddc1f000000" }; var PRUM_EPISODES = PRUM_EPISODES || {}; PRUM_EPISODES.q = []; PRUM_EPISODES.mark = function (b, a) { PRUM_EPISODES.q.push(["mark", b, a || new Date().getTime()]) }; PRUM_EPISODES.measure = function (b, a, b) { PRUM_EPISODES.q.push(["measure", b, a, b || new Date().getTime()]) }; PRUM_EPISODES.done = function (a) { PRUM_EPISODES.q.push(["done", a]) }; PRUM_EPISODES.mark("firstbyte"); (function () { var b = document.getElementsByTagName("script")[0]; var a = document.createElement("script"); a.type = "text/javascript"; a.async = true; a.charset = "UTF-8"; a.src = "//rum-static.pingdom.net/prum.min.js"; b.parentNode.insertBefore(a, b) })();</script>
    <link href="~/images/favicon.ico" rel="CAA Shortcut Icon"></link>
    <title>Bacha Khan International Airport, Peshawar | www.peshawarairport.com</title>
    <meta name="description" content="">
    <meta name="apple-mobile-web-app-capable" content="yes" />
    <!-- <meta name="p:domain_verify" content="297cb2c48faff5539c27d75f076408b8"/> -->

    <style type="text/css">
        @import url("http://www.caapakistan.com.pk/css/jiap-website/system.base.css?nkrgyj");
        @import url("http://www.caapakistan.com.pk/css/jiap-website/system.messages.css?nkrgyj");
        @import url("http://www.caapakistan.com.pk/css/jiap-website/system.theme.css?nkrgyj");
    </style>
    <style type="text/css">
        @import url("http://www.caapakistan.com.pk/css/jiap-website/comment.css?nkrgyj");
        @import url("http://www.caapakistan.com.pk/sites/all/modules/contrib/date/date_api/date9687.css?nkrgyj");
        @import url("http://www.caapakistan.com.pk/css/jiap-website/field.css?nkrgyj");
        @import url("http://www.caapakistan.com.pk/css/jiap-website/node.css?nkrgyj");
        @import url("http://www.caapakistan.com.pk/css/jiap-website/search.css?nkrgyj");
        @import url("http://www.caapakistan.com.pk/css/jiap-website/user.css?nkrgyj");
        @import url("http://www.caapakistan.com.pk/sites/all/modules/contrib/workflow/workflow_admin_ui/workflow_admin_ui9687.css?nkrgyj");
        @import url("http://www.caapakistan.com.pk/sites/all/modules/contrib/views/css/views9687.css?nkrgyj");
    </style>
    <style type="text/css">
        @import url("http://www.caapakistan.com.pk/sites/all/modules/contrib/ctools/css/ctools9687.css?nkrgyj");
        @import url("http://www.caapakistan.com.pk/sites/all/modules/contrib/panels/css/panels9687.css?nkrgyj");
    </style>
    <style type="text/css">
        @import url("http://www.caapakistan.com.pk/sites/all/themes/sfo/css/bootstrap-n-responsive.min9687.css?nkrgyj");
        @import url("http://www.caapakistan.com.pk/sites/all/themes/sfo/css/base9687.css?nkrgyj");
        @import url("http://www.caapakistan.com.pk/sites/all/themes/sfo/css/theme_flysfo9687.css?nkrgyj");
        @import url("http://www.caapakistan.com.pk/sites/all/themes/sfo/css/flysfo_cn9687.css?nkrgyj");
        @import url("http://www.caapakistan.com.pk/sites/all/themes/sfo/css/mobilestyle9687.css?nkrgyj");
        @import url("http://www.caapakistan.com.pk/sites/all/themes/sfo/css/jplayer.sfo/jplayer.blue.monday9687.css?nkrgyj");
    </style>
    <script type="text/javascript" src="http://www.caapakistan.com.pk/sites/all/themes/sfo/js/libs/modernizr-2.5.3.min.js"></script>
    <script type="text/javascript">        var switchTo5x = false;</script>
    <script type="text/javascript">        stLight.options({ publisher: "a574d78b-ed29-4436-b50d-0213b9613fe7", doNotHash: true, doNotCopy: true, hashAddressBar: true, offsetTop:
Try this approach to get the content (see the URLConnection code below).

Works for me; the problem must be in code you haven't quoted. But regarding your title: yes, Jsoup can get the complete content of a web page. Your code works fine. I read all the way through with timeout=1000.

@T.J.Crowder You're right, the problem was elsewhere; Jsoup works fine. I guess I should delete this question :(

@user818455: No need to frown. :-) I'm glad you found the problem! But yes, this question probably won't help others in the future unless you update it to describe what was wrong and then post an answer explaining how you fixed it. If you think others might make the same mistake, that would be appropriate. But deleting it is fine too. Best,
Document doc = Jsoup.connect("http://www.peshawarairport.com.pk/Schedule.aspx?Type=Arrival")
        .userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.130 Safari/537.36")
        .maxBodySize(0)
        .timeout(maxTimeout)
        .get();
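
One way to tell whether a response was really cut off is to look at the raw body rather than the parsed Document, because the parser closes any open tags for you. The sketch below is only an illustration: TruncationCheck and endsWithHtmlClose are hypothetical names, not part of Jsoup, and the two sample strings are made up. With Jsoup, the raw body could be obtained through execute().body() instead of get().

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of Jsoup.
final class TruncationCheck {
    // Returns true when the markup ends with a closing </html> tag,
    // ignoring letter case and surrounding whitespace.
    static boolean endsWithHtmlClose(String html) {
        return html.trim().toLowerCase(Locale.ROOT).endsWith("</html>");
    }

    public static void main(String[] args) {
        // Made-up sample inputs: one complete page and one cut off mid-script.
        String complete = "<html><head></head><body>ok</body></html>\n";
        String cutOff = "<html><head><script>stLight.options({";
        System.out.println(endsWithHtmlClose(complete));
        System.out.println(endsWithHtmlClose(cutOff));
    }
}
```

A page that fails this check was probably truncated in transit or by a size limit; a page that passes it but still lacks the expected data more likely builds that data with JavaScript, which Jsoup does not execute.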
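import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

URL url = new URL("https://www.google.com/");
URLConnection con = url.openConnection();
InputStream is = con.getInputStream();
BufferedReader br = new BufferedReader(new InputStreamReader(is));
StringBuilder html = new StringBuilder();
String line;
// Call readLine() once per iteration; calling it twice skips every other line.
while ((line = br.readLine()) != null) {
    html.append(line).append('\n');
}
br.close();
// Strip the comment delimiters so conditional-comment markup is parsed as ordinary HTML.
Document doc = Jsoup.parse(html.toString().replace("<!--", "").replace("-->", ""));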
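
The two steps in this answer, reading the whole stream and stripping the comment delimiters, can be tried without a network connection. The following is a minimal sketch under stated assumptions: PageText, readAll and stripCommentDelimiters are hypothetical names introduced here, and the sample markup is a shortened version of the output quoted in the question. It uses only the standard library; the resulting string would then be handed to Jsoup.parse.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration; not part of Jsoup or of the answer above.
final class PageText {
    // Reads every line from the reader, calling readLine() exactly once per iteration.
    static String readAll(Reader source) {
        StringBuilder text = new StringBuilder();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                text.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    // Removes the HTML comment delimiters so that markup wrapped in
    // conditional comments is no longer hidden inside a comment.
    static String stripCommentDelimiters(String html) {
        return html.replace("<!--", "").replace("-->", "");
    }

    public static void main(String[] args) {
        // Shortened sample modelled on the output quoted in the question.
        String sample = "<!--[if IE 9]> <html lang=\"en\"> <![endif]-->\n<html>\n<head>\n";
        String html = readAll(new StringReader(sample));
        System.out.println(stripCommentDelimiters(html));
    }
}
```

Note that stripping the delimiters turns every commented-out fragment into live markup, so this is only worth doing when the data you need sits inside a comment.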