Python BeautifulSoup print does not print the whole HTML page


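I am scraping a website that has numbered pages. I am testing my loop and printing the BeautifulSoup output inside it. When I print the result, I notice it is not the complete HTML text. It only contains the first part of the HTML. Here is my code: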

from bs4 import BeautifulSoup
import requests
import time
total_pages = 2295


for i in range(1,total_pages,1):
    pageNumber = str(i)
    # Download the raw HTML of the current results page
    url = requests.get("https://www.propertyguru.com.sg/property-for-sale/"+pageNumber+"?order=desc&property_type=N&property_type_code%5B0%5D=CONDO&property_type_code%5B1%5D=APT&property_type_code%5B2%5D=WALK&property_type_code%5B3%5D=CLUS&property_type_code%5B4%5D=EXCON&sort=date").text
    soup = BeautifulSoup(url,'html.parser')

    # Print the parsed page once per iteration
    print(soup.prettify())

When I print the soup, the result looks like this:

<!DOCTYPE doctype html>
<!--[if gt IE 9]><!-->
<html class="no-js is-new-brand" lang="en">
 <!--<![endif]-->
 <head>
  <title>
  </title>
  <meta charset="utf-8"/>
  <meta content="IE=edge,chrome=1" http-equiv="X-UA-Compatible"/>
  <meta content="width=device-width, initial-scale=1" name="viewport"/>
  <meta content="app-id=482524585" name="apple-itunes-app">
   <meta content="app-id=com.allproperty.android.consumer.sg" name="google-play-app">
    <meta content="9iVXbwdOPHOH_byBFBScAHm5x-kvcPzBS_fJBFPBwbo" name="google-site-verification">
     <meta content="46acd457be6effa0" name="y_key"/>
     <meta content="893837EF69C47405FBAFAB120889A598" name="msvalidate.01"/>
     <link href="/images/is-new-brand-favicon.ico" rel="SHORTCUT ICON"/>
     <link href="/search.xml" rel="search" title="PropertyGuru Search" type="application/opensearchdescription+xml"/>
     <link href="https://cdn.pgimgs.com/1574318624/sf2-search/bundles/guruweblayout/img/is-new-brand-touch-logo.png" rel="apple-touch-icon"/>
     <link href="https://cdn.pgimgs.com/1574318624/sf2-search/bundles/guruweblayout/img/is-new-brand-touch-logo.png" rel="android-touch-icon"/>
     <script>
      // check for browsers without complete flex support ( < IE 10)
                window.onload = function(e){
                        if(Function('/*@cc_on return document.documentMode<=10@*/')()) {
                                window.location = '/ie-notsupported';
                        }
                };
     </script>
     <link href="//cdn1.pgimgs.com/1574318624/sg-static/cssprod/propertyguru/layout.css" rel="stylesheet" type="text/css"/>
     <link href="//cdn1.pgimgs.com/1574318624/sg-static/cssprod/propertyguru/sg.css" rel="stylesheet" type="text/css"/>
     <link href="//cdn1.pgimgs.com/1574318624/sg-static/cssprod/propertyguru/new_styles.css" rel="stylesheet" type="text/css"/>
     <script src="//cdn1.pgimgs.com/1574318624/sg-static/jsprod/lib/modernizr-custom.min.js" type="text/javascript">
     </script>
     <script src="//cdn1.pgimgs.com/1574318624/sg-static/jsprod/jquery-1.12.3.min.js" type="text/javascript">
     </script>
     <script type="text/javascript">
      var guruApp = {"environment":null,"widgetSearch":null,"widgetPoll":null,"widgetGoogleAnalytics":{"dimensions":{"dimension3":"Production","dimension4":"en","dimension13":"SG","dimension14":"web"},"googleAnalyticsObject":null,"config":{"trackingId":"UA-2417512-2","cookieDomain":"propertyguru.com.sg","siteSpeedSampleRate":10}},"userSession":{"user":{"id":null,"username":null,"roles":null,"shortlist":0,"beta":false}},"isResponsive":"false","identityEndpoint":"https:\/\/identity.propertyguru.com\/identity","defaultCurrency":"SGD","googleMaps":{"key":"AIzaSyBlCo7kpcBszvIZoH709avg1rmUjjiop0k"},"googleApis":{"key":"367223124563-is5hdjeal1rr7og4i8ii7t8imihr1dg1.apps.googleusercontent.com"}};
     </script>
     <link href="https://fonts.googleapis.com/css?family=Roboto:400,500" rel="stylesheet" type="text/css"/>
     <link href="https://fonts.googleapis.com/css?family=Nunito:600" rel="stylesheet" type="text/css"/>
     <!--[if gt IE 8]><!-->
     <link href="https://cdn.pgimgs.com/1574318624/sf2-search/css/legacy_css.css" rel="stylesheet" type="text/css">
      <link href="//cdn1.pgimgs.com/1574318624/sg-static/cssprod/rich/fixes.css" rel="stylesheet" type="text/css">
       <!--<![endif]-->
       <script type="text/javascript">
        <!--
        var GMAP_KEY = "AIzaSyCUbmYAT3lyhBvao9Yg-WsKtRbMxO-VvVQ";
        var REGION = "SG";
    var images = [];
    var freetextUrl = '//api.propertyguru.com/v1/autocomplete?limit=10&locale=en&format=csv_legacy&region=sg&objectType=HDB_ESTATE,DISTRICT,PROPERTY,STREET,MRT_STATION,SCHOOL';
        //-->
       </script>
       <!-- GOOGLE AD MANAGER -->
       <div class="clearboth">
       </div>
       <!-- Begin comScore Tag -->
       <script>
        var _comscore = _comscore || [];
  _comscore.push({ c1: "2", c2: "13151479" });
  (function() {
    var s = document.createElement("script"), el = document.getElementsByTagName("script")[0]; s.async = true;
    s.src = (document.location.protocol == "https:" ? "https://sb" : "http://b") + ".scorecardresearch.com/beacon.js";
    el.parentNode.insertBefore(s, el);
  })();
       </script>
       <noscript>
        <img src="https://sb.scorecardresearch.com/p?c1=2&amp;c2=13151479&amp;cv=2.0&amp;cj=1"/>
       </noscript>
       <!-- End comScore Tag -->
       <!-- GOOGLE ANALYTICS CODE -->
       <script src="https://cdn.pgimgs.com/1574318624/sf2-search/bundles/guruweblayout/js/desktop/logger.js" type="text/javascript">
       </script>
       <script src="https://cdn.pgimgs.com/1574318624/sf2-search/bundles/guruweblayout/js/fingerprint2.min.js" type="text/javascript">
       </script>
       <script src="https://cdn.pgimgs.com/1574318624/sf2-search/bundles/guruwidget/js/desktop/jquery.widgetGoogleAnalytics.js" type="text/javascript">
       </script>
       <!-- Google Analytics -->
       <script type="text/javascript">
        (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
        (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
            m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
        })(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
       </script>
       <script type="text/javascript">
        if (typeof guruApp != 'undefined' && typeof guruApp.widgetGoogleAnalytics != 'undefined' && guruApp.widgetGoogleAnalytics.googleAnalyticsObject != null) {
    guruApp.widgetGoogleAnalytics.googleAnalyticsObject.init();
}
       </script>
       <script src="https://cdn.pgimgs.com/1574318624/sf2-search/bundles/guruweblayout/js/desktop/jquery.eventDispatcher.js" type="text/javascript">
       </script>
       <script type="text/javascript">
        $(document).ready(function () {
                var $body = $('body'),
                    track = function(category, action, label, value, noninteraction, dimensions) {
                        label = cleanText(label);
                        guruApp.widgetGoogleAnalytics.googleAnalyticsObject.trackEvent(category, action, label, value, noninteraction, dimensions);
                    },
                    cleanText = function(str) {
                        return str.replace(/^https?:\/\/[^\/]+/, '').replace(/^\s+/, '').replace(/\s+$/, '').replace(/\s+/, ' ');
                    };
                $body.find('.dropdown .dropdown-menu li.mainnav-areainsider').click(function () {
                    $body.trigger('ga.mainnav.areainsider.click');
                });
            });
       </script>
       <!-- ELOQUA TRACKING CODE -->
       <script type="text/javascript">
        var _elqQ = _elqQ || [];
    _elqQ.push(['elqSetSiteId', '659351510']);
    _elqQ.push(['elqTrackPageView']);

    (function () {
        function async_load() {
            var s = document.createElement('script'); s.type = 'text/javascript'; s.async = true;
            s.src = '//img03.en25.com/i/elqCfg.min.js';
            var x = document.getElementsByTagName('script')[0]; x.parentNode.insertBefore(s, x);
        }
        if (window.addEventListener) window.addEventListener('DOMContentLoaded', async_load, false);
        else if (window.attachEvent) window.attachEvent('onload', async_load);
    })();
       </script>
       <script defer="" src="/pg186791.js" type="text/javascript">
       </script>
       <style type="text/css">
        #d__fFH{position:absolute;top:-5000px;left:-5000px}#d__fF{font-family:serif;font-size:200px;visibility:hidden}#weeawqsxdstyxxvz{display:none!important}
       </style>
      </link>
     </link>
    </meta>
   </meta>
  </meta>
 </head>
 <body class="web_filter_recaptcha SG-web_filter_recaptcha layout-web lang-en app-sg legacy is-new-brand" id="web_filter_recaptcha">
  <div id="wrapper-outer">
   <div id="wrapper">
    <div id="wrapper-inner">
     <div class="alert alert-warning" id="gdpr-alert" role="alert" style="margin-bottom: 0; display:none;">
      To comply with GDPR we will not store any personally identifiable information from you. Therefore we will serve sub-optimal experience where some features such as Login/Signup are disabled. However, you will be able to search and see all the properties, see agent contact details and contact them offline on your own.
     </div>
     <header class="navbar navbar-default" id="navbar-main">
      <div class="header-bg">
       <div class="container">
        <nav class="header-nav clearfix" role="navigation">
         <div class="navbar-header">
          <button class="navbar-toggle" type="button">
           <span class="sr-only">
            Toggle navigation
           </span>
           <i class="pgicon pg

<!DOCTYPE doctype html>
<!--[if gt IE 9]><!-->
<html class="no-js is-new-brand" lang="en">
 <!--<![endif]-->
 <head>..... and so on and so forth


It only prints part of the content, not the entire HTML.

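You are using the requests library, which does not execute JavaScript. The site populates its data through an API called from JavaScript, so that data is not in the HTML that requests receives.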
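Before changing tools, it may help to check what the server actually returned. The sketch below is only an illustration: `summarize_html` and `fetch_html` are hypothetical helper names, not part of any library. The `recaptcha` check is an assumption based on the `web_filter_recaptcha` body class visible in the printed output above, which might mean the server answered with a bot-check page instead of the listing page.

```python
def summarize_html(html):
    """Return a few quick facts about a raw HTML string."""
    lowered = html.lower()
    return {
        "length": len(html),
        # A complete document normally ends with a closing </html> tag.
        "has_closing_html_tag": "</html>" in lowered,
        # Assumption: a "recaptcha" marker hints at a bot-check page.
        "mentions_recaptcha": "recaptcha" in lowered,
    }


def fetch_html(url, timeout=10):
    """Download a page with requests and raise on HTTP error status codes."""
    import requests  # imported here so summarize_html works without the network

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


# Example (performs a real network request, so it is left commented out):
# html = fetch_html("https://www.propertyguru.com.sg/property-for-sale/1")
# print(summarize_html(html))
```

If `has_closing_html_tag` is true, the response itself is complete and the missing listings are probably absent from the server's HTML rather than cut off by `print`.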


You should try Selenium. Selenium loads the whole page, including the JavaScript. Then read the page source and parse it with BeautifulSoup.
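A minimal sketch of the Selenium approach could look like the following. It assumes Selenium 4 and a local Chrome installation. `build_url`, `rendered_html`, and `main` are hypothetical names introduced here, the query parameters are copied from the URL in the question, and the one-second pause is an arbitrary choice. Content that loads asynchronously may still be missing right after `driver.get` returns, so an explicit wait might be needed.

```python
import time
from urllib.parse import urlencode

BASE_URL = "https://www.propertyguru.com.sg/property-for-sale/"

# Query parameters copied from the URL in the question.
QUERY = [
    ("order", "desc"),
    ("property_type", "N"),
    ("property_type_code[0]", "CONDO"),
    ("property_type_code[1]", "APT"),
    ("property_type_code[2]", "WALK"),
    ("property_type_code[3]", "CLUS"),
    ("property_type_code[4]", "EXCON"),
    ("sort", "date"),
]


def build_url(page):
    """Build the listing URL for one page number."""
    return f"{BASE_URL}{page}?{urlencode(QUERY)}"


def rendered_html(driver, url):
    """Load url in the browser and return the current DOM as a string."""
    driver.get(url)
    return driver.page_source


def main(first_page=1, last_page=2):
    # Third-party packages: pip install selenium beautifulsoup4
    from bs4 import BeautifulSoup
    from selenium import webdriver

    driver = webdriver.Chrome()  # assumption: Chrome is installed locally
    try:
        for page in range(first_page, last_page + 1):
            html = rendered_html(driver, build_url(page))
            soup = BeautifulSoup(html, "html.parser")
            print(soup.prettify())
            time.sleep(1)  # arbitrary pause between requests
    finally:
        driver.quit()  # always close the browser


# main()  # uncomment to run; this opens a real browser and hits the site
```

Keeping `rendered_html` separate from the browser setup means the same loop works with any driver object that exposes `get` and `page_source`.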

The Beautiful Soup library only extracts what is in the page's view-source.

Example:

The Beautiful Soup library itself is working fine.

from bs4 import BeautifulSoup
import requests
import time
total_pages = 2295


for i in range(1,total_pages,1):
    pageNumber = str(i)
    # Download the raw HTML of the current results page
    url = requests.get("https://www.propertyguru.com.sg/property-for-sale/"+pageNumber+"?order=desc&property_type=N&property_type_code%5B0%5D=CONDO&property_type_code%5B1%5D=APT&property_type_code%5B2%5D=WALK&property_type_code%5B3%5D=CLUS&property_type_code%5B4%5D=EXCON&sort=date").text
    soup = BeautifulSoup(url,'html.parser')

    # Print the parsed page once per iteration
    print(soup.prettify())

No, it isn't. It does not print the entire page content as HTML.