Python beautifulsoup打印不会打印整个html页面
我正在抓取一个有页码的网站。我正在测试循环,并在其中打印来自beautifulsoup的输出。打印结果时,我注意到结果不是一个完整的html文本。它只包含html的第一部分。这是我的密码Python beautifulsoup打印不会打印整个html页面,python,html,python-3.x,web-scraping,beautifulsoup,Python,Html,Python 3.x,Web Scraping,Beautifulsoup,我正在抓取一个有页码的网站。我正在测试循环,并在其中打印来自beautifulsoup的输出。打印结果时,我注意到结果不是一个完整的html文本。它只包含html的第一部分。这是我的密码 from bs4 import BeautifulSoup import requests import time total_pages = 2295 for i in range(1,total_pages,1): pageNumber = str(i) url = requests.g
from bs4 import BeautifulSoup
import requests
import time
total_pages = 2295
for i in range(1,total_pages,1):
pageNumber = str(i)
url = requests.get("https://www.propertyguru.com.sg/property-for-sale/"+pageNumber+"?order=desc&property_type=N&property_type_code%5B0%5D=CONDO&property_type_code%5B1%5D=APT&property_type_code%5B2%5D=WALK&property_type_code%5B3%5D=CLUS&property_type_code%5B4%5D=EXCON&sort=date").text
soup = BeautifulSoup(url,'html.parser')
print(soup.prettify())
当我印汤的时候,结果是这样的
<!DOCTYPE doctype html>
<!--[if gt IE 9]><!-->
<html class="no-js is-new-brand" lang="en">
<!--<![endif]-->
<head>
<title>
</title>
<meta charset="utf-8"/>
<meta content="IE=edge,chrome=1" http-equiv="X-UA-Compatible"/>
<meta content="width=device-width, initial-scale=1" name="viewport"/>
<meta content="app-id=482524585" name="apple-itunes-app">
<meta content="app-id=com.allproperty.android.consumer.sg" name="google-play-app">
<meta content="9iVXbwdOPHOH_byBFBScAHm5x-kvcPzBS_fJBFPBwbo" name="google-site-verification">
<meta content="46acd457be6effa0" name="y_key"/>
<meta content="893837EF69C47405FBAFAB120889A598" name="msvalidate.01"/>
<link href="/images/is-new-brand-favicon.ico" rel="SHORTCUT ICON"/>
<link href="/search.xml" rel="search" title="PropertyGuru Search" type="application/opensearchdescription+xml"/>
<link href="https://cdn.pgimgs.com/1574318624/sf2-search/bundles/guruweblayout/img/is-new-brand-touch-logo.png" rel="apple-touch-icon"/>
<link href="https://cdn.pgimgs.com/1574318624/sf2-search/bundles/guruweblayout/img/is-new-brand-touch-logo.png" rel="android-touch-icon"/>
<script>
// check for browsers without complete flex support ( < IE 10)
window.onload = function(e){
if(Function('/*@cc_on return document.documentMode<=10@*/')()) {
window.location = '/ie-notsupported';
}
};
</script>
<link href="//cdn1.pgimgs.com/1574318624/sg-static/cssprod/propertyguru/layout.css" rel="stylesheet" type="text/css"/>
<link href="//cdn1.pgimgs.com/1574318624/sg-static/cssprod/propertyguru/sg.css" rel="stylesheet" type="text/css"/>
<link href="//cdn1.pgimgs.com/1574318624/sg-static/cssprod/propertyguru/new_styles.css" rel="stylesheet" type="text/css"/>
<script src="//cdn1.pgimgs.com/1574318624/sg-static/jsprod/lib/modernizr-custom.min.js" type="text/javascript">
</script>
<script src="//cdn1.pgimgs.com/1574318624/sg-static/jsprod/jquery-1.12.3.min.js" type="text/javascript">
</script>
<script type="text/javascript">
var guruApp = {"environment":null,"widgetSearch":null,"widgetPoll":null,"widgetGoogleAnalytics":{"dimensions":{"dimension3":"Production","dimension4":"en","dimension13":"SG","dimension14":"web"},"googleAnalyticsObject":null,"config":{"trackingId":"UA-2417512-2","cookieDomain":"propertyguru.com.sg","siteSpeedSampleRate":10}},"userSession":{"user":{"id":null,"username":null,"roles":null,"shortlist":0,"beta":false}},"isResponsive":"false","identityEndpoint":"https:\/\/identity.propertyguru.com\/identity","defaultCurrency":"SGD","googleMaps":{"key":"AIzaSyBlCo7kpcBszvIZoH709avg1rmUjjiop0k"},"googleApis":{"key":"367223124563-is5hdjeal1rr7og4i8ii7t8imihr1dg1.apps.googleusercontent.com"}};
</script>
<link href="https://fonts.googleapis.com/css?family=Roboto:400,500" rel="stylesheet" type="text/css"/>
<link href="https://fonts.googleapis.com/css?family=Nunito:600" rel="stylesheet" type="text/css"/>
<!--[if gt IE 8]><!-->
<link href="https://cdn.pgimgs.com/1574318624/sf2-search/css/legacy_css.css" rel="stylesheet" type="text/css">
<link href="//cdn1.pgimgs.com/1574318624/sg-static/cssprod/rich/fixes.css" rel="stylesheet" type="text/css">
<!--<![endif]-->
<script type="text/javascript">
<!--
var GMAP_KEY = "AIzaSyCUbmYAT3lyhBvao9Yg-WsKtRbMxO-VvVQ";
var REGION = "SG";
var images = [];
var freetextUrl = '//api.propertyguru.com/v1/autocomplete?limit=10&locale=en&format=csv_legacy®ion=sg&objectType=HDB_ESTATE,DISTRICT,PROPERTY,STREET,MRT_STATION,SCHOOL';
//-->
</script>
<!-- GOOGLE AD MANAGER -->
<div class="clearboth">
</div>
<!-- Begin comScore Tag -->
<script>
var _comscore = _comscore || [];
_comscore.push({ c1: "2", c2: "13151479" });
(function() {
var s = document.createElement("script"), el = document.getElementsByTagName("script")[0]; s.async = true;
s.src = (document.location.protocol == "https:" ? "https://sb" : "http://b") + ".scorecardresearch.com/beacon.js";
el.parentNode.insertBefore(s, el);
})();
</script>
<noscript>
<img src="https://sb.scorecardresearch.com/p?c1=2&c2=13151479&cv=2.0&cj=1"/>
</noscript>
<!-- End comScore Tag -->
<!-- GOOGLE ANALYTICS CODE -->
<script src="https://cdn.pgimgs.com/1574318624/sf2-search/bundles/guruweblayout/js/desktop/logger.js" type="text/javascript">
</script>
<script src="https://cdn.pgimgs.com/1574318624/sf2-search/bundles/guruweblayout/js/fingerprint2.min.js" type="text/javascript">
</script>
<script src="https://cdn.pgimgs.com/1574318624/sf2-search/bundles/guruwidget/js/desktop/jquery.widgetGoogleAnalytics.js" type="text/javascript">
</script>
<!-- Google Analytics -->
<script type="text/javascript">
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
</script>
<script type="text/javascript">
if (typeof guruApp != 'undefined' && typeof guruApp.widgetGoogleAnalytics != 'undefined' && guruApp.widgetGoogleAnalytics.googleAnalyticsObject != null) {
guruApp.widgetGoogleAnalytics.googleAnalyticsObject.init();
}
</script>
<script src="https://cdn.pgimgs.com/1574318624/sf2-search/bundles/guruweblayout/js/desktop/jquery.eventDispatcher.js" type="text/javascript">
</script>
<script type="text/javascript">
$(document).ready(function () {
var $body = $('body'),
track = function(category, action, label, value, noninteraction, dimensions) {
label = cleanText(label);
guruApp.widgetGoogleAnalytics.googleAnalyticsObject.trackEvent(category, action, label, value, noninteraction, dimensions);
},
cleanText = function(str) {
return str.replace(/^https?:\/\/[^\/]+/, '').replace(/^\s+/, '').replace(/\s+$/, '').replace(/\s+/, ' ');
};
$body.find('.dropdown .dropdown-menu li.mainnav-areainsider').click(function () {
$body.trigger('ga.mainnav.areainsider.click');
});
});
</script>
<!-- ELOQUA TRACKING CODE -->
<script type="text/javascript">
var _elqQ = _elqQ || [];
_elqQ.push(['elqSetSiteId', '659351510']);
_elqQ.push(['elqTrackPageView']);
(function () {
function async_load() {
var s = document.createElement('script'); s.type = 'text/javascript'; s.async = true;
s.src = '//img03.en25.com/i/elqCfg.min.js';
var x = document.getElementsByTagName('script')[0]; x.parentNode.insertBefore(s, x);
}
if (window.addEventListener) window.addEventListener('DOMContentLoaded', async_load, false);
else if (window.attachEvent) window.attachEvent('onload', async_load);
})();
</script>
<script defer="" src="/pg186791.js" type="text/javascript">
</script>
<style type="text/css">
#d__fFH{position:absolute;top:-5000px;left:-5000px}#d__fF{font-family:serif;font-size:200px;visibility:hidden}#weeawqsxdstyxxvz{display:none!important}
</style>
</link>
</link>
</meta>
</meta>
</meta>
</head>
<body class="web_filter_recaptcha SG-web_filter_recaptcha layout-web lang-en app-sg legacy is-new-brand" id="web_filter_recaptcha">
<div id="wrapper-outer">
<div id="wrapper">
<div id="wrapper-inner">
<div class="alert alert-warning" id="gdpr-alert" role="alert" style="margin-bottom: 0; display:none;">
To comply with GDPR we will not store any personally identifiable information from you. Therefore we will serve sub-optimal experience where some features such as Login/Signup are disabled. However, you will be able to search and see all the properties, see agent contact details and contact them offline on your own.
</div>
<header class="navbar navbar-default" id="navbar-main">
<div class="header-bg">
<div class="container">
<nav class="header-nav clearfix" role="navigation">
<div class="navbar-header">
<button class="navbar-toggle" type="button">
<span class="sr-only">
Toggle navigation
</span>
<i class="pgicon pg
<!DOCTYPE doctype html>
<!--[if gt IE 9]><!-->
<html class="no-js is-new-brand" lang="en">
<!--<![endif]-->
<head>.....AND SO ON AND SO FOURTH
它只打印一些内容,而不是整个html内容 您使用的是请求库,因此它不会加载Java脚本。该网站使用API填充使用javascript的数据
你应该尝试使用硒。Selenium将用javascript加载整个页面。然后阅读页面源代码并使用beautifulsoup。Beautiful soup library仅提取网页的视图源代码 例: 美丽的汤库运转良好
from bs4 import BeautifulSoup
import requests
import time
total_pages = 2295
for i in range(1,total_pages,1):
pageNumber = str(i)
url = requests.get("https://www.propertyguru.com.sg/property-for-sale/"+pageNumber+"?order=desc&property_type=N&property_type_code%5B0%5D=CONDO&property_type_code%5B1%5D=APT&property_type_code%5B2%5D=WALK&property_type_code%5B3%5D=CLUS&property_type_code%5B4%5D=EXCON&sort=date").text
soup = BeautifulSoup(url,'html.parser')
print(soup.prettify())
不,不是。它不会以html格式打印整个页面内容