Python 2.7 BeautifulSoup can't read the page


I'm trying the following:

from urllib2 import urlopen
from BeautifulSoup import BeautifulSoup
url = 'http://propaccess.traviscad.org/clientdb/Property.aspx?prop_id=312669'
soup = BeautifulSoup(urlopen(url).read())
print soup
The above `print` statement displays the following:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
        "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-type" content="text/html;charset=utf-8" />
<title>Travis Property Search</title>
<style type="text/css">
      body { text-align: center; padding: 150px; }
      h1 { font-size: 50px; }
      body { font: 20px Helvetica, sans-serif; color: #333; }
      #article { display: block; text-align: left; width: 650px; margin: 0 auto; }
      a { color: #dc8100; text-decoration: none; }
      a:hover { color: #333; text-decoration: none; }
    </style>
</head>
<body>
<div id="article">
<h1>Please try again</h1>
<div>
<p>Sorry for the inconvenience but your session has either timed out or the server is busy handling other requests. You may visit us on the the following website for information, otherwise please retry your search again shortly:<br /><br />
<a href="http://www.traviscad.org/">Travis Central Appraisal District Website</a> </p>
<p><b><a href="http://propaccess.traviscad.org/clientdb/?cid=1">Click here to reload the property search to try again</a></b></p>
</div>
</div>
</body>
</html>

I can reach the URL from a browser on the same machine, so the server is definitely not blocking my IP. What is wrong with my code?

You need to acquire some cookies before you can access that URL. While this can be done with `urllib2` and a `CookieJar`, I'd recommend `requests`:
import requests
from BeautifulSoup import BeautifulSoup

url1 = 'http://propaccess.traviscad.org/clientdb/?cid=1'
url = 'http://propaccess.traviscad.org/clientdb/Property.aspx?prop_id=312669'
ses = requests.Session()
ses.get(url1)  # warm-up request: the session stores the cookies the site sets
soup = BeautifulSoup(ses.get(url).content)
print soup.prettify()
Note that `requests` is not part of the standard library; you'll have to install it (e.g. with pip). If you want to use `urllib2` instead:
import urllib2
from cookielib import CookieJar
from BeautifulSoup import BeautifulSoup

url1 = 'http://propaccess.traviscad.org/clientdb/?cid=1'
url = 'http://propaccess.traviscad.org/clientdb/Property.aspx?prop_id=312669'
cj = CookieJar()
# the opener keeps the cookies from the first response and replays them
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
opener.open(url1)  # warm-up request: collects the session cookies
soup = BeautifulSoup(opener.open(url).read())
print soup.prettify()
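The cookie handshake can be demonstrated offline. The sketch below (Python 3, where `urllib2` and `cookielib` became `urllib.request` and `http.cookiejar`) spins up a tiny local server that, like the appraisal site, only serves the real page when the client replays a session cookie; the paths, cookie name, and responses are made up for illustration, not taken from the real site:

```python
import http.cookiejar
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        if self.path == '/clientdb/':
            # warm-up URL: hands out the session cookie
            self.send_header('Set-Cookie', 'ASP.NET_SessionId=abc123; Path=/')
            self.end_headers()
            self.wfile.write(b'search page')
        elif 'ASP.NET_SessionId' in self.headers.get('Cookie', ''):
            # cookie came back: serve the real page
            self.end_headers()
            self.wfile.write(b'<title>Travis Property Search</title>')
        else:
            # no cookie: the "Please try again" error page
            self.end_headers()
            self.wfile.write(b'<h1>Please try again</h1>')

    def log_message(self, *args):  # keep request logging quiet
        pass

server = HTTPServer(('127.0.0.1', 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = 'http://127.0.0.1:%d' % server.server_port

# 1) bare urlopen, no cookie jar: we only ever get the error page
bare = urllib.request.urlopen(base + '/clientdb/Property.aspx').read()

# 2) cookie-aware opener: warm-up request first, then the property page
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
opener.open(base + '/clientdb/')
with_cookies = opener.open(base + '/clientdb/Property.aspx').read()

server.shutdown()
print(bare)
print(with_cookies)
```

The first fetch mirrors the question's symptom (the error page comes back even though the URL works in a browser, because the browser already holds the cookies); the second mirrors the fix.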


Shouldn't `from BeautifulSoup import BeautifulSoup` be `from bs4 import BeautifulSoup`?

@MD.Khairul Basar Yes, that's how I usually import it, but it works either way.

Why do you have to fetch cookies? None of the other examples I've tried ever needed them.

The site won't show the page unless you send cookies. If you visit the page in a browser and delete your cookies, you'll get the same HTML response you posted in the question. Also consider using `requests`, which makes this a lot simpler.

(1) How did you work out that this was a cookie problem? (2) How did you work out which URL hands out the cookie?