Python connecting to a page (error 403)
I can't connect to the page. Here is my code and the error:
from urllib.request import Request, urlopen
from urllib.error import URLError, HTTPError
import urllib.request

someurl = "https://www.genecards.org/cgi-bin/carddisp.pl?gene=MET"
req = Request(someurl)
try:
    response = urllib.request.urlopen(req)
except HTTPError as e:
    print('The server couldn\'t fulfill the request.')
    print('Error code: ', e.code)
except URLError as e:
    print('We failed to reach a server.')
    print('Reason: ', e.reason)
else:
    print("Everything is fine")
Error code: 403
You can use http.client. First, open a connection to the server; then issue a GET request, like this:
import http.client

conn = http.client.HTTPConnection("genecards.org:80")
conn.request("GET", "/cgi-bin/carddisp.pl?gene=MET")
try:
    # http.client raises HTTPException / OSError, not urllib's HTTPError / URLError
    response = conn.getresponse().read().decode("UTF-8")
except http.client.HTTPException as e:
    print('The server couldn\'t fulfill the request.')
    print('Reason: ', e)
except OSError as e:
    print('We failed to reach a server.')
    print('Reason: ', e)
else:
    print("Everything is fine")
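Be aware that http.client does not raise an exception for a 403: the call above succeeds and you must inspect the response status yourself. A minimal offline sketch of that behavior, using a throwaway local server as a stand-in for genecards.org (the server and port here are assumptions made so the demo runs without network access):

```python
import http.client
import http.server
import threading

# Tiny local server that always answers 403, standing in for the real site.
class Forbidden(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(403)
        self.end_headers()
        self.wfile.write(b"forbidden")

    def log_message(self, *args):  # keep the demo quiet
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), Forbidden)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("GET", "/cgi-bin/carddisp.pl?gene=MET")
resp = conn.getresponse()
# No exception was raised: http.client reports the status instead.
print(resp.status)  # 403
server.shutdown()
```

The try/except in the answer above therefore only catches transport-level failures; a 403 still reaches the "Everything is fine" branch unless you check resp.status.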
Some websites require a browser-like "User-Agent" header; others require specific cookies. In this case, I found through trial and error that both are required. What you need to do is:
import urllib.request
from urllib.error import URLError
# This handler will store and send cookies for us.
handler = urllib.request.HTTPCookieProcessor()
opener = urllib.request.build_opener(handler)
# Browser-like user agent to make the website happy.
headers = {'User-Agent': 'Mozilla/5.0'}
url = 'https://www.genecards.org/cgi-bin/carddisp.pl?gene=MET'
request = urllib.request.Request(url, headers=headers)
for i in range(2):
    try:
        response = opener.open(request)
    except URLError as exc:
        print(exc)

print(response)
# Output:
# HTTP Error 403: Forbidden (expected, first request always fails)
# <http.client.HTTPResponse object at 0x...> (correct 200 response)
Or, if you prefer, use requests. A 403 means access to the page is restricted; note that it does not raise any exception here either, because requests does not treat a 4xx status as an error — the response simply will not be a valid 200 response:
import requests
session = requests.Session()
jar = requests.cookies.RequestsCookieJar()
headers = {'User-Agent': 'Mozilla/5.0'}
url = 'https://www.genecards.org/cgi-bin/carddisp.pl?gene=MET'
for i in range(2):
    response = session.get(url, cookies=jar, headers=headers)
    print(response)
# Output:
# <Response [403]>
# <Response [200]>
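If you would rather have requests raise on a 4xx, call Response.raise_for_status(). A minimal offline sketch — the hand-built Response object below is a hypothetical stand-in for what session.get() would return, so the demo runs without network access:

```python
import requests

def check(resp):
    """Return True when the response is OK; print the error and return False on 4xx/5xx."""
    try:
        resp.raise_for_status()  # raises requests.HTTPError for 4xx/5xx statuses
        return True
    except requests.HTTPError as exc:
        print(exc)
        return False

# Build a Response by hand so the demo needs no network (hypothetical
# stand-in for the object session.get() returns).
forbidden = requests.models.Response()
forbidden.status_code = 403
print(check(forbidden))  # False
```

With this pattern the first (403) request in the loop above would raise, and only the second, cookie-carrying request would pass the check.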