
Python: connecting to a page (error 403)


I can't connect to the page. Here is my code and the error:

from urllib.request import Request, urlopen
from urllib.error import URLError, HTTPError
import urllib

someurl = "https://www.genecards.org/cgi-bin/carddisp.pl?gene=MET"
req = Request(someurl)

try:
    response = urllib.request.urlopen(req)
except HTTPError as e:
    print('The server couldn\'t fulfill the request.')
    print('Error code: ', e.code)
except URLError as e:
    print('We failed to reach a server.')
    print('Reason: ', e.reason)
else:
    print("Everything is fine")
Error code: 403


You can use http.client. First, open a connection to the server, then issue a GET request, like this:

import http.client

# The site is served over HTTPS, so open an HTTPS connection
conn = http.client.HTTPSConnection("www.genecards.org")
conn.request("GET", "/cgi-bin/carddisp.pl?gene=MET")

try:
    response = conn.getresponse().read().decode("UTF-8")
except http.client.HTTPException as e:
    print('The server couldn\'t fulfill the request.')
    print('Error: ', e)
except OSError as e:
    print('We failed to reach a server.')
    print('Reason: ', e)
else:
    print("Everything is fine")

Some websites require a browser-like header such as User-Agent, others require specific cookies. In this case I found, by trial and error, that both are required. What you need to do is:

  • Send the initial request with a browser-like user agent. It will fail with a 403, but you will also get a valid cookie in the response
  • Send a second request with the same user agent and the cookie obtained before
  • Code:

    import urllib.request
    from urllib.error import URLError
    
    # This handler will store and send cookies for us.
    handler = urllib.request.HTTPCookieProcessor()
    opener = urllib.request.build_opener(handler)
    # Browser-like user agent to make the website happy.
    headers = {'User-Agent': 'Mozilla/5.0'}
    url = 'https://www.genecards.org/cgi-bin/carddisp.pl?gene=MET'
    request = urllib.request.Request(url, headers=headers)
    
    for i in range(2):
        try:
            response = opener.open(request)
        except URLError as exc:
            print(exc)
    
    print(response)
    
    # Output:
    # HTTP Error 403: Forbidden  (expected, first request always fails)
    # <http.client.HTTPResponse object at 0x...>  (correct 200 response)
    
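    If you want to see the cookie that the first (403) response stored, you can pass an explicit CookieJar to HTTPCookieProcessor and iterate over it afterwards; a minimal sketch of that variation, using the same URL and headers as above:

    import urllib.request
    from urllib.error import URLError
    from http.cookiejar import CookieJar

    jar = CookieJar()
    handler = urllib.request.HTTPCookieProcessor(jar)
    opener = urllib.request.build_opener(handler)
    headers = {'User-Agent': 'Mozilla/5.0'}
    url = 'https://www.genecards.org/cgi-bin/carddisp.pl?gene=MET'
    request = urllib.request.Request(url, headers=headers)

    try:
        opener.open(request)  # expected to fail with 403, but the cookie is kept
    except URLError as exc:
        print(exc)

    # The jar now holds whatever cookies the server set on the 403 response
    for cookie in jar:
        print(cookie.name, cookie.domain)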
    Or, if you prefer, the same thing using requests:

    The <Response [403]> from the first attempt means that access to the page is restricted (403 Forbidden). It does not raise any exception, because requests does not treat 4xx status codes as errors; the response is simply not a valid 200 response. The second request, sent with the cookie obtained by the first, comes back as 200:
    import requests
    
    session = requests.Session()
    jar = requests.cookies.RequestsCookieJar()
    headers = {'User-Agent': 'Mozilla/5.0'}
    url = 'https://www.genecards.org/cgi-bin/carddisp.pl?gene=MET'
    
    for i in range(2):
        response = session.get(url, cookies=jar, headers=headers)
        print(response)
    
    # Output:
    # <Response [403]>
    # <Response [200]>