Python HTTP Error 403: Forbidden when using urlretrieve


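I am trying to download a PDF, but I get the following error: HTTP Error 403: Forbidden

I know the server is blocking the request for some reason, but I can't seem to find a solution.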

import urllib.request
import urllib.parse
import requests


def download_pdf(url):
    full_name = "Test.pdf"
    urllib.request.urlretrieve(url, full_name)


try:
    url = 'http://papers.xtremepapers.com/CIE/Cambridge%20IGCSE/Mathematics%20(0580)/0580_s03_qp_1.pdf'

    print('initialized')

    hdr = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36',
        'Content-Length': '136963',
    }

    print('HDR received')

    req = urllib.request.Request(url, headers=hdr)

    print('Header sent')

    resp = urllib.request.urlopen(req)

    print('Request sent')

    respData = resp.read()

    download_pdf(url)

    print('Complete')

except Exception as e:
    print(str(e))

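You seem to have realised this already: the remote server is apparently checking the User-Agent header and rejecting requests from Python's urllib. However, urllib.request.urlretrieve() does not let you change the HTTP headers. You can use URLopener instead, which does let you add headers.

Note: you are using Python 3, where these functions are now considered part of the legacy interface, and URLopener has been deprecated. For that reason, you should not use them in new code.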
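Because the legacy helpers are discouraged in new code, here is a minimal sketch of one way to send a custom User-Agent with the non-deprecated parts of urllib: build a urllib.request.Request that carries the header, open it with urlopen(), and copy the response body to disk. The names build_request and download_file and the User-Agent string are illustrative assumptions, not part of the original answer, and a server may still refuse the request for other reasons.

```python
import shutil
import urllib.request

# Hypothetical User-Agent value; any browser-like string could be substituted.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64)"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request object that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download_file(url, filename, user_agent=USER_AGENT):
    """Fetch url with a custom User-Agent and stream the body to filename."""
    req = build_request(url, user_agent)
    # urlopen() sends the headers attached to the Request, unlike urlretrieve(url, ...)
    with urllib.request.urlopen(req) as resp, open(filename, "wb") as out:
        shutil.copyfileobj(resp, out)
    return filename
```

Calling download_file(url, 'Test.pdf') with the URL from the question would then play the role of the original download_pdf(), assuming the server only inspects the User-Agent.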

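That aside, you are going to a lot of trouble simply to access a URL. Your code imports requests but never uses it. You should, because it is much easier than urllib. This works for me: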

import requests

url = 'http://papers.xtremepapers.com/CIE/Cambridge%20IGCSE/Mathematics%20(0580)/0580_s03_qp_1.pdf'
r = requests.get(url)
with open('0580_s03_qp_1.pdf', 'wb') as outfile:
    outfile.write(r.content)
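If the server also turns away the default requests User-Agent, the same idea can be sketched with an explicit header. The variant below additionally checks the status code, so a 403 raises an exception instead of being written to disk as if it were the PDF, and it streams the body in chunks. The function name save_pdf, the header value, the chunk size, and the timeout are assumptions made for illustration.

```python
import requests

# Hypothetical browser-like header; adjust or drop it as the server requires.
BROWSER_HEADERS = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"}


def save_pdf(url, filename, session=None, timeout=30):
    """Download url to filename, raising requests.HTTPError on 4xx/5xx."""
    if session is None:
        session = requests.Session()
    with session.get(url, headers=BROWSER_HEADERS, stream=True, timeout=timeout) as r:
        # Raise before opening the file so an error page is never saved as a PDF.
        r.raise_for_status()
        with open(filename, "wb") as outfile:
            for chunk in r.iter_content(chunk_size=8192):
                outfile.write(chunk)
    return filename
```

Passing a session is optional; it is there so that several downloads from the same host can reuse one connection.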

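If the server is blocking you, there may be no easy way through. Forbidden means you are not allowed.

Possible duplicate.

While that is a good point, it does not explain the cause of the 403 error.

The question is not asking for the cause.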
