Python urllib.request returns different HTML than my browser

redirect python-3.x

I am trying to fetch a page's HTML with the following Python code:

import urllib.request

def fetch_html(url):
    # Send a browser-like User-Agent header with the request
    request = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'})
    with urllib.request.urlopen(request) as response:
        return response.read().decode('utf-8')

html = fetch_html("http://groupon.cl/descuentos/santiago-centro")

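The HTML I get back is for a page that asks for my location. If I open the same link manually in my browser (no cookies involved, even with a freshly installed browser), I go straight to the page with the discount offers. It seems that some redirect is not being performed for urllib. I am sending a User-Agent header to try to get typical browser behavior, but with no luck.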

How can I get the same HTML as the browser?

I think you can run the following command:

wget -d http://groupon.cl/descuentos/santiago-centro

You will see that wget makes two HTTP requests and saves the response page to a file. The two responses are:

 -   HTTP/1.1 302 Moved Temporarily
 -   HTTP/1.1 200 OK

The content of that file is the HTML you want.

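The first response code is 302, so a second request is made. However, if the cookie from the first response is not set correctly on the second request, the server does not accept it, and you get a different page.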
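The same two-step exchange can also be observed from Python instead of wget. The following is a minimal sketch, not a definitive implementation: `RecordingRedirectHandler` and `trace_redirects` are hypothetical names introduced here, and the sketch only records which redirects urllib follows; it does not deal with cookies.

```python
import urllib.request


class RecordingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Record (status code, target URL) for every redirect urllib follows."""

    def __init__(self):
        super().__init__()
        self.hops = []

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Remember the hop, then let the standard handler build the next request.
        self.hops.append((code, newurl))
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def trace_redirects(url):
    """Open url and return (hops, final status, final URL)."""
    handler = RecordingRedirectHandler()
    # build_opener uses this instance in place of the default redirect handler.
    opener = urllib.request.build_opener(handler)
    with opener.open(url) as response:
        return handler.hops, response.status, response.geturl()
```

Calling `trace_redirects("http://groupon.cl/descuentos/santiago-centro")` should list the 302 hop, if the server still answers the way the wget output above shows.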

The http.client module does not handle 301 or 302 HTTP responses by itself, so the redirect has to be followed manually:

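import http.client

conn = http.client.HTTPConnection("groupon.cl")
# First request
conn.request("GET", "/descuentos/santiago-centro")
response = conn.getresponse()
print(response.status)        # 301 or 302
print(response.getheaders())  # includes Set-Cookie
response.read()               # drain the body so the connection can be reused

# Get the cookie: keep only the name=value part of each Set-Cookie header
cookies = response.headers.get_all("Set-Cookie") or []
headers = {"Cookie": "; ".join(c.split(";", 1)[0] for c in cookies)}

# Second request, sending the cookie back; in general the path comes from the Location header
conn.request("GET", "/", headers=headers)
response = conn.getresponse()
# Get the response page
html = response.read().decode("utf-8")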
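As an alternative to copying the cookie by hand, urllib.request can keep cookies across the redirect when the opener is built with an `HTTPCookieProcessor`. This is a minimal sketch under assumptions: `build_cookie_opener` and `fetch_html` are hypothetical helper names, the `Mozilla/5.0` User-Agent value is only a placeholder, and whether this is enough for this particular site has not been verified.

```python
import http.cookiejar
import urllib.request


def build_cookie_opener():
    """Return (opener, jar); the opener stores cookies and resends them on redirects."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Assumption: a browser-like User-Agent is still wanted, as in the question.
    opener.addheaders = [("User-Agent", "Mozilla/5.0")]
    return opener, jar


def fetch_html(url, encoding="utf-8"):
    """Fetch url, following redirects with cookies from earlier responses attached."""
    opener, _jar = build_cookie_opener()
    with opener.open(url) as response:
        return response.read().decode(encoding)
```

With this opener, a cookie set by the 302 response is stored in the jar and sent automatically on the follow-up request, which is the step the manual http.client version performs explicitly.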