Python: modifying the page content returned by urllib2.urlopen

I have a simple Python proxy:

import SocketServer, SimpleHTTPServer, urllib, re

PORT = 80

class Proxy(SimpleHTTPServer.SimpleHTTPRequestHandler):
    def do_GET(self):
        page = urllib.urlopen(self.path)
        self.copyfile(page, self.wfile)

httpd = SocketServer.ForkingTCPServer(('', PORT), Proxy)
print "serving at port", PORT
httpd.serve_forever()

This works as expected. However, I have a problem with the object that urlopen returns.

If I modify the class like this:

class Proxy(SimpleHTTPServer.SimpleHTTPRequestHandler):
    def do_GET(self):
        page = urllib.urlopen(self.path)
        print page.read()                      # NEW LINE
        self.copyfile(page, self.wfile)

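I can print the page's HTML successfully, but page is then empty (a blank page is forwarded to the client).

I don't understand why .read() empties the file-like object.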

To work around this, I tried writing the content back:

content = page.read()
print page.read()
page.write(content)

But apparently this file-like object has no write method.
How can I read/write this file-like object and still return a valid page to the client?
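You can call read on a file object with an integer argument; it then reads that many bytes and advances the position by the same amount. Called with no argument, read reads everything up to EOF. If you then call file.tell(), you will see that the position is now at the end of the data you just read, so any further read() returns an empty string. To rewind a regular file you can call file.seek(0) (the object returned by urlopen wraps a network connection, so it may not support seek). A better design, though, is to read the data once and reuse it: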
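import StringIO
data = page.read()                                  # read the response body once
print data                                          # the saved string can be reused freely
self.copyfile(StringIO.StringIO(data), self.wfile)  # copyfile() needs a file-like source, not a str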
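
The read/tell/seek behaviour described above can be checked with a minimal Python 3 sketch. It uses io.BytesIO as a seekable stand-in for the response body; this is an assumption for illustration, since the socket-backed object that urlopen returns is not seekable (wrapping its bytes in BytesIO is one way to get seek back).

```python
import io

# Stand-in for an HTTP response body. BytesIO is seekable, unlike the
# socket-backed object returned by urlopen.
stream = io.BytesIO(b"<html>hello</html>")

first_chunk = stream.read(6)        # read 6 bytes and advance the position
position_after_chunk = stream.tell()

rest = stream.read()                # no argument: read everything up to EOF
position_at_eof = stream.tell()

exhausted = stream.read()           # nothing left, so this returns b""

stream.seek(0)                      # rewind (only works on a seekable stream)
again = stream.read()

print(first_chunk, position_after_chunk)   # b'<html>' 6
print(rest, position_at_eof)               # b'hello</html>' 18
print(repr(exhausted))                     # b''
```

The empty read after EOF is exactly what copyfile sees when page.read() has already been called once.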

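I think the problem is that page.read() consumes the urllib.urlopen response all the way to EOF, so by the time self.copyfile(page, self.wfile) runs there is no input left to copy to self.wfile.
What you need to do is write the data directly to self.wfile instead of trying to piggyback on or redirect another IO stream.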
So instead of:
content = page.read()
print page.read()
page.write(content)

you want:
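content = page.read()
print content               # print the saved string; page.read() would now return ''
self.wfile.write(content)   # write the saved string straight to the client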
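
The same read-once-then-write pattern can be isolated in a small function and exercised without a server. This is a Python 3 sketch; relay_once is a hypothetical helper name, and io.BytesIO objects stand in for the urlopen response and for the handler's self.wfile.

```python
import io


def relay_once(page, wfile):
    """Hypothetical helper: read the upstream body exactly once, log it,
    and forward the same bytes to the client stream."""
    content = page.read()   # consumes the stream; keep the result
    print(content)          # log the saved bytes, not a second page.read()
    wfile.write(content)    # write the saved bytes directly to the client
    return content


# In-memory streams standing in for the upstream response and self.wfile.
upstream = io.BytesIO(b"<p>page body</p>")
client = io.BytesIO()
relay_once(upstream, client)
```

Inside the handler this would be called as relay_once(page, self.wfile).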

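+1, thanks! Any idea how to write to the file? For example, for a banned site I just want to return "You can't see this".
Actually, your method doesn't work. That object has no seek method, and your other solution has the same problem: data is then empty.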
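
For the follow-up about returning "You can't see this" for a banned site, one option (a sketch, not taken from the thread) is to decide which bytes to send before anything is written to self.wfile, so no writable response object is ever needed. The sketch is Python 3 (http.server, urllib.request). BLOCKED_HOSTS, BLOCK_MESSAGE, fetch_upstream, build_response and run are hypothetical names; only a status line and Content-Length are sent, and upstream headers and HTTPError are not handled.

```python
import http.server
import urllib.parse
import urllib.request

# Hypothetical blocklist and message; replace with your own values.
BLOCKED_HOSTS = {"blocked.example"}
BLOCK_MESSAGE = b"You can't see this"


def fetch_upstream(url):
    """Fetch url and return its body, reading the response exactly once."""
    with urllib.request.urlopen(url) as page:
        return page.read()


def build_response(url, fetch=fetch_upstream, blocked_hosts=BLOCKED_HOSTS):
    """Return the bytes to send to the client for url.

    Blocked hosts get a fixed message and are never contacted; any other
    URL is fetched once, and the returned bytes could be modified here."""
    if urllib.parse.urlsplit(url).hostname in blocked_hosts:
        return BLOCK_MESSAGE
    return fetch(url)


class Proxy(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # When a browser uses this server as its proxy, self.path is an absolute URL.
        body = build_response(self.path)
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)  # plain bytes, written straight to the client


def run(port=8080):
    """Start the proxy (ThreadingHTTPServer requires Python 3.7+)."""
    server = http.server.ThreadingHTTPServer(("", port), Proxy)
    print("serving at port", port)
    server.serve_forever()
```

Calling run() and pointing the browser's HTTP proxy setting at that port would exercise it. Because the whole body is held as a bytes object, rewriting the page (for example with body.replace(...)) before self.wfile.write is straightforward.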