Python urllib2 encoding problem

python encoding

Here is my sample script:

import urllib2, re

response = urllib2.urlopen('http://domain.tld/file')
data     = response.read() # Normally displays "the emoticon <3 is blah blah"

pattern   = re.search('(the emoticon )(.*)( is blah blah)', data)
result    = pattern.group(2) # result should contain "<3" now

print 'The result is ' + result # prints "&lt;3" because the HTML entity is not decoded

Try the following:
>>> import HTMLParser
>>> h = HTMLParser.HTMLParser()
>>> h.unescape('wer&amp;wer')
u'wer&wer'
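
As a rough sketch of how this might fit into the script from the question: the snippet below unescapes only the captured group. It assumes the page body contains the literal text "&lt;3", and uses a hard-coded string as a stand-in for response.read() so it runs without network access. The try/except import is an addition for illustration: html.unescape is the standard-library equivalent on Python 3.4+, with the Python 2 HTMLParser method as the fallback.

```python
import re

try:
    # Python 3.4+: the standard library exposes unescape directly
    from html import unescape
except ImportError:
    # Python 2: fall back to the HTMLParser method shown above
    from HTMLParser import HTMLParser
    unescape = HTMLParser().unescape

# Stand-in for response.read() (assumed content, not fetched from a real URL)
data = 'the emoticon &lt;3 is blah blah'

match = re.search('(the emoticon )(.*)( is blah blah)', data)
if match is not None:
    # Decode HTML entities such as &lt; and &amp; in the captured text
    result = unescape(match.group(2))
    print('The result is ' + result)
```

Note that on Python 3 the body returned by urlopen(...).read() is bytes, so it would need to be decoded (for example with .decode('utf-8'), if that is the page's encoding) before being passed to re.search with a str pattern.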

You might want to take a look.

@Lattyware I took a look and didn't find it much help, since I don't want to use an external module for this.