Python 3.x: a string encoded in Python is still in binary format


python-3.x character-encoding


I am trying some web scraping with urllib3 and Beautiful Soup. Python 3 encoding/decoding is driving me crazy. Here is my code:

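import urllib3
from bs4 import BeautifulSoup

# Assumed setup: the original post does not show how `http` was created.
http = urllib3.PoolManager()
r = http.request('GET', 'https://www.************************.jsf')

if r.status == 200:
    # r.data is bytes; decode() turns it into a str
    page = r.data.decode('utf-8')
    # Parser named explicitly to avoid the "no parser specified" warning
    soup = BeautifulSoup(page, 'html.parser')

    print(soup.prettify())
    # This prints - [Decode error - output not utf-8]
    #               [Decode error - output not utf-8]

    print(soup.prettify().encode('utf-8'))
    # This prints the data, but with the bytes marker:
    # b'<!DOCTYPE html PUBLIC "-//W3C//D.......
    # ..........................................'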

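b'xxx' is the representation of a binary value (a bytes object, i.e. a sequence of bytes), which is the natural result of .encode(). When print() is given an object that is not a string, it converts it with str(), and for bytes that yields the b'...' representation.

Try writing the debugging output to a file instead. print() can run into problems when it writes to a console that supports only certain character sets or encodings.
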
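The difference between the two types, and the file-based workaround, can be sketched as follows. This is a minimal illustration rather than the asker's actual code: the sample markup and the file name soup_debug.html are made up for the example, and the temporary directory is used only so the sketch runs anywhere.

```python
import os
import tempfile

text = "<p>caf\u00e9</p>"        # str: a sequence of Unicode code points (sample markup)
data = text.encode("utf-8")      # bytes: the UTF-8 encoded form of the same text

# print() on a bytes object shows its repr, which is where the b'...' comes from.
bytes_shown = repr(data)

# Decoding reverses the encoding and gives back the original str.
round_trip = data.decode("utf-8")

# Writing to a file with an explicit encoding does not depend on the
# console's character set. The file name here is hypothetical.
path = os.path.join(tempfile.gettempdir(), "soup_debug.html")
with open(path, "w", encoding="utf-8") as fh:
    fh.write(text)

# Read the file back to confirm the text survived unchanged.
with open(path, encoding="utf-8") as fh:
    read_back = fh.read()

print(bytes_shown)
```

In the question's code, this would mean passing soup.prettify() (a str) to fh.write() directly, without calling .encode() first, and letting open(..., encoding="utf-8") handle the encoding.
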
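The data attribute is undocumented. Also, why is there no getresponse() call?

I am reading the urllib3 documentation on readthedocs and could not find any reference to getresponse(). Besides, the data attribute is pretty much the only way to access the response content: https://urllib3.readthedocs.org/en/latest/index.html. Maybe I am missing something.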