Why can Python parse Amazon but not Google/Reddit?

I searched for a while without finding an answer. Python seems able to handle some web pages, but not all of them:

import requests, webbrowser, bs4
res = requests.get('http://www.reddit.com')
soup = bs4.BeautifulSoup(res.text, 'html.parser')
print soup.prettify()
Surprisingly, this approach prints the Amazon.com home page but not Reddit. The error I get is:

Traceback (most recent call last):
  File "testweb.py", line 7, in <module>
    print soup.prettify()
  File "C:\PYTHON27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\xd7' in position 37769: character maps to <undefined>
My question is: how can I write a program that handles the encoding of any web page? What am I doing wrong?

Edit: further testing shows that google.com does not work either. The error message is similar:

Traceback (most recent call last):
  File "testweb.py", line 7, in <module>
    print soup.prettify()
  File "C:\PYTHON27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\xa9' in position 9651: character maps to <undefined>
Edit 2: I tried decoding res.text as utf-8, but got the following errors:

Traceback (most recent call last):
  File "testweb.py", line 5, in <module>
    soup = bs4.BeautifulSoup(res.text.decode('utf-8'), 'html.parser')
  File "C:\PYTHON27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 9358: ordinal not in range(128)

Traceback (most recent call last):
  File "testweb.py", line 8, in <module>
    print soup.prettify()
  File "C:\PYTHON27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\xa9' in position 9622: character maps to <undefined>
Edit 3: I tried encoding res.text as utf-8, but got the following errors:

Traceback (most recent call last):
  File "testweb.py", line 5, in <module>
    soup = bs4.BeautifulSoup(res.text.decode('utf-8'), 'html.parser')
  File "C:\PYTHON27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 9358: ordinal not in range(128)

Traceback (most recent call last):
  File "testweb.py", line 8, in <module>
    print soup.prettify()
  File "C:\PYTHON27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\xa9' in position 9622: character maps to <undefined>

Change the output encoding to utf-8 so that the program prints UTF-8 encoded text, and try encoding the response text instead of decoding it.

For example:

# -*- coding: utf-8 -*-

import requests, webbrowser, bs4
res = requests.get('http://www.reddit.com')
soup = bs4.BeautifulSoup(res.text.encode('utf-8'), 'html.parser')
print (soup.prettify())
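
The tracebacks in the question point at the console code page (cp437) rather than at the page itself: the failure happens when print encodes the text for output. As a rough sketch of a different workaround from the one above, the text can be round-tripped through the console encoding so that unsupported characters are replaced before printing. The helper name console_safe and the sample string are assumptions made for illustration; they are not part of requests or bs4.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import sys


def console_safe(text, encoding=None):
    """Return text with characters the target encoding lacks replaced by '?'.

    Hypothetical helper for illustration; not part of requests or bs4.
    """
    # Fall back to UTF-8 when stdout reports no encoding (e.g. when piped).
    encoding = encoding or getattr(sys.stdout, 'encoding', None) or 'utf-8'
    # Round-trip through the target encoding; 'replace' substitutes '?'.
    return text.encode(encoding, 'replace').decode(encoding)


# u'\xd7' and u'\xa9' are the characters named in the question's tracebacks.
sample = u'3 \xd7 4 \xa9 reddit'
print(console_safe(sample, 'cp437'))
```

In the question's script this would be used as print(console_safe(soup.prettify())). Setting the PYTHONIOENCODING environment variable to utf-8 before running the script is another option that may help, although a cp437 console might then display some characters incorrectly.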
Try encoding directly in prettify:


print soup.prettify('latin-1') or print soup.prettify('utf-8')
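
The two encodings suggested here are not interchangeable: latin-1 only covers code points up to U+00FF, while utf-8 can represent any Unicode character. The sketch below uses plain string encoding, not bs4 itself, to show which encodings accept the characters from the tracebacks. The helper can_encode and the extra bullet character u'\u2022' are assumptions added for illustration.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function


def can_encode(text, encoding):
    """Report whether every character in text exists in the given encoding.

    Hypothetical helper for illustration only.
    """
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# u'\xd7' and u'\xa9' come from the tracebacks; u'\u2022' is an added example
# of a character that lies outside latin-1.
for ch in (u'\xd7', u'\xa9', u'\u2022'):
    for enc in ('cp437', 'latin-1', 'utf-8'):
        print(repr(ch), enc, can_encode(ch, enc))
```

If the goal is to handle arbitrary pages, utf-8 is therefore the safer of the two choices.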

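You could try decoding res.text as utf-8: res.text.decode('utf-8')
Just tried it and still got an error. I have edited the post.
Still doesn't work. I have updated the post with this method. Thanks.
Please check the update to see whether it helps you @肯德里克电视台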
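
The 'ascii' codec error reported for the decode attempt is consistent with res.text already being a unicode string: in Python 2, calling .decode() on unicode first encodes it implicitly with ASCII, which fails on characters such as u'\xa0'. A minimal sketch of the intended direction of each call follows. It uses a hand-made byte string in place of a real response, so raw standing in for res.content and text standing in for res.text are assumptions for illustration.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# Stand-in for the undecoded body (what requests exposes as res.content).
raw = u'\xa9\xa0reddit'.encode('utf-8')

# Bytes are decoded exactly once to obtain text (what res.text already is).
text = raw.decode('utf-8')

# Text is encoded only at the output boundary, here back to UTF-8 bytes.
out = text.encode('utf-8')

print(type(raw).__name__, type(text).__name__, out == raw)
```

Under this reading, res.text needs no further decoding, and the remaining cp437 error comes from the print step rather than from parsing.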