
Why can Python parse Amazon but not Google/Reddit?

python python-2.7 web-scraping beautifulsoup python-requests


I've been searching for a while without success. Python seems able to handle some web pages, but not all:

import requests, webbrowser, bs4
res = requests.get('http://www.reddit.com')
soup = bs4.BeautifulSoup(res.text, 'html.parser')
print soup.prettify()

Surprisingly, this prints the Amazon.com home page but not Reddit. The error I get is:

Traceback (most recent call last):
  File "testweb.py", line 7, in <module>
    print soup.prettify()
  File "C:\PYTHON27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\xd7' in position 37769: character maps to <undefined>

My question is: how can I write a program that handles the encoding of any web page? What am I doing wrong?

Edit: further testing shows that google.com doesn't work either. It gives a similar error message:

Traceback (most recent call last):
  File "testweb.py", line 7, in <module>
    print soup.prettify()
  File "C:\PYTHON27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\xa9' in position 9651: character maps to <undefined>

Edit 2: I tried decoding res.text as utf-8, but got the following errors:

Traceback (most recent call last):
  File "testweb.py", line 5, in <module>
    soup = bs4.BeautifulSoup(res.text.decode('utf-8'), 'html.parser')
  File "C:\PYTHON27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 9358: ordinal not in range(128)

Traceback (most recent call last):
  File "testweb.py", line 8, in <module>
    print soup.prettify()
  File "C:\PYTHON27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\xa9' in position 9622: character maps to <undefined>

Edit 3: I tried encoding res.text as utf-8, but got the following errors:

Traceback (most recent call last):
  File "testweb.py", line 5, in <module>
    soup = bs4.BeautifulSoup(res.text.decode('utf-8'), 'html.parser')
  File "C:\PYTHON27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 9358: ordinal not in range(128)

Traceback (most recent call last):
  File "testweb.py", line 8, in <module>
    print soup.prettify()
  File "C:\PYTHON27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\xa9' in position 9622: character maps to <undefined>

Change the output encoding to UTF-8 so that the text is written out as UTF-8, and try encoding the response text instead of decoding it.

For example:

# -*- coding: utf-8 -*-

import requests, webbrowser, bs4
res = requests.get('http://www.reddit.com')
soup = bs4.BeautifulSoup(res.text.encode('utf-8'), 'html.parser')
print (soup.prettify())
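
The tracebacks in the question point at cp437.py, which suggests the failure happens when print encodes the text for the Windows console, not while the page is being parsed. As a rough sketch of one possible workaround, the helper below replaces every character the target encoding cannot represent with '?' before printing. The name to_console_safe is hypothetical (it is not part of requests or bs4), and the sample string is only a stand-in for soup.prettify().

```python
import sys


def to_console_safe(text, encoding=None):
    """Return text with characters the encoding cannot represent replaced by '?'.

    Hypothetical helper for illustration; falls back to the console's
    reported encoding, or UTF-8 if none is reported.
    """
    encoding = encoding or getattr(sys.stdout, 'encoding', None) or 'utf-8'
    # Round-trip through the target encoding so that print never has to
    # encode a character the console code page lacks.
    return text.encode(encoding, 'replace').decode(encoding)


# Stand-in for soup.prettify(); uses the characters named in the tracebacks.
sample = u'Reddit \xd7 Google \xa9'
print(to_console_safe(sample, 'ascii'))
```

With the code from the question, this would be called as print(to_console_safe(soup.prettify())). The trade-off is that unsupported characters are lost in the console output, so this is meant for inspection and not for storing the page.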

Try encoding directly in prettify:

print soup.prettify('latin-1') or print soup.prettify('utf-8')
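
If the aim is to inspect the page and not necessarily to see it in the console, another option in the same spirit is to write the text to a file with an explicit UTF-8 encoding, which sidesteps the console code page entirely. This is a minimal sketch under that assumption: save_html and load_html are hypothetical helper names, the file is written to a temporary directory, and the sample string stands in for soup.prettify().

```python
import io
import os
import tempfile


def save_html(text, path):
    """Write text to path as UTF-8, independent of the console code page."""
    with io.open(path, 'w', encoding='utf-8') as handle:
        handle.write(text)


def load_html(path):
    """Read a UTF-8 file back as text."""
    with io.open(path, 'r', encoding='utf-8') as handle:
        return handle.read()


# Stand-in for soup.prettify(); uses the characters named in the tracebacks.
page = u'<p>\xa9 \xd7 \xa0</p>\n'
target = os.path.join(tempfile.mkdtemp(), 'page.html')
save_html(page, target)
```

The saved file can then be opened in a browser or an editor that understands UTF-8.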

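You could try decoding res.text as utf-8: res.text.decode('utf-8')

Just tried it and still got an error. Edited the post.

Still doesn't work. Updated the post with this method. Thanks.

Please check the update to see whether it helps you @肯德里克电视台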
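
On the decode suggestion: in requests, res.text is already decoded text, while res.content holds the raw bytes. In Python 2, calling .decode on text first encodes it implicitly with the ASCII codec, which would explain the 'ascii' codec error in Edit 2. A small sketch of a helper that decodes only when it is actually given bytes follows; ensure_text is a hypothetical name, and UTF-8 is an assumed default that will not be right for every page.

```python
def ensure_text(value, encoding='utf-8'):
    """Decode bytes to text; return text unchanged.

    Hypothetical helper for illustration. The default encoding is an
    assumption and should match what the server actually sent.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


raw = u'\xa0\xa9'.encode('utf-8')   # bytes, comparable to res.content
text = ensure_text(raw)             # decoded once
same = ensure_text(text)            # already text, left untouched
```

Under this reading, either res.text or res.content can be passed to BeautifulSoup as-is, and the remaining error comes from printing to the console.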