Python UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 21: invalid start byte


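I am using BeautifulSoup to fetch an article

but I get the error UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 107: invalid start byte

I tried soup.encode(['windows-1252', 'ascii', 'iso-8859']), but I cannot even create the soup in the first place.

Does anyone have any suggestions to share?

Error traceback (in case it helps):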

Traceback (most recent call last):
  File "<pyshell#17>", line 1, in <module>
    parseReuters()
  File "C:\Users\name\Desktop\test.py", line 39, in parseReuters
    soup = BeautifulSoup(source)
  File "C:\Python27\lib\site-packages\bs4\__init__.py", line 172, in __init__
    self._feed()
  File "C:\Python27\lib\site-packages\bs4\__init__.py", line 185, in _feed
    self.builder.feed(self.markup)
  File "C:\Python27\lib\site-packages\bs4\builder\_lxml.py", line 195, in feed
    self.parser.close()
  File "parser.pxi", line 1209, in lxml.etree._FeedParser.close (src\lxml\lxml.etree.c:90597)
  File "parsertarget.pxi", line 142, in lxml.etree._TargetParserContext._handleParseResult (src\lxml\lxml.etree.c:99984)
  File "parsertarget.pxi", line 130, in lxml.etree._TargetParserContext._handleParseResult (src\lxml\lxml.etree.c:99807)
  File "lxml.etree.pyx", line 294, in lxml.etree._ExceptionContext._raise_if_stored (src\lxml\lxml.etree.c:9383)
  File "saxparser.pxi", line 259, in lxml.etree._handleSaxData (src\lxml\lxml.etree.c:95945)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0: invalid start byte
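
A note on the soup.encode(...) attempt: encode works on a soup that already exists, so it cannot help when the BeautifulSoup(source) call itself fails. One thing that might be worth trying is decoding the raw bytes yourself before parsing. The following is only a minimal sketch, assuming Python 3 and assuming the page really is in a single-byte encoding such as windows-1252 (nothing in the question confirms that). The helper name decode_with_fallback and the sample bytes are made up for illustration.

```python
def decode_with_fallback(raw, encodings=("utf-8", "windows-1252")):
    """Return (text, encoding) for the first candidate that decodes raw cleanly.

    The candidate list is an assumption; adjust it to the site being scraped.
    """
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # latin-1 maps every possible byte to a code point, so it never raises.
    return raw.decode("latin-1"), "latin-1"


# Hypothetical sample: 0x80 is not a valid UTF-8 start byte,
# but it is the euro sign in windows-1252.
sample = b"price: \x80 5"
text, used = decode_with_fallback(sample)
# text == "price: \u20ac 5", used == "windows-1252"
```

The resulting text is an ordinary Unicode string, which could then be passed to BeautifulSoup(text). If the encoding is known up front, bs4 also accepts it directly through the from_encoding argument of the constructor.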


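Interestingly, when I download that URL, the character at position 107 is a space (0x20). I also found nothing wrong with the data (verified with iconv). Could you be going through a proxy that manipulates the data or redirects you to a different page?

I don't know how to check whether there is a proxy. Can you tell me how?

Using chardet (to detect the encoding), I found that the encoding is ascii. I have used BeautifulSoup on another site (esciencenews) and it worked fine.

At which statement do you get the error? Have you tried printing source before passing it to Beautiful Soup?

The error occurs at soup = BeautifulSoup(source). I also saved the page source to a file, but got the same error at a different position (UnicodeDecodeError: 'utf8' codec can't decode byte 0xf9 in position 40: invalid start byte).

I don't seem to get the error at all. Do you have lxml installed?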
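
Rather than printing all of source, it may be quicker to locate the exact byte that UTF-8 rejects and look at what surrounds it. This is a rough sketch, assuming Python 3 and that source is a bytes object; the function name first_bad_byte and the context parameter are invented here for illustration. It relies only on the start attribute that UnicodeDecodeError carries.

```python
def first_bad_byte(raw, encoding="utf-8", context=10):
    """Return (offset, byte value, surrounding bytes) for the first byte
    that cannot be decoded, or None if raw decodes cleanly."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as exc:
        lo = max(exc.start - context, 0)
        hi = exc.start + context
        return exc.start, raw[exc.start], raw[lo:hi]
    return None


# Hypothetical input containing a stray 0x80 byte.
result = first_bad_byte(b"abc\x80def")
# result == (3, 128, b"abc\x80def")
```

If this returns None for the downloaded bytes while BeautifulSoup(source) still fails, the bytes reaching the parser are probably not the ones you inspected, which would fit the proxy or redirect theory raised above.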
