Python UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 21: invalid start byte


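I'm using BeautifulSoup to get an article.

But I get the error UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 107: invalid start byte

I tried using soup.encode(['windows-1252', 'ascii', 'iso-8859']), but I can't even create the soup in the first place.

Does anyone have any suggestions?

Error traceback (in case it helps):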

Traceback (most recent call last):
  File "<pyshell#17>", line 1, in <module>
    parseReuters()
  File "C:\Users\name\Desktop\test.py", line 39, in parseReuters
    soup = BeautifulSoup(source)
  File "C:\Python27\lib\site-packages\bs4\__init__.py", line 172, in __init__
    self._feed()
  File "C:\Python27\lib\site-packages\bs4\__init__.py", line 185, in _feed
    self.builder.feed(self.markup)
  File "C:\Python27\lib\site-packages\bs4\builder\_lxml.py", line 195, in feed
    self.parser.close()
  File "parser.pxi", line 1209, in lxml.etree._FeedParser.close (src\lxml\lxml.etree.c:90597)
  File "parsertarget.pxi", line 142, in lxml.etree._TargetParserContext._handleParseResult (src\lxml\lxml.etree.c:99984)
  File "parsertarget.pxi", line 130, in lxml.etree._TargetParserContext._handleParseResult (src\lxml\lxml.etree.c:99807)
  File "lxml.etree.pyx", line 294, in lxml.etree._ExceptionContext._raise_if_stored (src\lxml\lxml.etree.c:9383)
  File "saxparser.pxi", line 259, in lxml.etree._handleSaxData (src\lxml\lxml.etree.c:95945)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0: invalid start byte
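The traceback shows the failure happens while BeautifulSoup(source) is still parsing, so calling soup.encode(...) cannot help: there is no soup yet, and encode() produces output bytes rather than controlling how the input is read. A byte of 0x80 can never start a UTF-8 sequence, which is what "invalid start byte" means. One possible workaround is to decode the raw bytes yourself before parsing. The sketch below is a hypothetical helper (decode_with_fallback is not part of BeautifulSoup), written for Python 3 rather than the Python 2.7 in the traceback, and the candidate encoding list is an assumption loosely based on the encodings tried in the question:

```python
def decode_with_fallback(raw, encodings=("utf-8", "windows-1252", "latin-1")):
    """Hypothetical helper: decode raw bytes with the first encoding that works.

    Returns a (text, encoding) tuple. The candidate list is an assumption;
    latin-1 goes last because it maps every byte and therefore never fails.
    """
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except (UnicodeDecodeError, LookupError):
            # Bytes invalid for this codec, or an unknown codec name: try the next.
            continue
    raise ValueError("none of the candidate encodings could decode the data")


# 0x80 is not a valid UTF-8 start byte, so utf-8 fails and windows-1252 is used.
sample = b"\x80 price list"
text, used = decode_with_fallback(sample)
print(used)         # windows-1252
print(ascii(text))  # '\u20ac price list'
```

The resulting str could then be passed to BeautifulSoup instead of the raw bytes. If the correct encoding is known, BeautifulSoup also accepts a from_encoding argument, e.g. BeautifulSoup(source, from_encoding="windows-1252"). A silent latin-1 fallback can hide genuinely corrupted data, so it is worth logging which encoding was actually used.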

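Interestingly, when I download that URL, the character at position 107 is a space (0x20). I also don't see any problem with the data (verified with iconv). Is it possible that you're going through a proxy that manipulates the data or redirects you to another page?

I don't know how to check whether there is a proxy. Could you tell me how? Using chardet (to detect the encoding), I found that the encoding is ascii. I have used BeautifulSoup on another site (esciencenews) and it worked fine.

Which statement do you get the error on? Have you tried printing source before passing it to Beautiful Soup?

The error occurs at soup = BeautifulSoup(source). I also saved the page's source code, but got the same error at a different position (UnicodeDecodeError: 'utf8' codec can't decode byte 0xf9 in position 40: invalid start byte).

I don't seem to get the error at all. Do you have lxml installed?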
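On the comment asking whether lxml is installed: the traceback goes through bs4\builder\_lxml.py, so BeautifulSoup picked the lxml tree builder, which it prefers whenever lxml is available (then html5lib, then Python's built-in html.parser). To rule the parser in or out, one could list what is installed and, as an experiment, pass a parser name explicitly, e.g. BeautifulSoup(source, "html.parser"). The helper below is hypothetical and only checks importability using the standard library (Python 3):

```python
import importlib.util


def available_parsers():
    """Hypothetical helper: BeautifulSoup parser names whose module is importable.

    Listed in BeautifulSoup's preference order: lxml, html5lib, html.parser.
    """
    optional = ["lxml", "html5lib"]
    found = [name for name in optional if importlib.util.find_spec(name) is not None]
    # "html.parser" is Python's built-in parser, so it is always available.
    found.append("html.parser")
    return found


print(available_parsers())
```

If the error went away with "html.parser" but not with "lxml", that would point at the lxml decoding path rather than at the downloaded bytes themselves.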