Python 3.x UnicodeEncodeError: crawling the web with Python 3 and beautifulsoup4

python-3.x web-crawler

My code:

from urllib.request import urlopen
from bs4 import BeautifulSoup
import lxml
html = urlopen("http://www.xyafc.edu.cn/xyacnews/cnews/")
news = BeautifulSoup(html,'lxml')
print(news.title.encode('utf8'))

The result is:

b'<title>\xe6\xa0\xa1\xe5\x9b\xad\xe6\x96\xb0\xe9\x97\xbb</title>'

b'\xe6\xa0\xa1\xe5\x9b\xad\xe6\x96\xb0\xe9\x97\xbb'

The website's page charset is gb2312. I searched the internet for answers, but none of them worked. How can I get the correct news.title?

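First, if you want to change the encoding of the HTML, do it at the urlopen stage, when you read the response, rather than afterwards. encode means str >> bytes, which is why your print shows b'..'. Just remove the encode call.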
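The str/bytes distinction described above can be sketched with the standard library alone. This is only an illustration under stated assumptions: the variable raw is a hypothetical stand-in for the bytes that urlopen(...).read() would return from a gb2312 page, so no network access is involved.

```python
# Hypothetical stand-in for urlopen(...).read(): the bytes a gb2312 page
# might send for its <title> element (assumption, not the real response).
raw = "<title>校园新闻</title>".encode("gb2312")

# bytes -> str: decode with the charset the page declares.
text = raw.decode("gb2312")
print(type(raw).__name__, type(text).__name__)  # bytes str

# str -> bytes: this is what .encode('utf8') does, which is why
# print() shows a b'...' literal instead of readable characters.
as_utf8 = "校园新闻".encode("utf8")
print(as_utf8)
```

The UTF-8 bytes printed at the end are the same ones shown in the question's output, so the title was parsed correctly all along; only the extra encode call turned it into bytes before printing.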

If the page uses gb2312, why use 'utf-8' at all? Why not print without encoding? I use print(news.title) and get 校园新闻 (Linux Mint, Bash console with utf-8).

By the way: what do you get when you print without .encode('utf8')? If the string looks wrong, the problem may not be the string but a console that uses a different encoding. Windows mostly uses cp125x (code pages), also known as win-125x.

Thanks! You are right! I get 校园新闻 in my bash (mac 10.12). The wrong output came from running the script in atom runner!

Some consoles do not tell Python which encoding they use, and then print() cannot encode the text correctly.
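To check which encoding Python believes the console uses, and to print without raising UnicodeEncodeError, something like the following sketch may help. The helper name printable is hypothetical (it is not part of any library); it replaces characters the console cannot represent with backslash escapes instead of failing.

```python
import sys


def printable(text, encoding=None):
    """Return text with characters the target encoding cannot represent
    replaced by backslash escapes (hypothetical helper for illustration)."""
    # Fall back to ASCII if the console reported no encoding at all.
    encoding = encoding or sys.stdout.encoding or "ascii"
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


title = "校园新闻"
print("stdout encoding:", sys.stdout.encoding)
print(printable(title))
```

On a UTF-8 terminal this prints the title unchanged; in a runner that reports an ASCII-only encoding it prints escapes such as \u6821 instead of crashing. Setting the PYTHONIOENCODING environment variable to utf-8 before launching the script is another option when the runner itself can display UTF-8.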