Python 3.x: Unable to read a wiki page with BeautifulSoup


python-3.x character-encoding


I tried to read a wiki page with urllib and BeautifulSoup, as shown below.

I tried following this:

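import urllib.parse as parse
import urllib.request as request

from bs4 import BeautifulSoup  # features="lxml" also requires lxml to be installed

name = "メインページ"
root = 'https://ja.wikipedia.org/wiki/'
# quote() suits a URL path segment; quote_plus() would turn spaces into '+'
url = root + parse.quote(name)

response = request.urlopen(url)
html = response.read()  # raw bytes
print(html)             # prints a bytes literal: Japanese text appears as \x.. escapes

soup = BeautifulSoup(html.decode('UTF-8'), features="lxml")
print(soup)             # prints a str; whether Japanese displays depends on the console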

The code runs without errors, but the Japanese characters are not readable in the output.
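A side note on building the article URL: `urllib.parse.quote` is generally a better fit for a URL path segment than `quote_plus`, which encodes spaces as `+` (the form-encoding convention for query strings). Wikipedia article URLs use underscores in place of spaces. The sketch below shows the difference; `wiki_article_url` is a hypothetical helper, not part of any library.

```python
from urllib.parse import quote, quote_plus


def wiki_article_url(title: str, root: str = "https://ja.wikipedia.org/wiki/") -> str:
    """Build a Wikipedia article URL (hypothetical helper).

    Spaces become underscores (the Wikipedia URL convention), then quote()
    percent-encodes the UTF-8 bytes of the title for use in the URL path.
    """
    return root + quote(title.replace(" ", "_"))


print(quote_plus("Main Page"))  # Main+Page   ('+' is the query-string convention)
print(quote("Main Page"))       # Main%20Page
print(wiki_article_url("メインページ"))
# https://ja.wikipedia.org/wiki/%E3%83%A1%E3%82%A4%E3%83%B3%E3%83%9A%E3%83%BC%E3%82%B8
```

For a title without spaces, such as the one in the question, both functions produce the same percent-encoded path.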

Your approach looks correct, and it works for me. Try printing the parsed data with the following code and check the output:

soup = BeautifulSoup(html.decode('UTF-8'), features="lxml")
# Text of the main article body (the element with id="mw-content-text")
some_japanese = soup.find('div', {'id': 'mw-content-text'}).text.strip()
print(some_japanese)

In my case I get the following result (this is part of the output):

ウィリアム・バトラー・イェイツ（1865年6.月13日 - 1939年1.月28日）は、アイルランドの詩人・劇作家。幼少のころから親しんだアイルランドの妖精譚などを題材とする抒情詩で注目されたのち、民族演劇運動を通じてアイルランド文芸復興の担い手となった。……
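One likely source of confusion is that `print(html)` prints the raw `bytes` returned by `response.read()`, so every non-ASCII character shows up as `\x..` escapes; only a decoded `str` shows the characters themselves. Instead of hard-coding `'UTF-8'`, the charset can be taken from the `Content-Type` header. The sketch below runs offline with a hand-built header; `decode_body` is a hypothetical helper, and falling back to UTF-8 when no charset is declared is an assumption.

```python
from email.message import Message


def decode_body(body: bytes, content_type: str) -> str:
    """Decode an HTTP body using the charset declared in a Content-Type value.

    Hypothetical helper; falls back to UTF-8 (an assumption) when no charset is given.
    """
    headers = Message()
    headers["Content-Type"] = content_type
    # get_content_charset() returns the lower-cased charset parameter, or None
    charset = headers.get_content_charset() or "utf-8"
    return body.decode(charset)


raw = "メインページ".encode("utf-8")
print(raw)  # a bytes literal of escapes, starting b'\xe3\x83\xa1\xe3\x82\xa4...'
text = decode_body(raw, "text/html; charset=UTF-8")
print(type(text).__name__, len(text))  # str 6
```

With a real `urlopen` response, `response.headers` is an `http.client.HTTPMessage`, which subclasses `email.message.Message`, so `html.decode(response.headers.get_content_charset() or 'utf-8')` should do the same without the helper.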

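If this doesn't work for you, try saving the HTML content to a file and opening it in a browser to check whether the Japanese text was fetched correctly. Again, this works fine for me.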
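A minimal sketch of that check: writing the fetched bytes unchanged avoids any re-encoding step, and the browser can then decode the file using the page's own `<meta charset>` declaration (assuming the page declares one, as Wikipedia pages typically do). The sample document and the file name `wiki_page.html` are hypothetical stand-ins for the real `html = response.read()`.

```python
import tempfile
from pathlib import Path


def save_page(raw_html: bytes, path: Path) -> Path:
    """Write the fetched bytes unchanged, so no re-encoding step can alter them."""
    path.write_bytes(raw_html)
    return path


# Hypothetical stand-in for `html = response.read()` from the question.
sample = '<html><head><meta charset="UTF-8"></head><body>メインページ</body></html>'.encode("utf-8")
out = save_page(sample, Path(tempfile.gettempdir()) / "wiki_page.html")
print(out)  # open this file in a browser and check the Japanese text
```

If the text looks right in the browser, the download and decoding are fine and the problem lies in how the console displays the output.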

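"Cannot read Japanese characters": are you sure it isn't just your standard output (the console) that can't display Japanese characters when printing? Have you tried writing the result to a file instead?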
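To test the standard-output theory from inside Python, one option is to check whether the text can be encoded with the encoding `sys.stdout` reports. This is a rough diagnostic sketch: `can_encode` is a hypothetical helper, and treating a missing encoding as ASCII is an assumption.

```python
import sys
from typing import Optional


def can_encode(text: str, encoding: Optional[str]) -> bool:
    """Return True if `text` can be encoded with `encoding` (None is treated as ASCII)."""
    try:
        text.encode(encoding or "ascii")
    except (UnicodeEncodeError, LookupError):  # unencodable text or unknown codec name
        return False
    return True


stdout_encoding = getattr(sys.stdout, "encoding", None)
print("stdout encoding:", stdout_encoding)
print("can encode Japanese:", can_encode("メインページ", stdout_encoding))
```

If this reports `False`, `sys.stdout.reconfigure(encoding="utf-8")` (Python 3.7+) or the `PYTHONIOENCODING=utf-8` environment variable can change the output encoding. Even when it reports `True`, the terminal font may still lack Japanese glyphs, which is why writing to a file is the simpler check.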