How do I encode/decode this BeautifulSoup string in Python so that non-standard Latin characters are output?


python utf-8 character-encoding


I'm scraping a page with Beautiful Soup, and the output contains non-standard Latin characters that show up as hex.

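The page I'm scraping contains pinyin words that use non-standard Latin characters (e.g. ǎ, ā). I've been trying to loop through a series of links containing pinyin and output these words using BeautifulSoup's .string attribute and UTF-8 encoding. The words come out with hex escapes in place of the non-standard characters: "hǎo" comes out as "h\xc7\x8eo". I'm sure I'm doing something wrong when encoding, but I don't know how to fix it. I first tried decoding with UTF-8, but got an error saying the element has no decode method. Printing the string without encoding it raises an error about an undefined character, which I assumed meant the characters have to be encoded into something first.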
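A minimal sketch of why the decode attempt fails: in Python 3, text is str and encoded data is bytes. BeautifulSoup's .string returns a NavigableString, which is a subclass of str, so it can be encoded but has no decode method. Below, a plain str stands in for a.string; the variable names are illustrative only.

```python
# A plain str stands in for the value of a.string (a NavigableString,
# which subclasses str).
text = "hǎo"

# Encoding text produces bytes: the UTF-8 byte sequence for the string.
data = text.encode("utf-8")
print(data)  # b'h\xc7\x8eo'

# str has .encode() but no .decode(); bytes have .decode() but no .encode().
print(hasattr(text, "decode"), hasattr(data, "decode"))  # False True

# Decoding the bytes with the same codec gives back the original text.
print(data.decode("utf-8") == text)  # True
```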

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
import re

url = "https://www.archchinese.com/"

driver = webdriver.Chrome()  # Set Selenium up to open the page with Chrome.
driver.implicitly_wait(30)
driver.get(url)

# find_element(By.ID, ...) replaces find_element_by_id, which Selenium 4 removed.
driver.find_element(By.ID, 'dictSearch').send_keys('好')  # This character is hǎo.

python_button = driver.find_element(By.ID, 'dictSearchBtn')
python_button.click()  # Find the submit button and click it.

soup=BeautifulSoup(driver.page_source, 'lxml')

div = soup.find(id='charDef') # Find div with the target links.

for a in div.find_all('a', attrs={'class': 'arch-pinyin-font'}):
    print (a.string.encode('utf-8')) # Loop through all links with pinyin and attempt to encode.

Actual result: b'h\xc7\x8eo' b'h\xc3\xa0o'

Expected result: hǎo hào
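The bytes in the actual result are already the correct UTF-8 encodings of the expected words; they just haven't been turned back into text. A quick check, as a sketch using the two values printed above:

```python
# The two byte strings from the actual result above.
raw = [b"h\xc7\x8eo", b"h\xc3\xa0o"]

# Decoding each one as UTF-8 recovers the pinyin text.
words = [item.decode("utf-8") for item in raw]
print(" ".join(words))  # hǎo hào

# Each accented vowel is one code point but two UTF-8 bytes.
for ch in "ǎà":
    print(f"U+{ord(ch):04X}", ch.encode("utf-8"))
```

So ǎ (U+01CE) is the byte pair C7 8E and à (U+00E0) is C3 A0, which is exactly what appears in the escaped output.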

Edit: The problem seems to be related to a UnicodeEncodeError on Windows. I tried installing win_unicode_console, but that didn't help. Thanks to snakecharmerb for the information.
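Since the traceback in the comments below points at cp1252.py, the likely cause is that print() is encoding its output for a cp1252 (Windows-1252) stream. A small sketch reproducing that without a console, by calling the cp1252 codec directly: ǎ (U+01CE) has no cp1252 byte, while à (U+00E0) does.

```python
text = "hǎo"

# cp1252 has no byte for ǎ (U+01CE), so strict encoding fails, as print() did.
try:
    text.encode("cp1252")
except UnicodeEncodeError as exc:
    print(exc.reason)  # character maps to <undefined>

# à (U+00E0) does exist in cp1252, so "hào" encodes without error.
print("hào".encode("cp1252"))  # b'h\xe0o'

# errors="replace" substitutes '?' instead of raising.
print(text.encode("cp1252", errors="replace"))  # b'h?o'
```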

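You don't need to encode the value before printing; print() encodes the text for the output stream itself. At the moment you are printing the representation of the bytes that make up the encoded value, not the string itself: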

>>> s = 'hǎo'
>>> print(s)
hǎo

>>> print(s.encode('utf-8'))
b'h\xc7\x8eo'
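To see where the b'...' form comes from: print() converts its argument with str(), and for a bytes object str() returns the same escaped literal as repr(). A short sketch:

```python
s = "hǎo"
b = s.encode("utf-8")

# For bytes, str() gives the literal representation, escapes included,
# which is exactly what print(b) writes out.
print(str(b) == repr(b))  # True
print(str(b))             # b'h\xc7\x8eo'

# For a str, str() is the text itself, so print(s) shows the characters.
print(str(s) == s)        # True
```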

If you do want to encode, do it when passing the page source to BeautifulSoup, not after parsing:

soup=BeautifulSoup(driver.page_source.encode('utf-8'), 'lxml')

div = soup.find(id='charDef') # Find div with the target links.

for a in div.find_all('a', attrs={'class': 'arch-pinyin-font'}):
    print(a.string)

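Trying print(a) without encoding gives the same result as printing .string without encoding:

Traceback (most recent call last):
  File "hanziscrape.py", line 22, in <module>
    print(a)
  File "C:\Users\root\AppData\Local\Programs\Python\37\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u01ce' in position 177: character maps to <undefined>

Good old Windows. This may help.

Yes, I've run into this before and installed win_unicode_console via pip. I tried again and got:

C:\Users\root>pip install win_unicode_console
Requirement already satisfied: win_unicode_console in c:\users\root\appdata\local\programs\python37\lib\site-packages (0.5)

I don't have a Windows box to hand, so I can't help further. But I'd suggest editing your question to make clear that your problem is a UnicodeEncodeError raised when printing to the Windows console, and to list the steps you've taken to try to fix it.

It turns out I was using the Git console on Windows, which was a factor. Your suggestion worked.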
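For reference, a hedged sketch of one way to avoid the cp1252 problem from inside the script, assuming Python 3.7+ (where text streams gained reconfigure()). Since Python 3.6 a real Windows console uses UTF-8 for print(), but when output goes through something that is not a Windows console, such as the Git console here, Python likely falls back to the locale encoding (cp1252, as the traceback shows). Setting the environment variable PYTHONIOENCODING=utf-8, or running python -X utf8, are alternatives that leave the code unchanged. Either way, this assumes the terminal itself can display UTF-8.

```python
import sys

# Python 3.7+: encode stdout as UTF-8 instead of the locale default
# (cp1252 here). The hasattr guard skips streams that don't support it,
# e.g. if stdout has been replaced by another object.
if hasattr(sys.stdout, "reconfigure"):
    sys.stdout.reconfigure(encoding="utf-8")

# print() can now encode characters such as ǎ; whether they display
# correctly still depends on the terminal.
print("hǎo hào")
```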