Python 2.7 raw_input unicode strings


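I have read "Unicode operations on Python 2.7" more than once and searched this forum thoroughly, but nothing I found and tried makes my program work.

It is supposed to convert dictionary.com entries into sets of example sentences and word-pronunciation pairs. However, it fails at the very start: as soon as IPA (i.e. unicode) characters are entered, they are turned into gibberish.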

# -*- coding: utf-8 -*-

""" HERE'S HOW A TYPICAL DICTIONARY.COM ENTRY LOOKS LIKE
white·wash
/ˈʰwaɪtˌwɒʃ, -ˌwɔʃ, ˈwaɪt-/ Show Spelled
noun
1.
a composition, as of lime and water or of whiting, size, and water, used for whitening walls, woodwork, etc.
2.
anything, as deceptive words or actions, used to cover up or gloss over faults, errors, or wrongdoings, or absolve a wrongdoer from blame.
3.
Sports Informal. a defeat in which the loser fails to score.
verb (used with object)
4.
to whiten with whitewash.
5.
to cover up or gloss over the faults or errors of; absolve from blame.
6.
Sports Informal. to defeat by keeping the opponent from scoring: The home team whitewashed the visitors eight to nothing.
"""

def wdefinp():   #word definition input
    wdef=u''
    emptylines=0 
    print '\nREADY\n\n'
    while True:
        cinp=raw_input()   #current input line
        if cinp=='':
            emptylines += 1
            if emptylines >= 3:   #breaking out by 3xEnter
                wdef=wdef[:-2]
                return wdef
        else:
            emptylines = 0
        wdef=wdef + '\n' + cinp
    return wdef

wdef=wdefinp()
print wdef.decode('utf-8')
This yields: white·wash /Ë�ĘwŞtËwĘwĘ�, -ËŚwÉĘ�, Ë�waŞt-/ Show Spelled


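Any help would be appreciated.

OK, I managed to reproduce a couple of errors in your program.

First, if I run it in a terminal and paste the sample text, I get an error on this line (sorry, my line numbers do not match yours):

To fix this, I used the answer to this question:

The fixed line is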

cinp = raw_input().decode(sys.stdin.encoding)
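Basically, you need to know the input encoding; once you know it, the raw bytes can be decoded into a unicode string (this line needs import sys at the top of the script).

Once that is fixed, the next problem is a similar one: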
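The fixed line can be wrapped in a small helper. This is only a minimal sketch, not the original poster's code: the name `decode_line` and the UTF-8 fallback are assumptions added for illustration. The fallback is there because `sys.stdin.encoding` can be `None` in Python 2.7 when input is piped instead of typed into a terminal.

```python
# -*- coding: utf-8 -*-


def decode_line(raw, encoding=None, fallback='utf-8'):
    """Return `raw` as a unicode string (hypothetical helper).

    `encoding` would normally be sys.stdin.encoding; `fallback` is an
    assumed default for the case where that value is None.
    """
    if not isinstance(raw, bytes):
        return raw  # already text; decoding it again would be a bug
    return raw.decode(encoding or fallback)


# In the question's loop (Python 2.7, after `import sys`) the call would be:
#     cinp = decode_line(raw_input(), sys.stdin.encoding)
sample = b'/\xcb\x88wa\xc9\xaat/'   # UTF-8 bytes for the IPA string /ˈwaɪt/
decoded = decode_line(sample, 'utf-8')
```

Because the helper returns text unchanged, it is safe to call even when a line has already been decoded.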

File "unicod.py", line 28, in <module>
    print wdef.decode('utf-8')
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in position 6: ordinal not in range(128)

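This fails because the value returned from the function is already a unicode string, so calling .decode('utf-8') on it is a "double decoding": Python 2.7 first tries to encode the string as ASCII, and that step is what raises the UnicodeEncodeError. Just remove the .decode('utf-8') and it runs fine. Tested in Eclipse with Python 2.7 and your test data.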
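The error can be reproduced in isolation. This is a minimal sketch, relying on the Python 2.7 behaviour that calling `.decode()` on a unicode object first encodes it with the default ASCII codec. The explicit `.encode('ascii')` below spells out that hidden step so the snippet behaves the same on Python 3; the sample string is illustrative.

```python
# -*- coding: utf-8 -*-
# Already a unicode string; the leading newline mirrors what wdefinp() returns.
text = u'\nwhite\xb7wash'

try:
    # Python 2.7 evaluates text.decode('utf-8') roughly like this:
    text.encode('ascii').decode('utf-8')
    error = None
except UnicodeEncodeError as exc:
    # The ASCII encode step fails on u'\xb7' before any decoding happens.
    error = exc
```

Printing `text` directly, with no `.decode()` call, skips the hidden encode step entirely.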