
Python: converting unicode to Chinese

python python-2.7 unicode


I'm trying to use Python to scrape some Chinese text from a website. When I fetch it, the text is surrounded by HTML tags, like this:

我今天的<em class="hot">心情</em>不好。<br/> I'm feeling blue today.

(I had to format it as code to keep the HTML tags from disappearing.) However, once I slice the string to strip out the HTML tags, I get:

我今天的心情ﾸﾍ好。

Why does this strange character show up in the second-to-last position? Thanks for your help.
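The question doesn't show the slicing code, so the cause can't be confirmed. One common cause, assumed here, is slicing the raw UTF-8 bytes (a Python 2 str) instead of a decoded unicode string. Every character in this sentence takes 3 bytes in UTF-8, so a byte offset that lands inside a character leaves stray bytes behind. This sketch (the bad offset is hypothetical; it runs on Python 2.7 and 3) starts the slice one byte into 不:

```python
# -*- coding: utf-8 -*-
text = u'我今天的心情不好。'      # decoded text: one index per character
raw = text.encode('utf-8')      # UTF-8 bytes: 3 bytes per character here

# len(text) == 9 but len(raw) == 27, so character and byte offsets differ.

# Hypothetical bad slice: start one byte into u'不' (U+4E0D = E4 B8 8D).
start = raw.index(u'不'.encode('utf-8')) + 1
fragment = raw[start:]                          # starts with b'\xb8\x8d'
damaged = fragment.decode('utf-8', 'replace')   # each stray byte -> U+FFFD

# Decoding first and slicing the unicode string keeps characters whole.
clean = text[text.index(u'不'):]                 # u'不好。'
```

The two leftover bytes decode as two U+FFFD replacement characters, which would fit the two junk characters in place of 不 above, although the exact glyphs shown depend on how the terminal interprets the bytes.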

With the regex module, you can use \p{Han} to filter out the Chinese (Han) characters:

>>> text = u'''我今天的<em class="hot">心情</em>不好。<br/> I'm feeling blue today.'''
>>> import regex
>>> print u''.join(regex.findall(r'\p{Han}+', text, flags=regex.UNICODE))
我今天的心情不好
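The built-in re module does not support \p{...} property classes. If installing regex is not an option, a rough stdlib substitute is an explicit code-point range. This is a different technique from \p{Han} and it assumes only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) matters; keep_han and HAN_BASIC are illustrative names:

```python
# -*- coding: utf-8 -*-
import re

# Assumption: only the basic CJK Unified Ideographs block (U+4E00-U+9FFF)
# matters; Han characters in the extension blocks are NOT matched.
HAN_BASIC = re.compile(u'[\u4e00-\u9fff]+')

def keep_han(text):
    """Join every run of basic CJK ideographs found in `text`."""
    return u''.join(HAN_BASIC.findall(text))

text = u'''我今天的<em class="hot">心情</em>不好。<br/> I'm feeling blue today.'''
result = keep_han(text)   # u'我今天的心情不好'
```

Like the \p{Han} version, this drops the full stop 。 (U+3002), which is CJK punctuation rather than an ideograph.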

Alternatively, with the standard library's unicodedata module, you can keep only the characters whose Unicode name starts with 'CJK':

>>> import unicodedata
>>> unicodedata.name(u'a')
'LATIN SMALL LETTER A'
>>> unicodedata.name(u'我')
'CJK UNIFIED IDEOGRAPH-6211'
>>> unicodedata.name(u'今')
'CJK UNIFIED IDEOGRAPH-4ECA'
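Scraped pages usually contain newlines and tabs, and those characters have no Unicode name. Before filtering real input by name, note that unicodedata.name raises ValueError for such characters unless a default is passed as the second argument:

```python
# -*- coding: utf-8 -*-
import unicodedata

# Characters with no Unicode name, such as the newline, make
# unicodedata.name() raise ValueError when no default is given.
try:
    unicodedata.name(u'\n')
    raised = False
except ValueError:
    raised = True

# With a default as the second argument, the lookup returns it instead.
fallback = unicodedata.name(u'\n', '')          # ''
han_name = unicodedata.name(u'我', '')          # 'CJK UNIFIED IDEOGRAPH-6211'
```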

>>> text = u'''我今天的<em class="hot">心情</em>不好。<br/> I'm feeling blue today.'''
>>> print u''.join(c for c in text if unicodedata.name(c, '').startswith('CJK'))
我今天的心情不好
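Both answers discard the punctuation and the English sentence along with the tags. If the goal is only to remove the markup, which is what the slicing in the question was for, an HTML parser avoids index arithmetic entirely. A minimal sketch for Python 3's html.parser (in Python 2.7 the class is HTMLParser.HTMLParser, and this exact code would need adjusting); TextCollector and strip_tags are illustrative names, not part of any library:

```python
from html.parser import HTMLParser

class TextCollector(HTMLParser):
    """Collect only the character data that appears between tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def strip_tags(markup):
    """Return `markup` with every tag removed (text and punctuation kept)."""
    parser = TextCollector()
    parser.feed(markup)
    parser.close()  # flush any text still buffered after the last tag
    return ''.join(parser.parts)

text = '我今天的<em class="hot">心情</em>不好。<br/> I\'m feeling blue today.'
plain = strip_tags(text)
```

Here plain keeps 。 and the English sentence, and no slice ever lands inside a character because the parser works on the decoded string.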