Is six.text_type the same as text.decode('utf8')?

Given the following function:

import six

def convert_to_unicode(text):
  """Converts `text` to Unicode (if it's not already), assuming utf-8 input."""
  if six.PY3:
    if isinstance(text, str):
      return text
    elif isinstance(text, bytes):
      return text.decode("utf-8", "ignore")
    else:
      raise ValueError("Unsupported string type: %s" % (type(text)))
  elif six.PY2:
    if isinstance(text, str):
      return text.decode("utf-8", "ignore")
    elif isinstance(text, unicode):
      return text
    else:
      raise ValueError("Unsupported string type: %s" % (type(text)))
  else:
    raise ValueError("Not running on Python2 or Python 3?")

Since six handles Python 2 and Python 3 compatibility, is the convert_to_unicode(text) function above simply equivalent to six.text_type(text)? That is:

def convert_to_unicode(text):
    return six.text_type(text)

Are there cases that the original convert_to_unicode handles but six.text_type does not?

Since six.text_type is just a reference to the str or unicode type, an equivalent function would be:

def convert_to_unicode(text):
    return six.text_type(text, encoding='utf8', errors='ignore')

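But it does not behave the same in the corner cases. For example, a plain six.text_type(text) call will happily convert an integer, so you would have to put some checks there first.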
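A quick way to see these corner cases is the sketch below. It assumes Python 3, where six.text_type is simply str; the alias `text_type` and the helper `rejects_with_type_error` are names made up for this illustration, so the snippet runs without six installed.

```python
# On Python 3, six.text_type is simply `str`. It is aliased here so the
# snippet runs without six installed (an assumption made for illustration).
text_type = str

# One-argument form: accepts anything and never decodes.
from_int = text_type(5)           # the integer becomes the text '5'
from_bytes = text_type(b"abc")    # gives the repr "b'abc'", not decoded text

# Form with encoding/errors: bytes are decoded.
decoded = text_type(b"abc", encoding="utf8", errors="ignore")


def rejects_with_type_error(value):
    """Return True if the encoding/errors form raises TypeError for `value`."""
    try:
        text_type(value, encoding="utf8", errors="ignore")
    except TypeError:
        return True
    return False


print(repr(from_int), repr(from_bytes), repr(decoded))
print(rejects_with_type_error(5))
```

So the one-argument call silently accepts integers and bytes, while the call with encoding and errors only accepts bytes-like input.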

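Also, I don't see why you would want errors='ignore'. You say you assume UTF-8, but if that assumption is violated, you are silently dropping data. I strongly recommend using errors='strict'.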
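To make the data loss concrete, here is a small sketch. The input bytes are an invented example: Latin-1 encoded text that is deliberately not valid UTF-8.

```python
# Latin-1 encoded bytes that are NOT valid UTF-8 (illustrative input).
raw = "caf\u00e9".encode("latin-1")    # b'caf\xe9'

# errors="ignore": the undecodable byte is dropped without any warning.
lossy = raw.decode("utf-8", "ignore")
print(repr(lossy))

# errors="strict": the bad input is reported instead of being hidden.
try:
    raw.decode("utf-8", "strict")
    strict_failed = False
except UnicodeDecodeError as exc:
    strict_failed = True
    print("strict refused the input:", exc)
```

With 'ignore' the caller gets back a shorter string and has no way to notice that a character went missing.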

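Edit: I just realized that this does not work if text is already what you want, i.e. already a text string. It also raises a TypeError for any non-string input. So how about this: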

def convert_to_unicode(text):
    if isinstance(text, six.text_type):
        return text
    return six.text_type(text, encoding='utf8', errors='ignore')

The only case not covered here is a Python version that is neither 2 nor 3.
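As a rough check of that function on Python 3, the sketch below substitutes str for six.text_type (which is what six uses there), so it runs without six installed. The sample inputs are made up for illustration.

```python
text_type = str  # what six.text_type refers to on Python 3 (stand-in, no six needed)


def convert_to_unicode(text):
    # Text passes through unchanged; everything else goes through the decoder.
    if isinstance(text, text_type):
        return text
    return text_type(text, encoding="utf8", errors="ignore")


unchanged = convert_to_unicode("already text")
decoded = convert_to_unicode("h\u00e9llo".encode("utf-8"))

try:
    convert_to_unicode(42)
    error_name = None
except TypeError as exc:
    # The function in the question raises ValueError for this input instead.
    error_name = type(exc).__name__

print(unchanged, decoded, error_name)
```

One visible difference from the function in the question is the exception type for unsupported input: TypeError here, ValueError there.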

I still think you should use errors='strict'.

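Note: six version 1.12 has six.ensure_text. I think it is exactly what you need.

@alvas, is there anything missing from my answer?

Care to explain why this covers all the cases in the function you posted?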
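For the six.ensure_text note above, a minimal usage sketch follows. The fallback definition is an assumption added only so the snippet also runs where six 1.12 or later is not installed; it is a Python 3 stand-in, not six's own code.

```python
try:
    from six import ensure_text      # provided by six 1.12 and later
except ImportError:
    # Hypothetical Python 3 stand-in with the same call shape, used only
    # when six (or a new enough six) is unavailable.
    def ensure_text(s, encoding="utf-8", errors="strict"):
        if isinstance(s, bytes):
            return s.decode(encoding, errors)
        if isinstance(s, str):
            return s
        raise TypeError("not expecting type '%s'" % type(s))

from_bytes = ensure_text(b"caf\xc3\xa9")    # UTF-8 bytes are decoded
from_text = ensure_text("abc")              # text is returned as-is

try:
    ensure_text(5)
    rejected = False
except TypeError:
    rejected = True                         # other types are refused

print(from_bytes, from_text, rejected)
```

Note that errors defaults to 'strict' here, which matches the recommendation made earlier in the answer.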