Python unicode equality comparison fails in the terminal but works under the Spyder editor

python unicode utf-8

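I need to compare a unicode string coming from a utf-8 file with a constant defined in a Python script.

I'm using Python 2.7.6 on Linux.

If I run the script below in Spyder (a Python editor), it works, but if I invoke the Python script from a terminal, the test fails. Do I need to import or define something in the terminal before invoking the script?

Script ("pythonscript.py"):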

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import csv

some_french_deps = []
idata_raw = csv.DictReader(open("utf8_encoded_data.csv", 'rb'), delimiter=";")
for rec in idata_raw:
    depname = unicode(rec['DEP'],'utf-8')
    some_french_deps.append(depname)

test1 = "Tarn"
test2 = "Rhône-Alpes"
if test1==some_french_deps[0]:
  print "Tarn test passed"
else:
  print "Tarn test failed"
if test2==some_french_deps[2]:
  print "Rhône-Alpes test passed"
else:
  print "Rhône-Alpes test failed"

utf8_encoded_data.csv:

DEP
Tarn
Lozère
Rhône-Alpes
Aude

Output when run from the Spyder editor:

Tarn test passed
Rhône-Alpes test passed

Output when run from the terminal:

$ ./pythonscript.py 
Tarn test passed
./pythonscript.py:20: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if test2==some_french_deps[2]:
Rhône-Alpes test failed

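You are comparing a byte string (type str) with a unicode value. Spyder has changed the default encoding from ASCII to UTF-8, and Python performs an implicit conversion between byte strings and unicode values when comparing the two types. Your byte strings are encoded as UTF-8, so under Spyder the comparison succeeds. In the terminal the default encoding is still ASCII, so the implicit conversion of "Rhône-Alpes" fails and Python treats the two values as unequal.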
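To make the difference between the two tests concrete, here is a minimal sketch of what the implicit conversion amounts to. It assumes the CSV cells arrive as raw UTF-8 bytes; the helper try_decode and the variable names are hypothetical and are not part of the original script. It is written to run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-

def try_decode(raw, encoding):
    """Return raw decoded to text, or None if the bytes are invalid in that encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None

tarn = b"Tarn"                  # ASCII-only bytes
rhone = b"Rh\xc3\xb4ne-Alpes"   # UTF-8 bytes for "Rhône-Alpes"

tarn_ascii = try_decode(tarn, "ascii")     # decodes fine with the ASCII default
rhone_ascii = try_decode(rhone, "ascii")   # fails: bytes above 0x7f are not ASCII
rhone_utf8 = try_decode(rhone, "utf-8")    # decodes fine when UTF-8 is used

print(tarn_ascii == u"Tarn")
print(rhone_ascii is None)
print(rhone_utf8 == u"Rhône-Alpes")
```

This would explain why the "Tarn" test passes in both environments while the "Rhône-Alpes" test only passes where the default encoding is UTF-8.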

The solution is to not use byte strings, but to use unicode literals for your two test values instead:

test1 = u"Tarn"
test2 = u"Rhône-Alpes"

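Changing the system default encoding is, in my opinion, a terrible idea. Your code should use Unicode correctly instead of relying on implicit conversions, and changing the rules of the implicit conversion only adds to the confusion without making the task any easier.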
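One way to avoid relying on implicit conversion is to decode every value once, right where the data enters the program, and compare only text with text afterwards. The following is a sketch under that assumption; decode_row is a hypothetical helper, and raw_row stands in for one record as csv.DictReader would return it on Python 2.

```python
# -*- coding: utf-8 -*-

def decode_row(row, encoding="utf-8"):
    """Return a copy of row in which every byte-string value is decoded to text."""
    decoded = {}
    for key, value in row.items():
        if isinstance(value, bytes):
            value = value.decode(encoding)
        decoded[key] = value
    return decoded

# Hypothetical record holding the raw UTF-8 bytes for "Lozère".
raw_row = {"DEP": b"Loz\xc3\xa8re"}
row = decode_row(raw_row)

# Both operands are unicode text now, so no implicit conversion takes place.
is_match = row["DEP"] == u"Lozère"
print(is_match)
```

Values that are already text pass through unchanged, so the helper can be applied to every record regardless of where it came from.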

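Python emits this warning when you compare a string object with a Unicode object.

To fix this, you can write

test1 = "Tarn"
test2 = "Rhône-Alpes"

as

test1 = u"Tarn"
test2 = u"Rhône-Alpes"

where the 'u' indicates that the value is a unicode object.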

Just using

depname = rec['DEP']

should work, since you have already declared the encoding.

If you

print some_french_deps[2]

it will print

Rhône-Alpes

so your comparison will work.

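Gah, Spyder does all sorts of things that break the normal Python environment. In this case, I strongly suspect that the default encoding used for implicit conversions has been changed.

What does locale show from bash?

@PadraicCunningham: locale has no bearing on how Python coerces between Unicode and byte strings.

@MartijnPieters, yes, I misread the question initially. If the encoding has already been declared, I see no need to use u"Tarn"; shouldn't the comparison just work, or am I missing something?

@PadraicCunningham: the codec declaration only tells Python how to interpret newlines and how to decode the bytes of Unicode literals. Byte string literals are not automatically decoded to Unicode values when you declare a codec, no.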
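The last point can be checked directly. The sketch below is an illustration with made-up variable names, not code from the thread: a byte string holding the UTF-8 encoding of "Rhône-Alpes" remains a sequence of 12 bytes regardless of the coding declaration, while the unicode literal is 11 characters of text. On Python 2 a plain "..." literal has the same type as the b"..." literal used here.

```python
# -*- coding: utf-8 -*-
import sys

byte_literal = b"Rh\xc3\xb4ne-Alpes"   # the bytes a plain "Rhône-Alpes" literal holds on Python 2
text_literal = u"Rhône-Alpes"          # unicode text

# True on Python 2 (str is bytes), False on Python 3 (str is text).
plain_literal_is_bytes = isinstance("Tarn", bytes)

print(len(byte_literal))    # counts bytes
print(len(text_literal))    # counts characters
print(plain_literal_is_bytes)

# The two values only match after an explicit decode.
same_after_decode = byte_literal.decode("utf-8") == text_literal
print(same_after_decode)
```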