Python 2.7 encoding and decoding


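I have an encoding/decoding problem. I read text from a file and compare it with text from a database (Postgres). The comparison is done between two lists.

From the file I get 'jo\x9a' for "još", and from the database I get 'jo\xc5\xa1' for the same value.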

# Items in both lists
common = [a for a in codes_from_file if a in kode_prfoksov]

# Items only in the file list
only1 = [a for a in codes_from_file if a not in kode_prfoksov]

# Items only in the database list
only2 = [a for a in kode_prfoksov if a not in codes_from_file]

How can I solve this? Which encoding should I use so that these two strings compare as equal?

Thanks.

The first appears to be windows-1250 and the second utf-8:

>>> print 'jo\x9a'.decode('windows-1250')
još
>>> print 'jo\xc5\xa1'.decode('utf-8')
još
>>> 'jo\x9a'.decode('windows-1250') == 'jo\xc5\xa1'.decode('utf-8')
True

Your file strings appear to be Windows-1250 encoded. Your database appears to contain UTF-8 strings.

So you can first convert all strings to unicode:

codes_from_file = [a.decode("windows-1250") for a in codes_from_file]
kode_prfoksov = [a.decode("utf-8") for a in kode_prfoksov]
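Put together with the comparison from the question, this might look like the sketch below. The sample values in the two lists are made up for illustration, and the use of sets instead of list comprehensions is my own choice rather than something the question requires; it assumes duplicates and ordering do not matter.

```python
# Hypothetical sample data; b'' literals behave the same on Python 2.7 and 3.
codes_from_file = [b'jo\x9a', b'abc']      # Windows-1250 bytes read from the file
kode_prfoksov = [b'jo\xc5\xa1', b'xyz']    # UTF-8 bytes returned by the database

# Decode each source with its own encoding so both sides hold unicode text.
file_codes = set(a.decode('windows-1250') for a in codes_from_file)
db_codes = set(a.decode('utf-8') for a in kode_prfoksov)

common = file_codes & db_codes   # present in both
only1 = file_codes - db_codes    # only in the file
only2 = db_codes - file_codes    # only in the database
```

Decoding once, right after reading, keeps the rest of the code free of encoding concerns.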

Alternatively, if you don't need unicode strings, just convert the file strings to UTF-8:

codes_from_file = [a.decode("windows-1250").encode("utf-8") for a in codes_from_file]

The OP is from Slovenia, so Windows-1250 (the Central European code page) is a safer guess than Windows-1252.
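To see why the guess matters, here is a small check. The byte 0xe8 is just one example I picked, not something from the question: the sample string decodes identically under both code pages, but other bytes do not.

```python
raw = b'jo\x9a'
# Both code pages map byte 0x9a to U+0161 (s with caron),
# so this sample alone cannot tell them apart.
same = raw.decode('windows-1250') == raw.decode('windows-1252')

# Byte 0xe8 differs: U+010D (c with caron) in Windows-1250,
# U+00E8 (e with grave) in Windows-1252.
other = b'\xe8'
as_1250 = other.decode('windows-1250')
as_1252 = other.decode('windows-1252')
```

If the file also contains letters such as č or ž, decoding with the wrong code page would silently produce different characters, and the comparison against the database would fail for those entries.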