How do I correctly decode a messed-up UTF-8 string in Python?


python string utf-8


I'm trying to read IPTC data using Python and pyexiv2:

import pyexiv2
image = pyexiv2.Image('test.jpg')
image.readMetadata()
print image['Iptc.Application2.Caption']

This gives me the following:

Copyright: Michael Huebner, Kontakt: +4915100000000xxxxxx Höxx (30) ist im Streit mit dem Arbeitsamt in Brandenburg, xxxxxxxxxxxxxx , xxxxxx,

But it should give me:

Kinder: Axxxxx Hxxxxx (10) und Exxxxxx Höxx (5), Rxxxxxxx Höxx (30) ist im Streit mit dem Arbeitsamt in Brandenburg, xxxxxxxxxxxxx , xxxxxxxxxxx, 
Copyright: Michael Huebner, Kontakt: +4915100000000

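This is a bit messy because I had to remove personal data, but you can see what is happening: the line break makes the last part of the string overwrite the first part.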

But now it gets weird:

for i in str(image['Iptc.Application2.Caption']):
  print i,

It prints all the characters in the correct order, including the line breaks, but it mangles the umlaut characters.
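A likely explanation for the mangled umlauts, sketched in Python 3 (the question itself uses Python 2): str() of the caption yields UTF-8 encoded bytes, and iterating over bytes splits a multi-byte character such as ö (encoded as the two bytes C3 B6) into two separate items, so print i, puts a space between the halves. Iterating over decoded text keeps each character whole. The sample string "Höxx" is illustrative; note that Python 2 yields one-byte strings when iterating a str, whereas Python 3 yields integers when iterating bytes.

```python
# Illustrative sample: "ö" is U+00F6, encoded in UTF-8 as the two bytes C3 B6.
text = "Höxx"
raw = text.encode("utf-8")

# Iterating over bytes gives one item per byte, so "ö" turns into two items.
byte_items = list(raw)
print(byte_items)  # [72, 195, 182, 120, 120]

# Iterating over the decoded str gives one item per character.
char_items = list(raw.decode("utf-8"))
print(char_items)  # ['H', 'ö', 'x', 'x']
```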

This:

gives me:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 41: ordinal not in range(128)
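Whatever call produced it, this error means UTF-8 bytes were decoded with the ascii codec: 0xc3 is the lead byte of a two-byte UTF-8 sequence (used for characters such as ö), and ASCII only covers bytes up to 0x7f. In Python 2 this typically happens when a byte string is converted to unicode implicitly, because the default encoding is ascii. A minimal Python 3 sketch reproducing the error and the fix of decoding explicitly as UTF-8; the sample bytes are illustrative.

```python
raw = "Kinder: Höxx".encode("utf-8")  # illustrative UTF-8 encoded bytes

error_message = None
bad_position = None
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    # 0xc3 is outside the ASCII range (0x00-0x7f), so decoding stops there.
    error_message = str(exc)
    bad_position = exc.start
    print(error_message)

# Decoding with the encoding the data was actually written in works.
text = raw.decode("utf-8")
print(text)  # Kinder: Höxx
```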

So how can I get both the umlauts and the correct string order? How do I fix this string?

Your data uses a different line-separator convention than you expect. This isn't really a UTF-8-specific problem.

You can use str.splitlines() to split the lines; it recognizes \r as a line separator. If needed, you can then rejoin the lines with \n:

>>> sample = 'line 1\rline 2'
>>> print sample
line 2
>>> sample.splitlines()
['line 1', 'line 2']
>>> print '\n'.join(sample.splitlines())
line 1
line 2
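Combining both fixes, a minimal Python 3 sketch: decode the raw caption bytes as UTF-8 so the umlauts survive, then normalize the line separators with splitlines(). The function name normalize_caption and the sample bytes are hypothetical, and the sketch assumes the caption arrives as UTF-8 encoded bytes, which is what the 0xc3 byte in the traceback suggests.

```python
def normalize_caption(raw: bytes) -> str:
    """Decode UTF-8 bytes and rejoin the lines with plain newline characters."""
    text = raw.decode("utf-8")
    # splitlines() treats "\r", "\n" and "\r\n" (among others) as line boundaries.
    return "\n".join(text.splitlines())


# Hypothetical sample: a caption whose lines are separated by a bare carriage return.
sample = "Höxx (5)\rCopyright: Michael Huebner".encode("utf-8")
fixed = normalize_caption(sample)
print(fixed)
```

If the input might not be valid UTF-8, raw.decode("utf-8", errors="replace") substitutes U+FFFD for undecodable bytes instead of raising UnicodeDecodeError.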

Don't use block quotes if you need to include newlines in your output.

What does print repr(image['Iptc.Application2.Caption']) show? You probably have a \r, a carriage return, in there.

Yes, it's \r. How can I fix this?
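Asking for repr() works because repr() spells control characters out as escape sequences instead of sending them to the terminal, where a carriage return moves the cursor back to the start of the line. A small Python 3 sketch; the sample string is illustrative.

```python
caption = "line 1\rline 2"  # illustrative string containing a carriage return

# print(caption) lets the terminal jump back to the start of the line;
# repr() instead shows the carriage return as the two characters backslash and r.
shown = repr(caption)
print(shown)  # 'line 1\rline 2'

# A membership test answers the same question programmatically.
has_cr = "\r" in caption
print(has_cr)  # True
```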
