Python zipfile module: zipfile.write() with files whose names contain Turkish characters


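There are many Word documents on my system, and I want to zip them using the Python module zipfile.

I found a solution for this, but some files on my system have German umlauts and Turkish characters in their filenames.
I adapted the method from that solution as follows, so that it can process German umlauts in filenames: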

# Python 2
import os

def zipdir(path, ziph):
    for root, dirs, files in os.walk(path):
        for file in files:
            current_file = os.path.join(root, file)
            print "Adding to archive -> file: " + str(current_file)
            try:
                #ziph.write(current_file.decode("cp1250"))  # German umlauts ok, Turkish chars not ok
                ziph.write(current_file.encode("utf-8"))  # both not ok
                #ziph.write(current_file.decode("utf-8"))  # both not ok
            except Exception as ex:
                print "exception ---> " + str(ex)
                print repr(current_file)
                raise
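Unfortunately, my attempts to add logic for the Turkish characters have not succeeded, so whenever a filename contains a Turkish character the code prints an exception like this: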

exception ---> [Error 123] Die Syntax f³r den Dateinamen, Verzeichnisnamen oder die Datentrõgerbezeichnung ist falsch: u'X:\\my\\path\\SomeTurk?shChar?shere.doc'
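I have tried several string encode/decode approaches, but none of them worked.
Can anyone help me?

I edited the code above to include the changes mentioned in the comments.
The following errors are now shown: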

...
Adding to archive -> file: X:\\my\path\blabla I blabla.doc
Adding to archive -> file: X:\my\path\bla bla³bla³bla³bla.doc
exception ---> 'ascii' codec can't decode byte 0xfc in position 24: ordinal not in range(128)
'X:\\my\\path\\bla B\xfcbla\xfcbla\xfcbla.doc'
Traceback (most recent call last):
  File "Backup.py", line 48, in <module>
    zipdir('X:\\my\\path', zipf)
  File "Backup.py", line 12, in zipdir
    ziph.write(current_file.encode("utf-8"))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 24: ordinal not in range(128)


The ³ is actually a German ü.
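The byte 0xfc in the traceback seems to account for both symptoms. The following is only a sketch in Python 3 (the question uses Python 2), and the code pages cp1252 and cp850 are assumptions about a typical German Windows setup, not something the post states. It shows that the same byte reads as ü in the ANSI code page and as ³ in the console code page, and that it is not valid ASCII, which is what the implicit ASCII decode behind `str.encode` in Python 2 trips over.

```python
# Sketch: how one byte reads under different code pages.
# Assumed: cp1252 as the Windows ANSI code page, cp850 as the console code page.
raw = b"bla B\xfcbla.doc"  # byte string similar to the repr() in the traceback

as_ansi = raw.decode("cp1252")    # "bla Bübla.doc"
as_console = raw.decode("cp850")  # "bla B³bla.doc"

# Python 2 decodes a byte str with the ASCII codec before it can encode it,
# and 0xfc lies outside the ASCII range, hence the UnicodeDecodeError.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Turkish letters such as "ş" do not exist in cp1252, so a lossy conversion
# to that code page leaves a "?" behind.
lossy = "ş".encode("cp1252", "replace")  # b'?'
```

A `?` of this kind would explain the path in the Error 123 message above, since `?` is not a legal character in a Windows filename.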

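Edit: After trying the suggestions from the comments, I could not find a solution.
So I switched to the Groovy programming language and used its ZIP functionality.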

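Since this is an opinion-based discussion, I decided to vote to close the thread.
If you do not need to inspect the ZIP file with an archiver later, you can always encode the filenames as base64 and restore them when extracting with Python.
To any archiver those filenames will look like gibberish, but the encoding will be preserved.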
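A minimal sketch of the base64 idea, written for Python 3. The helper names `encode_name` and `decode_name` are made up for illustration, and the URL-safe alphabet is my own choice so that the stored name never contains a `/`.

```python
import base64

def encode_name(name):
    """Return an ASCII-only archive name for a Unicode filename (hypothetical helper)."""
    # The URL-safe alphabet avoids "/", which a ZIP reader would treat as a path separator.
    return base64.urlsafe_b64encode(name.encode("utf-8")).decode("ascii")

def decode_name(stored):
    """Restore the original Unicode filename from the stored archive name."""
    return base64.urlsafe_b64decode(stored.encode("ascii")).decode("utf-8")

stored = encode_name("Türkçe şğıİ.doc")  # unreadable, but plain ASCII
restored = decode_name(stored)           # the original name again
```

The stored name would be passed as the archive name when writing, and `decode_name` applied to each member name when extracting.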
Anyway, to get a byte string (str in Python 2, a bytes object in Py3) you must encode(), not decode().
encode() serializes a unicode() string into bytes:

>>> u"\u0161blah".encode("utf-8")
'\xc5\xa1blah'
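decode() turns such a value back into unicode().
The same holds for any other code page.
Sorry for stressing this, but people sometimes get confused about what encoding and decoding do.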
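For reference, the round trip can be sketched like this in Python 3, where the two types are called `str` and `bytes`. The cp1250 line is only an illustration of "any other code page".

```python
text = "\u0161blah"                # a Unicode string, "šblah"

data = text.encode("utf-8")        # str -> bytes
print(data)                        # b'\xc5\xa1blah'

back = data.decode("utf-8")        # bytes -> str, equal to text again

# The same pairing works for a legacy code page, as long as it contains the character.
via_cp1250 = text.encode("cp1250").decode("cp1250")
```

Encoding and decoding must use the same codec. Decoding the UTF-8 bytes as cp1250 would not fail, but it would produce different characters.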
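If you need the files but do not care much about preserving umlauts and other symbols, you can use:

u"üsdlakui".encode("utf-8", "replace")
or the same call with "ignore" in place of "replace".
This replaces unknown characters with a substitute, or silently ignores any decoding/encoding errors.
If the error raised is something like UnicodeDecodeError: cannot decode character, this will solve the problem.
However, the trouble is with filenames that consist only of non-Latin characters.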
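A small Python 3 sketch of the two error handlers. One caveat: UTF-8 can represent every Unicode character, so with "utf-8" as the target the handlers have nothing to do. To make the effect visible, this example assumes ASCII as the target codec, which is not what the answer uses.

```python
name = "üsdlakui"

replaced = name.encode("ascii", "replace")  # b'?sdlakui'
ignored = name.encode("ascii", "ignore")    # b'sdlakui'

# With UTF-8 as the target nothing is lost, so the handler never fires.
as_utf8 = name.encode("utf-8", "replace")   # b'\xc3\xbcsdlakui'

print(replaced, ignored, as_utf8)
```

Either handler loses information, which is why a filename made up entirely of non-Latin characters would end up as a run of `?` or as an empty name.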
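Now, something that might actually work:
Well,

'Sömethüng'.encode("utf-8")
is bound to raise an "ASCII encode error", because the literal is not a unicode string: the non-Latin characters arrive as raw bytes, which Python 2 first tries to decode as ASCII, and the file itself is not UTF-8 encoded.
Whereas:

# -*- coding: UTF-8 -*-
u'Sömethüng'.encode("utf-8")
or

# -*- coding: UTF-8 -*-
unicode('Sömethüng', "utf-8").encode("utf-8")
should work, provided the encoding is declared at the top of the file and the file is saved as UTF-8.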
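Yes, your strings come from the OS (filenames), but that has been the problem from the start.
Even with the encoding handled correctly, the ZIP side of the problem remains.
According to the specification, ZIP should store filenames in CP437, but that is rarely what happens.
Most archivers use the default OS encoding (MBCS in Python).
And most archivers do not support UTF-8. So what I propose here should work, but not with every archiver.
To tell a ZIP archiver that the archive uses UTF-8 filenames, bit 11 of the flag bits (mask 0x800) should be set. As I said, some archivers do not check it. It is a recent addition to the ZIP specification (well, a few years old by now, really).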
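As a quick way to see the flag, here is a sketch for Python 3, whose zipfile module sets bit 11 on its own when a member name is not pure ASCII. The helper `utf8_flag_set` is a hypothetical name used only here. It writes one empty member to an in-memory archive and reads the flag back.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # bit 11 of the general purpose bit flag

def utf8_flag_set(name):
    """Write one empty member called `name` and report (stored name, UTF-8 flag)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, b"")
    # Re-open the same buffer and inspect what was actually recorded.
    with zipfile.ZipFile(buf) as zf:
        info = zf.infolist()[0]
        return info.filename, bool(info.flag_bits & UTF8_FLAG)

ascii_result = utf8_flag_set("plain.doc")          # flag not set
turkish_result = utf8_flag_set("Türkçe şğı.doc")   # flag set, name survives
```

An archiver that ignores the flag would still misread the second name, which matches the caveat above.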
I won't write the complete code here, only the parts needed to understand the idea:

# -*- coding: utf-8 -*-
# Cannot hurt to have default encoding set to UTF-8 all the time. :D
# Python 2
import os, time, zipfile

zf = zipfile.ZipFile(...)

# Careful here: origname is the full path to the file you will store into the ZIP,
# filename is the name under which the file will be stored in the ZIP.
# It'll probably be better if filename is not a full path, but relative,
# not to introduce problems when extracting. You decide.
filename = origname = os.path.join(root, filename)

# Filenames from the OS can already be UTF-8, but they can be in a local codepage.
# I will use MBCS here to decode from it, so that we can encode to UTF-8 later.
# I recommend getting the codepage from the OS (from kernel32.dll on Windows)
# manually instead of using MBCS, but for now:
if isinstance(filename, str):
    filename = filename.decode("mbcs")
# Else, assume it is already a decoded unicode string.

# Prepare the filename for the archive:
filename = os.path.normpath(os.path.splitdrive(filename)[1])
while filename[0] in (os.sep, os.altsep):
    filename = filename[1:]
filename = filename.replace(os.sep, "/")
filename = filename.encode("utf-8")

# Get what we need
zinfo = zipfile.ZipInfo(filename, time.localtime(os.path.getmtime(origname))[0:6])
# Here you should set zinfo.external_attr to store Unix permission bits
# and set zinfo.compress_type.
# Both are optional and not the subject of your problem. But just as a notice.
zinfo.flag_bits |= 0x800  # Set bit 11 to 1, announce the UTF-8 filenames.
with open(origname, "rb") as f:
    zf.writestr(zinfo, f.read())
I have not tested it, I just wrote the code down, but this is the idea, even if a bug crept in somewhere.

If this does not work, I do not know what will.
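For comparison, a hedged sketch of how the asker's zipdir might look on Python 3. This is not the answerer's code. It relies on two Python 3 behaviours: `os.walk` returns Unicode names when given a `str` path, and zipfile encodes non-ASCII member names as UTF-8 by itself. The `zip_path` parameter and the use of relative archive names are my own choices.

```python
import os
import zipfile

def zipdir(path, zip_path):
    """Zip every file below `path` into `zip_path`, storing names relative to `path`."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(path):  # str path in, Unicode names out
            for name in files:
                full = os.path.join(root, name)
                # Relative names keep drive letters and absolute paths out of the archive.
                arcname = os.path.relpath(full, path)
                zf.write(full, arcname)
    return zip_path
```

The archive should be written somewhere outside `path`, otherwise the walk would pick up the half-written ZIP file itself.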
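When I execute ziph.write(current_file.encode("utf-8")), German umlauts and Turkish characters cause exceptions, the same as with ziph.write(current_file.decode("utf-8")).
Please see my edit and give us the output after this change to the code: try: ziph.write(current_file.encode("utf-8")); except: print repr(current_file); raise; of course, mind the indentation and newlines :D
Sorry I'm late. I did not see your edit because nothing showed up in my inbox. Now I think I have found a solution for you. I hope it works. Good luck!
I will try it as soon as possible.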