Encoding: cannot open a file via a Python CGI script after renaming it from English => Greek letters


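On Linux systems, the file system stores bytes, and only bytes.

So if a program believes it should send file names in, say, UTF-16 or ISO-8859-7 encoding, it will take a string like "Νικόλαος" and the file system will see bytes like these: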

py> s = 'Νικόλαος' 
py> s.encode('UTF-16be') 
b'\x03\x9d\x03\xb9\x03\xba\x03\xcc\x03\xbb\x03\xb1\x03\xbf\x03\xc2' 

py> s.encode('iso-8859-7') 
b'\xcd\xe9\xea\xfc\xeb\xe1\xef\xf2'

Note that the same string gives completely different bytes. Likewise, the same bytes will give different strings depending on the encoding you use.
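To illustrate the second half of that statement, here is a minimal sketch that decodes the ISO-8859-7 bytes shown above with two different charsets. The choice of Latin-1 as the "wrong" charset is only an assumption for illustration; any other single-byte charset would behave similarly.

```python
# The ISO-8859-7 bytes for 'Νικόλαος', copied from the session above.
raw = b'\xcd\xe9\xea\xfc\xeb\xe1\xef\xf2'

# Decoded with the charset that produced them: the original name comes back.
as_greek = raw.decode('iso-8859-7')

# Decoded with a different single-byte charset: no error is raised,
# but every byte maps to an unrelated accented Latin letter (mojibake).
as_latin = raw.decode('latin-1')

print(as_greek)
print(as_latin)
```

Neither call fails, which is why a wrong guess at a single-byte charset tends to go unnoticed until someone looks at the result.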

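Now, if you try to read the file name with a program that expects UTF-8, it will either see some kind of mojibake garbage characters or get some kind of error: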

py> s.encode('UTF-16be').decode('utf-8') 
Traceback (most recent call last): 
  File "<stdin>", line 1, in <module> 
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9d in position 1: invalid start byte

py> s.encode('iso-8859-7').decode('utf-8') 
Traceback (most recent call last): 
  File "<stdin>", line 1, in <module> 
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcd in position 0: invalid continuation byte
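Whether you get garbage or an exception depends on the error handler passed to decode(). The sketch below contrasts two of Python's standard handlers on the same ISO-8859-7 bytes. It is only an illustration of the behaviour described above, not a fix for the underlying mismatch.

```python
raw = b'\xcd\xe9\xea\xfc\xeb\xe1\xef\xf2'  # 'Νικόλαος' in ISO-8859-7

# errors='replace' never raises, but bad bytes become U+FFFD,
# so the original letters are lost for good.
lossy = raw.decode('utf-8', errors='replace')

# errors='surrogateescape' maps each undecodable byte to a lone
# surrogate, so the exact original bytes can be recovered later.
escaped = raw.decode('utf-8', errors='surrogateescape')
restored = escaped.encode('utf-8', errors='surrogateescape')

print(repr(lossy))
print(restored == raw)
```

On POSIX systems Python 3 applies surrogateescape when it turns file names into str, which is why such names can still be passed back to open() even though they cannot be printed or stored as clean text.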

So how does the following code need to be written so that files.py reads the Greek file names correctly?

import os

# Compute a set of current fullpaths
fullpaths = set()
path = "/home/nikos/public_html/data/apps/"

for root, dirs, files in os.walk(path):
    for filename in files:
        fullpaths.add(os.path.join(root, filename))

# Load 'em
for fullpath in fullpaths:
    try:
        # Check the presence of a file against the database and insert if it doesn't exist
        cur.execute('''SELECT url FROM files WHERE url = %s''', (fullpath,))
        data = cur.fetchone()
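One possible way to make the directory walk robust is to walk the tree as bytes and decode each name explicitly. This is a sketch under assumptions, not a confirmed fix for this script: the helper names decode_name and collect_paths are hypothetical, and the list of charsets to try (UTF-8 first, then ISO-8859-7) is a guess based on the encodings mentioned in this thread.

```python
import os

def decode_name(raw, encodings=("utf-8", "iso-8859-7")):
    """Return the first successful decoding of a raw bytes file name."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Last resort: keep undecodable bytes as lone surrogates.
    return raw.decode("utf-8", "surrogateescape")

def collect_paths(top):
    """Walk `top` as bytes; map each raw bytes path to readable text."""
    paths = {}
    # Passing bytes to os.walk makes it yield bytes names untouched.
    for root, dirs, files in os.walk(os.fsencode(top)):
        for name in files:
            raw = os.path.join(root, name)
            paths[raw] = decode_name(raw)
    return paths
```

With this split, the raw bytes path would be the one handed to open(), while the decoded text is what gets stored in or compared against the database.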

Encoding = string -> (using some charset) -> bytes in that charset
Decoding = bytes -> (must know which charset was used) -> original string

Do I understand correctly that the key to the whole encoding/decoding process is the charset that was used and must be used again?

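We don't know which key (charset) was used, but we do know the original form of the string. So it occurred to me that if we write a Python script that decodes the mojibake byte stream with every available charset, then at some point the original string will appear.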
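That brute-force idea can be sketched roughly as follows. The function name find_charsets and the short candidate list are hypothetical; a real search would probably try many more charsets.

```python
def find_charsets(raw, expected, candidates):
    """Return every candidate charset that decodes `raw` to `expected`."""
    matches = []
    for name in candidates:
        try:
            if raw.decode(name) == expected:
                matches.append(name)
        except (UnicodeDecodeError, LookupError):
            # Either the bytes are invalid in this charset,
            # or the charset name is unknown to Python.
            pass
    return matches

# Hypothetical shortlist of charsets to try.
CANDIDATES = ['utf-8', 'utf-16-be', 'iso-8859-7', 'cp1253', 'latin-1']

raw = b'\xcd\xe9\xea\xfc\xeb\xe1\xef\xf2'
print(find_charsets(raw, 'Νικόλαος', CANDIDATES))
```

Keep in mind that more than one charset may reproduce the expected string, since related charsets often share the same letter positions, so a match narrows the possibilities rather than proving which charset was actually used.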

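Can you format your post properly? In its current form it is hard to read. Also, the shorter the better.

Would the file command help determine the actual encoding of the file names? E.g. ls $directory | file - , assuming all file names in $directory have the same encoding.

I don't know how to format it; I only know to press Ctrl-K to paste code properly.

ls -l | file - outputs /dev/stdin: ASCII text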
