Python Popen command (antiword) produces different output in the shell and in a web application

python character-encoding

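I have Django running on a standard WSGI/Apache httpd combination.

I noticed that the file output differs depending on whether I run the code in the shell or from the browser. I have ruled out every other cause I can think of and still get the same problem.

Here is the code: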

def test_antiword(filename):
    import subprocess
    with open(filename, 'w') as writefile:
        subprocess.Popen(["antiword", '/tmp/test.doc'], stdout=writefile)
    p = subprocess.Popen(["antiword", '/tmp/test.doc'], stdout=subprocess.PIPE)
    out, _ = p.communicate()
    ords = []
    for kk in out:
        ords.append(ord(kk))
    return out, ords

def test_antiword_view(request):
    from django.http import HttpResponse
    return HttpResponse(repr(test_antiword('/tmp/web.txt')))

When I open the URL in a browser, this is the output:

('\n"I said good day sir. Good day!" shouted Sh\xe9rlo\xe7k H\xf8lme\xa3.\n\n             "Why not Zoidberg?" queried Zoidberg.\n', [10, 34, 73, 32, 115, 97, 105, 100, 32, 103, 111, 111, 100, 32, 100, 97, 121, 32, 115, 105, 114, 46, 32, 71, 111, 111, 100, 32, 100, 97, 121, 33, 34, 32, 115, 104, 111, 117, 116, 101, 100, 32, 83, 104, 233, 114, 108, 111, 231, 107, 32, 72, 248, 108, 109, 101, 163, 46, 10, 10, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 34, 87, 104, 121, 32, 110, 111, 116, 32, 90, 111, 105, 100, 98, 101, 114, 103, 63, 34, 32, 113, 117, 101, 114, 105, 101, 100, 32, 90, 111, 105, 100, 98, 101, 114, 103, 46, 10])

And this is the corresponding output when I call test_antiword('/tmp/shell.txt') in the shell:

('\n\xe2\x80\x9cI said good day sir. Good day!\xe2\x80\x9d shouted Sh\xc3\xa9rlo\xc3\xa7k H\xc3\xb8lme\xc2\xa3.\n\n             \xe2\x80\x9cWhy not Zoidberg?\xe2\x80\x9d queried Zoidberg.\n', [10, 226, 128, 156, 73, 32, 115, 97, 105, 100, 32, 103, 111, 111, 100, 32, 100, 97, 121, 32, 115, 105, 114, 46, 32, 71, 111, 111, 100, 32, 100, 97, 121, 33, 226, 128, 157, 32, 115, 104, 111, 117, 116, 101, 100, 32, 83, 104, 195, 169, 114, 108, 111, 195, 167, 107, 32, 72, 195, 184, 108, 109, 101, 194, 163, 46, 10, 10, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 226, 128, 156, 87, 104, 121, 32, 110, 111, 116, 32, 90, 111, 105, 100, 98, 101, 114, 103, 63, 226, 128, 157, 32, 113, 117, 101, 114, 105, 101, 100, 32, 90, 111, 105, 100, 98, 101, 114, 103, 46, 10])

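As you can see, the output is very different. For one, the shell output maintains the whitespace from the original file; it is lost in the web version.

As you can see in the code, I also write the document to a file. The resulting output is as follows: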

web.txt

"I said good day sir. Good day!" shouted Sh?rlo?k H?lme?.

             "Why not Zoidberg?" queried Zoidberg.

shell.txt

“I said good day sir. Good day!” shouted Shérloçk Hølme£.

             “Why not Zoidberg?” queried Zoidberg.

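In the web version, the accented characters are not recognized, and the file command identifies the encoding as ISO-8859. In the shell version, the characters display correctly, and the file command identifies the encoding as UTF-8.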
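The byte values in the two outputs above can be compared directly. The following is a minimal sketch in Python 3 (the question's own code is Python 2). It assumes the web output is ISO-8859-1, which the byte values suggest but the question does not confirm; the variable names are illustrative only.

```python
# Bytes for the name as they appear in each output above.
web_bytes = b"Sh\xe9rlo\xe7k H\xf8lme\xa3"
shell_bytes = b"Sh\xc3\xa9rlo\xc3\xa7k H\xc3\xb8lme\xc2\xa3"

# Assumption: web output is ISO-8859-1 (Latin-1); shell output is UTF-8.
web_text = web_bytes.decode("iso-8859-1")
shell_text = shell_bytes.decode("utf-8")

# Both decode to the same characters; only the byte encoding differs.
same_text = web_text == shell_text

# The curly quote U+201C has no ISO-8859-1 representation at all.
try:
    "\u201c".encode("iso-8859-1")
    curly_quote_fits = True
except UnicodeEncodeError:
    curly_quote_fits = False

print(same_text, curly_quote_fits)
```

If this reading is right, the two runs produce the same text in different encodings. The straight quotes (byte 34) in the web output would also be consistent with a Latin-1 target, since Latin-1 cannot represent the curly quotes.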

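I don't understand why this happens. I have checked that both processes use the same version of antiword. I have also verified that both use the same Python module file for subprocess. The Python version used in both cases matches exactly as well.

Can anyone explain what might be going on?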

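The difference is probably caused by environment variables. According to the antiword man page:

Antiword uses the environment variables LC_ALL, LC_CTYPE and LANG (in that order) to get the current locale and uses this information to select the default mapping file.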
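One way to check this is to log which of those variables each process actually sees. The sketch below mirrors the lookup order quoted above. The helper name effective_ctype_locale is hypothetical, and treating an empty value as unset is an assumption rather than documented antiword behavior.

```python
import os

def effective_ctype_locale(env=None):
    """Return (variable, value) for the first non-empty locale variable, or None.

    Checks LC_ALL, LC_CTYPE, LANG in that order, as in the quoted text.
    Assumption: an empty string counts as unset.
    """
    env = os.environ if env is None else env
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return name, value
    return None

# Report what the current process sees; run this in the shell and in the view.
print(effective_ctype_locale())
```

Calling this from both the interactive shell and the Django view should show whether the two environments really differ.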

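My guess is that when you run it from the shell, your shell is in a UTF-8 locale, but when you run it from Django, it is in a different locale and cannot convert the Unicode characters properly. Try switching to a UTF-8 locale when running the subprocess, like this: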

import os
import subprocess

new_env = dict(os.environ)  # Copy the current environment
new_env['LANG'] = 'en_US.UTF-8'  # Request a UTF-8 locale for the child process
p = subprocess.Popen(..., env=new_env)
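A slightly fuller sketch of the same idea in Python 3 follows. The helper name run_with_utf8_locale is hypothetical. It also sets LC_ALL, on the reasoning that the quoted lookup order checks LC_ALL first, so an inherited LC_ALL would otherwise override LANG. The demo runs a child Python process instead of antiword so that it works without antiword installed.

```python
import os
import subprocess
import sys

def run_with_utf8_locale(cmd, locale_name="en_US.UTF-8"):
    """Run cmd with the locale variables overridden and return its stdout bytes."""
    env = os.environ.copy()      # copy, so the parent environment is untouched
    env["LC_ALL"] = locale_name  # checked first in the quoted lookup order
    env["LANG"] = locale_name
    result = subprocess.run(cmd, env=env, stdout=subprocess.PIPE, check=True)
    return result.stdout

# Demo: the child process reports the LC_ALL value it received.
child = [sys.executable, "-c", "import os; print(os.environ['LC_ALL'])"]
seen = run_with_utf8_locale(child).decode("ascii").strip()
print(seen)
```

For the question's case the call would presumably be run_with_utf8_locale(["antiword", "/tmp/test.doc"]), assuming the en_US.UTF-8 locale name is one antiword accepts on that system.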

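Also, I used a short file for the purposes of this example, but in longer files the line wrapping differs too. The file generated from the shell is 150 characters wide, while the file generated from the web is 80 characters wide. It also seems to be unique to antiword: I tried the same thing with cat, and the web and the shell behaved identically.

That was it! Note that, for some reason, I got the error TypeError: unhashable type when using new_env = os.environ[:]. I used new_env = dict(**os.environ) instead, and that worked.

@Jordan: Oops, I forgot that you can't copy a dict with the slice operator. That approach works too.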