Why not use 'shell=True' in Python's subprocess.Popen?

python linux shell process

I have a very long one-line shell command that I call from Python. The code looks like this:

# "first way"
def run_cmd ( command ):
    print "Run: %s" % command
    subprocess.call (command, shell=True)
run_cmd('''sort -n -r -k5 {3} |head -n 500|awk 'OFS="\t"{{if($2-{1}>0){{print $1,$2-{1},$3+{1},$4,$5}}}}' > {2}'''.format(top_count,extend/2,mid,summit))

The code works, but it always complains like this:

sort: write failed: standard output: Broken pipe
sort: write error
awk: (FILENAME=- FNR=132) fatal: print to "standard output" failed (Broken pipe)

From what I have read, I need to use a longer script to do this, something like:

# "second way"
# without shell=True, each command has to be passed as an argument list
p1 = Popen(["sort", "-n", "-r", "-k5", summit], stdout=PIPE)
p2 = Popen(["head", "-n", "500"], stdin=p1.stdout, stdout=PIPE)
# and so on ..........

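My questions are:

(1) Is the "second way" slower than the "first way"?

(2) If I have to write it the "first way" (because it is faster to write), how can I avoid complaints such as the broken pipe?

(3) What are the most compelling reasons not to write it the "first way"?

Using shell=True is a security risk if your input data comes from an untrusted source. For example, what if the content of the mid variable is "/dev/null; rm -rf /"? That does not seem to be the case in your scenario, so I would not worry too much about it.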
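To make the risk concrete, here is a minimal sketch (Python 3, assuming a POSIX shell is available). The helper run_redirect, the hostile value and the temporary directory are made up for illustration and are not part of the question's code; a harmless echo stands in for the destructive command.

```python
import os
import shlex
import subprocess
import tempfile


def run_redirect(target, quote):
    """Run `echo data > target` through the shell and return what reaches stdout."""
    if quote:
        target = shlex.quote(target)  # escape shell metacharacters in the value
    return subprocess.check_output("echo data > {0}".format(target), shell=True)


tmpdir = tempfile.mkdtemp()
# Hypothetical hostile value: a file name followed by a second command.
hostile = os.path.join(tmpdir, "out.txt") + "; echo INJECTED"

# Unquoted: the shell sees two commands, so `echo INJECTED` also runs.
unsafe_output = run_redirect(hostile, quote=False)

# Quoted: the whole value is treated as one (odd-looking) file name.
safe_output = run_redirect(hostile, quote=True)
```

Quoting with shlex.quote only matters when a shell is unavoidable. Passing an argument list without shell=True avoids the issue, because no shell ever parses the value.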

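In your code, you write the result of awk directly to the file named in mid. To debug the problem, you might want to use subprocess.check_output and read the result of the awk invocation in your Python program.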

import subprocess

cmd = """sort -n -r -k5 {0} |
      head -n 500|
      awk 'OFS="\t"{{if($2-{1}>0){{print $1,$2-{1},$3+{1},$4,$5}}}}'""".format(summit, extend/2)

# capture the pipeline's output in Python instead of writing it to mid
output = subprocess.check_output(cmd, shell=True)

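It is unlikely to be any slower, but you can always time it to be sure. There are two good reasons not to do it the first way. The first is that, although it may be slightly faster to type, readability suffers badly. The second is that using shell=True is bad practice and should be avoided as a matter of principle.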

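(1) Is the "second way" slower than the "first way"?

Starting a new process is an expensive operation, so there should be little difference between letting the shell parse the command line and start the child processes and doing it yourself in Python. The only benchmark that matters is your code on your hardware. Measure it.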
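One way to measure it is with timeit, sketched below for Python 3. The sample file, its contents and the function names with_shell and without_shell are assumptions made for this illustration; the awk stage is left out to keep the comparison short.

```python
import os
import subprocess
import tempfile
import timeit

# Hypothetical sample input: 2000 lines with five tab-separated columns.
fd, sample_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as handle:
    for i in range(2000):
        handle.write("chr1\t{0}\t{1}\tpeak{0}\t{2}\n".format(i, i + 10, i % 97))


def with_shell():
    """First way: one command string interpreted by the shell."""
    return subprocess.check_output(
        "sort -n -r -k5 {0} | head -n 500".format(sample_path), shell=True)


def without_shell():
    """Second way: two processes connected by a pipe created in Python."""
    p1 = subprocess.Popen(["sort", "-n", "-r", "-k5", sample_path],
                          stdout=subprocess.PIPE)
    p2 = subprocess.Popen(["head", "-n", "500"], stdin=p1.stdout,
                          stdout=subprocess.PIPE)
    p1.stdout.close()  # the parent no longer needs this end of the pipe
    output = p2.communicate()[0]
    p1.wait()
    return output


shell_seconds = timeit.timeit(with_shell, number=20)
popen_seconds = timeit.timeit(without_shell, number=20)
print("shell=True: %.3fs  Popen pipeline: %.3fs" % (shell_seconds, popen_seconds))
```

The numbers depend on the machine and on the input size, so treat them as a rough comparison rather than a verdict.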

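(2) If I have to write it the "first way" (because it is faster to write), how can I avoid complaints such as the broken pipe?

The first "broken pipe" may be similar to a problem that has been reported before; try the workaround for it.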
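One possible cause, which is an assumption here and not confirmed by the thread, is signal handling: Python ignores SIGPIPE at startup, and on Python 2 child processes inherit that setting, so sort reports a write error where it would otherwise be terminated quietly. A commonly used workaround is to restore the default action in the child through preexec_fn. The sketch below uses yes and head as stand-ins for the real pipeline, and restore_sigpipe is a name chosen for this example.

```python
import signal
import subprocess


def restore_sigpipe():
    """Runs in the child just before exec: give SIGPIPE its default action again."""
    signal.signal(signal.SIGPIPE, signal.SIG_DFL)


# `yes` writes forever; `head` exits after one line and closes the pipe.
output = subprocess.check_output("yes | head -n 1", shell=True,
                                 preexec_fn=restore_sigpipe)
```

On Python 3.2 and later, subprocess restores SIGPIPE in the child by default (restore_signals=True), so the extra preexec_fn is mostly relevant to Python 2.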

The second broken pipe can be fixed by redirecting the pipeline's stdout to the mid file:

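from subprocess import check_call
# pipeline is the command string without the trailing "> {2}" redirection
with open(mid, 'wb') as file:
    check_call(pipeline, shell=True, stdout=file)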

This implements the "> {2}" redirection from your command without using the shell.

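(3) What are the most compelling reasons not to write it the "first way"?

If any of top_count, extend, mid, or summit comes from a source that is not completely under your control, you risk running arbitrary commands under your user account.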

The plumbum module provides both safety and readability (measure the time performance if it matters to you in this case).
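For comparison, the same pipeline can be built with the standard library alone. This is a sketch for Python 3, not the code from the answer: run_pipeline, its parameters and the sample data are invented for the example, and it assumes sort, head and awk are on the PATH.

```python
import os
import subprocess
import tempfile


def run_pipeline(summit, mid, offset, limit=500):
    """sort | head | awk without a shell; the awk output is written to `mid`."""
    # Same awk program as in the question, passed as one argument (no shell quoting).
    awk_program = ('OFS="\\t"{{if($2-{0}>0){{print $1,$2-{0},$3+{0},$4,$5}}}}'
                   .format(offset))
    with open(mid, "wb") as out:
        p1 = subprocess.Popen(["sort", "-n", "-r", "-k5", summit],
                              stdout=subprocess.PIPE)
        p2 = subprocess.Popen(["head", "-n", str(limit)],
                              stdin=p1.stdout, stdout=subprocess.PIPE)
        p1.stdout.close()  # parent drops its copy of the sort -> head pipe
        p3 = subprocess.Popen(["awk", awk_program], stdin=p2.stdout, stdout=out)
        p2.stdout.close()  # parent drops its copy of the head -> awk pipe
        p3.wait()
        p2.wait()
        p1.wait()
    return p3.returncode


# Made-up sample data: three rows, five tab-separated columns.
workdir = tempfile.mkdtemp()
summit_path = os.path.join(workdir, "summit.txt")
mid_path = os.path.join(workdir, "mid.txt")
with open(summit_path, "w") as handle:
    handle.write("a\t100\t200\tx\t5\nb\t3\t50\ty\t9\nc\t30\t60\tz\t7\n")

status = run_pipeline(summit_path, mid_path, offset=10, limit=2)
with open(mid_path) as handle:
    result = handle.read()
```

Because every argument is passed as a list element, a strange value in summit or mid is only ever treated as a file name. Only awk's exit status is returned, since sort may legitimately be stopped early once head has read enough lines.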

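The first way is barely readable, which is a good reason not to use it.

@ebarr It is unreadable because of the awk part…

By the way, in the second way you may need to close all intermediate file descriptors that the main program no longer uses: call p1.stdout.close() after creating p2, and so on. This ensures that process 1 notices when process 2 closes its end of the pipe.

(1) If you do not use format(), you should unescape the doubled braces in cmd and add the formatting directives and variables; otherwise the command is broken. (2) line is a single character, not a line, in your code. check_output() returns a single string, and you do not want to iterate over it one character at a time. To redirect the subprocess's stdout to a file: check_call(cmd, shell=True, stdout=file)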
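The point about check_output() returning a single string can be seen in a short Python 3 sketch. The child command here is a stand-in that prints two lines and is not the pipeline from the question.

```python
import subprocess
import sys

# A stand-in child process that prints two lines (any command would do).
child = [sys.executable, "-c", "print('first'); print('second')"]

# universal_newlines=True makes check_output return text instead of bytes.
output = subprocess.check_output(child, universal_newlines=True)

characters = [item for item in output]  # iterating the string yields characters
lines = output.splitlines()             # splitting yields the actual lines
```

Looping over output directly walks through it character by character, so splitlines() is what is wanted when the goal is to process the result line by line.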