Passing strings as subprocess input through multiple named pipes in Python


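I have spent quite a while trying to get the Linux diff and patch tools to work with strings in Python. To do that, I tried using named pipes, since they seemed to be the most robust approach. The problem is that this does not work for large inputs.

For example: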

a, b = str1, str2 # ~1MB each string

fname1, fname2 = mkfifos(2)
proc = subprocess.Popen(['diff', fname1, fname2], \
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)

print('Writing first file.')
with open(fname1, 'w') as f1:
    f1.write(a)
print('Writing second file.')
with open(fname2, 'w') as f2:
    f2.write(b)

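This hangs on the first write. I found that if I use a[:6500], it hangs on the second write instead, so I think it has something to do with buffers. I tried flushing manually after every write, closing, and using the low-level os.open(f, 'r', 0) with a 0 buffer, but the problem remains.

I thought about looping to write in chunks, but that feels wrong in a high-level language like Python. Any idea what I am doing wrong?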

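A named pipe is still a pipe. It has a finite buffer on Linux. You cannot write an unlimited amount of output unless something reads from the other end of the pipe at the same time.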
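To make the finite buffer concrete, here is a minimal sketch that measures how many bytes an unread pipe accepts before a write would block. It uses an anonymous pipe from os.pipe() as a stand-in for a FIFO, and the helper name pipe_capacity is hypothetical, not something from the question or the answer. It assumes a Unix-like system and Python 3.5+ (for os.set_blocking).

```python
import os

def pipe_capacity():
    """Return how many bytes fit into an unread pipe before a write would block."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)  # a full pipe now raises instead of hanging
    total = 0
    try:
        while True:
            # small writes are atomic: either fully written or rejected
            total += os.write(write_fd, b'x' * 1024)
    except BlockingIOError:
        pass  # the buffer is full and nobody is reading
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return total

if __name__ == '__main__':
    print(pipe_capacity())
```

On Linux the result is commonly 65536, though it can differ between systems. Anything written beyond that limit has to wait for a reader, which is why a blocking write of a 1 MB string stalls.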

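If f1.write(a) blocks, it means that diff does not read each input file in full at once. That seems logical: the purpose of diff is to compare files line by line, so its reading of the first file will not get far ahead of its reading of the second.

To write different data to different places concurrently, you could use threads or asynchronous I/O: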

#!/usr/bin/env python3
from subprocess import Popen, PIPE
from threading import Thread

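def write_input_async(path, text):
    """Write text to path in a background thread so the caller is not blocked."""
    def writelines():
        # open() on a FIFO blocks until diff opens the other end for reading
        with open(path, 'w') as file:
            for line in text.splitlines(keepends=True):
                file.write(line)
    Thread(target=writelines, daemon=True).start()

# a, b are the input strings; named_pipes(n) is a helper defined elsewhere
with named_pipes(2) as paths, \
    Popen(['diff'] + paths, stdout=PIPE, stderr=PIPE, universal_newlines=True) as p:
    for path, text in zip(paths, [a, b]):
        write_input_async(path, text)
    output, errors = p.communicate()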

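where named_pipes(n) is a context manager that creates n named pipes and yields a list of their paths; its definition is not included here.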
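Since that definition is missing, the following is one possible sketch of such a helper, not the answer's original code. It assumes a Unix-like system (os.mkfifo), and the temporary directory layout and the file names named_pipe0, named_pipe1, ... are illustrative choices.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager

@contextmanager
def named_pipes(n=1):
    """Create n FIFOs in a private temporary directory and yield their paths."""
    dirname = tempfile.mkdtemp()
    try:
        paths = [os.path.join(dirname, 'named_pipe' + str(i)) for i in range(n)]
        for path in paths:
            os.mkfifo(path)  # creates the FIFO; does not open it
        yield paths
    finally:
        shutil.rmtree(dirname)  # removes the FIFOs together with the directory
```

Yielding a list keeps the expression ['diff'] + paths from the code above valid, and the private directory avoids name clashes with other processes.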

Note: unless you call .communicate(), the diff process may hang as soon as either of its stdout/stderr OS pipe buffers fills up.
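As a small illustration of that note, the sketch below runs a child process that writes far more than a pipe buffer can hold and collects everything with .communicate(). The child is a Python one-liner standing in for diff, and the function name run_and_collect is hypothetical.

```python
import sys
from subprocess import Popen, PIPE

def run_and_collect(nchars):
    """Run a child that writes nchars characters to stdout and collect its output."""
    code = 'import sys; sys.stdout.write("x" * %d)' % nchars
    with Popen([sys.executable, '-c', code], stdout=PIPE, stderr=PIPE,
               universal_newlines=True) as p:
        # communicate() drains stdout and stderr concurrently until EOF,
        # so the child never stays blocked on a full pipe buffer
        output, errors = p.communicate()
    return output, errors, p.returncode

if __name__ == '__main__':
    output, errors, returncode = run_and_collect(1000000)
    print(len(output), returncode)
```

Calling p.wait() in place of p.communicate() here risks a deadlock: the child blocks writing to a full stdout pipe while the parent waits for it to exit.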



Won't the FIFO buffer fill up if you write one first? Or will it be emptied gradually?

@J.P.Petersen Yes, I think that is what is happening; diff reads both files gradually, so the writes eventually deadlock. It works fine if the first write is done in a thread.

If the input strings str1, str2 come from other processes, take a look at this question.