How to pipe stdin/stdout to a Perl script using Python

This Python code pipes data through a Perl script:

import subprocess
kw = {}
kw['executable'] = None
kw['shell'] = True
kw['stdin'] = None
kw['stdout'] = subprocess.PIPE
kw['stderr'] = subprocess.PIPE
args = ' '.join(['/usr/bin/perl','-w','/path/script.perl','<','/path/mydata'])
subproc = subprocess.Popen(args,**kw)
for line in iter(subproc.stdout.readline, ''):
    print line.rstrip().decode('UTF-8')
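After the first line has been sent to the subprocess, the code hangs in readline. I have other executables that work perfectly with exactly the same code.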


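My data files can be quite large (1.5 GB). Is there a way to pipe the data through without saving it to a file first? I do not want to rewrite the Perl script, because it has to stay compatible with other systems.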

See the warning in the manual about using Popen.stdin and Popen.stdout:

Warning: Use communicate() rather than .stdin.write, .stdout.read or .stderr.read to avoid deadlocks due to any of the other OS pipe buffers filling up and blocking the child process.

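I realize that holding a gigabyte-and-a-half string in memory all at once is not ideal, but communicate() is an approach that will work. As you have observed, the stdin.write() + stdout.read() approach deadlocks as soon as the OS pipe buffer fills up.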


Is using communicate() feasible for you?
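As a minimal sketch of what the communicate() approach looks like: the example below is written for Python 3 (the code in this thread is Python 2), and it uses a tiny Python child process that upper-cases its input as a stand-in for the Perl script. The names CHILD and run_with_communicate are made up for illustration. Note that communicate() keeps the whole input and the whole output in memory.

```python
import subprocess
import sys

# Stand-in for the Perl script (an assumption for illustration): a child
# process that upper-cases every line it reads from stdin.
CHILD = [sys.executable, "-c",
         "import sys\nfor line in sys.stdin:\n    sys.stdout.write(line.upper())"]


def run_with_communicate(cmd, text):
    """Send all of `text` to the child's stdin and collect its output."""
    proc = subprocess.Popen(cmd,
                            stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    # communicate() feeds stdin and drains stdout/stderr at the same time,
    # so neither side can stall on a full pipe buffer.
    out, err = proc.communicate(text.encode("UTF-8"))
    return proc.returncode, out.decode("UTF-8"), err.decode("UTF-8")


returncode, output, errors = run_with_communicate(CHILD, "alpha\nbeta\n")
print(output)
```

For the real case the command list would presumably be something like ['/usr/bin/perl', '-w', '/path/script.perl'], with shell=False.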

Your code is blocking at this line:

for line in iter(subproc.stdout.readline, ''):
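because the only way this iteration can terminate is by reaching EOF (end of file), which happens when the subprocess terminates or otherwise closes its stdout. You do not want to wait for the process to terminate, however; you only want to wait for it to finish processing the line that was sent to it.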
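To illustrate the point about EOF, here is a small sketch (Python 3, with a hypothetical Python child standing in for the Perl script) in which the readline loop does terminate, because the parent closes the child's stdin first. The child then sees EOF on its input, finishes, and closes its stdout. The input is kept tiny on purpose so that it fits in the pipe buffer.

```python
import subprocess
import sys

# Stand-in for the Perl script (an assumption): upper-cases each input line.
CHILD = [sys.executable, "-c",
         "import sys\nfor line in sys.stdin:\n    sys.stdout.write(line.upper())"]

proc = subprocess.Popen(CHILD, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
proc.stdin.write(b"one\ntwo\n")  # small input, well below the pipe buffer size
proc.stdin.close()               # EOF on the child's stdin lets it finish

received = []
# The pipes are binary here, so the sentinel must be b"" rather than "".
for line in iter(proc.stdout.readline, b""):
    received.append(line.rstrip().decode("UTF-8"))
proc.stdout.close()
proc.wait()
print(received)
```

Without the stdin.close() call the child keeps waiting for more input, never closes its stdout, and the loop blocks exactly as described.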

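In addition, as Chris Morgan pointed out, you are also running into a buffering problem. Another question discusses how to do non-blocking reads with subprocess. I have quickly adapted the code from that question to your problem: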

import codecs
import Queue
import subprocess
import threading

def enqueue_output(out, queue):
    for line in iter(out.readline, ''):
        queue.put(line)
    out.close()

kw = {}
kw['executable'] = '/usr/bin/perl'
kw['shell'] = False
kw['stderr'] = subprocess.PIPE
kw['stdin'] = subprocess.PIPE
kw['stdout'] = subprocess.PIPE
args = ['-w','/path/script.perl',]
subproc = subprocess.Popen(args, **kw)
f = codecs.open('/path/mydata','r','UTF-8')
q = Queue.Queue()
t = threading.Thread(target = enqueue_output, args = (subproc.stdout, q))
t.daemon = True
t.start()
for line in f:
    subproc.stdin.write('%s\n'%(line.strip().encode('UTF-8')))
    print "Sent:", line.strip()  ### code hangs after printing this ###
    try:
        line = q.get_nowait()
    except Queue.Empty:
        pass
    else:
        print "Received:", line.rstrip().decode('UTF-8')

subproc.terminate()
f.close()

You will most likely need to modify this code, but at least it does not block.

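Thanks, srgerg. I had tried the threading solution as well. On its own, however, that solution always hung. Both my earlier code and srgerg's code were missing the final piece, and your hint gave me the last idea.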

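The final solution writes enough dummy data to force the last valid lines out of the buffer. To support this, I added code that tracks how many valid lines have been written to stdin. The thread loop opens the output file, saves the data, and breaks when the number of lines read equals the number of valid input lines. This solution reads and writes line by line, for files of any size.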

def std_output(stdout,outfile=''):
    out = 0
    f = codecs.open(outfile,'w','UTF-8')
    for line in iter(stdout.readline, ''):
        f.write('%s\n'%(line.rstrip().decode('UTF-8')))
        out += 1
        # i is the global count of valid lines written by the main thread
        if i == out: break
    stdout.close()
    f.close()

outfile = '/path/myout'
infile = '/path/mydata'

i = 0  # defined before the reader thread starts, because std_output reads it
subproc = subprocess.Popen(args,**kw)
t = threading.Thread(target=std_output,args=[subproc.stdout,outfile])
t.daemon = True
t.start()

f = codecs.open(infile,'r','UTF-8')
for line in f:
    subproc.stdin.write('%s\n'%(line.strip().encode('UTF-8')))
    i += 1
subproc.stdin.write('%s\n'%(' '*4096)) ### push dummy data ###
f.close()
t.join()
subproc.terminate()
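For comparison, here is a sketch of a different technique from the dummy-data approach above: a writer thread feeds the child's stdin and then closes it, while the main thread reads stdout until EOF. It assumes the script reads its input to the end and then exits, flushing its output as it does so. It is written for Python 3 and uses a Python child as a stand-in for the Perl script. The names feed and stream_through and the temporary file paths are invented for the example.

```python
import os
import subprocess
import sys
import tempfile
import threading

# Stand-in for the Perl script (an assumption): upper-cases each input line.
CHILD = [sys.executable, "-c",
         "import sys\nfor line in sys.stdin:\n    sys.stdout.write(line.upper())"]


def feed(src_path, pipe):
    """Copy the input file into the child's stdin, then signal EOF."""
    with open(src_path, "rb") as src:
        for chunk in iter(lambda: src.read(64 * 1024), b""):
            pipe.write(chunk)
    pipe.close()  # the child sees EOF, flushes its output and exits


def stream_through(cmd, src_path, dst_path):
    """Pipe src_path through cmd into dst_path; return the lines received."""
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    writer = threading.Thread(target=feed, args=(src_path, proc.stdin))
    writer.start()
    lines = 0
    with open(dst_path, "wb") as dst:
        # Reading in the main thread while the writer thread keeps writing
        # means neither pipe buffer can fill up and stall the other side.
        for line in iter(proc.stdout.readline, b""):
            dst.write(line)
            lines += 1
    proc.stdout.close()
    writer.join()
    proc.wait()
    return lines


# Demo on temporary files; the input is larger than a typical pipe buffer.
workdir = tempfile.mkdtemp()
infile = os.path.join(workdir, "mydata")
outfile = os.path.join(workdir, "myout")
with open(infile, "w") as f:
    f.writelines("line %d\n" % n for n in range(50000))

count = stream_through(CHILD, infile, outfile)
print(count)
```

Because the loop ends on EOF, no line counting or padding is needed, and only one chunk of input and one line of output are held in memory at a time. If the script must stay alive between inputs, this does not apply.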