Python: Popen in the main process and in a subprocess


The following code works fine in the main thread. I grep some files until the first 100 results are found (writing the results to a file), then exit:

    command = 'grep -F "%s" %s*.txt' % (search_string, DATA_PATH)

    p = Popen(['/bin/bash', '-c', command], stdout = PIPE)
    f = open(output_file, 'w+')
    num_lines = MAX_RESULTS
    while True:  
        line = p.stdout.readline()
        print num_lines
        if line != '':
            f.write(line)
        num_lines = num_lines - 1
        if num_lines == 0:
            break
        else:
            break

The same code, used in a Process subclass, always prints "grep: writing output: Broken pipe" to the console:

    class Search(Process):
        def __init__(self, search_id, search_string):
            self.search_id = search_id
            self.search_string = search_string  
            self.grepped = ''
            Process.__init__(self)

        def run(self):
            output_file = TMP_PATH + self.search_id

            # flag if no regex chars
            flag = '-F' if re.match(r"^[a-zA-Z0\ ]*$", self.search_string) else '-P'    

            command = 'grep %s "%s" %s*.txt' % (flag, self.search_string, DATA_PATH)

            p = Popen(['/bin/bash', '-c', command], stdout = PIPE)
            f = open(output_file, 'w+')
            num_lines = MAX_RESULTS
            while True:  
                line = p.stdout.readline()
                print num_lines
                if line != '':
                    f.write(line)
                num_lines = num_lines - 1
                if num_lines == 0:
                    break
                else:
                    break

Why? How can I fix this?

I can reproduce the error message like this:

import multiprocessing as mp
import subprocess
import shlex

def worker():
    proc = subprocess.Popen(shlex.split('''
        /bin/bash -c "grep -P 'foo' /tmp/test.txt"
        '''), stdout = subprocess.PIPE)
    line = proc.stdout.readline()
    print(line)
    # proc.terminate()   # This fixes the problem

if __name__=='__main__':
    N = 6000
    with open('/tmp/test.txt', 'w') as f:
        f.write('bar foo\n'*N)   # <--- Increasing this number causes grep: writing output: Broken pipe
    p = mp.Process(target = worker)
    p.start()
    p.join()

The message is written to stderr, once for each line grep is still processing.
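As a rough illustration of the mechanism, and not of grep itself, the sketch below shows what a writer sees once the reading end of a pipe is gone. That is the situation the child is left in when worker returns without stopping it. It assumes a POSIX system and Python 3; the function name write_to_closed_pipe is made up for this example.

```python
import errno
import os

def write_to_closed_pipe():
    """Write to a pipe whose read end is closed and return the resulting errno."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # the reader goes away, as when worker() returns
    try:
        os.write(write_fd, b"bar foo\n")
    except BrokenPipeError as exc:
        # Python ignores SIGPIPE by default, so the failure surfaces as EPIPE.
        return exc.errno
    finally:
        os.close(write_fd)
    return None

if __name__ == "__main__":
    print(write_to_closed_pipe() == errno.EPIPE)
```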

The fix is to terminate the process with proc.terminate() before worker ends.
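A minimal sketch of that fix, assuming Python 3 and a POSIX system. The helper name first_lines and its parameters are hypothetical. It reads at most max_lines lines, then terminates the child before closing the pipe, so the child never writes to a pipe that has no reader.

```python
import subprocess

def first_lines(argv, max_lines):
    """Return up to max_lines lines of the child's stdout, then stop the child."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE)
    lines = []
    try:
        for line in proc.stdout:
            lines.append(line)
            if len(lines) >= max_lines:
                break
    finally:
        proc.terminate()     # stop the child while the read end is still open
        proc.stdout.close()  # now it is safe to drop our end of the pipe
        proc.wait()          # reap the child so no zombie is left behind
    return lines
```

For the case in the question this might be called as first_lines(['grep', '-F', search_string, '/tmp/test.txt'], 100). Passing an argument list skips the /bin/bash wrapper, so a pattern such as *.txt would have to be expanded in Python, for example with glob.glob.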

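Why use grep when Python itself has a perfectly reliable solution?

Because I have to search more than 1.5 GB of data, and Python cannot match grep's speed.

This looks like the same problem as the one here.

When I had trouble capturing the command's output, I had already increased the buffer size: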

p = Popen(['/bin/bash', '-c', command], stdout=PIPE, bufsize=256*1024*1024)

If I am reading from grep in an infinite while loop, why does my process end before grep does?

Your while loop breaks after one iteration.
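One way to restructure the loop so that it stops only at the end of the output or after the requested number of lines, sketched for Python 3. The function name copy_first_lines is an assumption for illustration; stream stands in for p.stdout and out for the output file.

```python
import io

def copy_first_lines(stream, out, max_lines):
    """Copy at most max_lines lines from stream to out; return the count copied."""
    written = 0
    while written < max_lines:
        line = stream.readline()
        if not line:  # an empty read means end of output
            break
        out.write(line)
        written += 1
    return written

if __name__ == "__main__":
    src = io.BytesIO(b"a\nb\nc\n")
    dst = io.BytesIO()
    print(copy_first_lines(src, dst, 2), dst.getvalue())
```

Even with the loop fixed, the child may still be writing when the loop stops, so it still has to be terminated before the reading process exits.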
