Python subprocess hangs at Popen when piping output (multithreading, Python 2.7, Popen)


I have read dozens of "Python subprocess hangs" posts, and I believe the code below already addresses every issue raised in them.

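My code hangs intermittently at the Popen call. I use multiprocessing.dummy.apply_async to run 4 threads. Each thread starts a subprocess, reads its output line by line, and prints a modified version of it to stdout.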

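import os
import subprocess
import sys
import textwrap

def my_subproc():
    # Run myscript.py, located next to this file, with buffering disabled (stdbuf, -u).
    exec_command = ['stdbuf', '-i0', '-o0', '-e0',
                    sys.executable, '-u',
                    os.path.dirname(os.path.realpath(__file__)) + '/myscript.py']

    # env, device, print_lock and LINE_WRAP_DEFAULT are defined elsewhere (not shown).
    proc = subprocess.Popen(exec_command, env=env, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, bufsize=1)
    print "DEBUG1", device

    # Read the child's output line by line until EOF.
    for line in iter(proc.stdout.readline, b''):
        with print_lock:
            for l in textwrap.wrap(line.rstrip(), LINE_WRAP_DEFAULT):
                pass  # the loop body is cut off in the original post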

The code above is run through apply_async:

import multiprocessing.dummy

pool = multiprocessing.dummy.Pool(4)
for i in range(4):
    pool.apply_async(my_subproc)

The threads intermittently hang inside subprocess.Popen, so the "DEBUG1" statement is never printed. Sometimes all threads work, and sometimes only 1 of the 4 does.

As far as I know, this does not match any of the known deadlock situations for Popen. Am I wrong?

This seems to be a bad interaction with multiprocessing.dummy. When I use multiprocessing (not the .dummy threading interface), I cannot reproduce the error.
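For anyone trying to reproduce this, the pattern from the question can be reduced to a self-contained sketch. It is written for Python 3 rather than the 2.7 of the question, and the names CHILD_CODE, LINE_WRAP, run_child and run_all are hypothetical stand-ins (CHILD_CODE replaces myscript.py). It is not claimed to trigger the hang; it only shows the thread pool, the line-by-line read, and how get() surfaces worker exceptions that a bare apply_async call silently drops.

```python
import subprocess
import sys
import textwrap
import threading
from multiprocessing.dummy import Pool  # thread pool with the multiprocessing API

LINE_WRAP = 40                 # hypothetical stand-in for LINE_WRAP_DEFAULT
print_lock = threading.Lock()  # serialises printing across worker threads

# Hypothetical stand-in for myscript.py: prints three short lines and exits.
CHILD_CODE = "for i in range(3): print('line', i)"


def run_child(worker_id):
    """Start one child, echo its wrapped output, and return the line count."""
    proc = subprocess.Popen(
        [sys.executable, "-u", "-c", CHILD_CODE],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    count = 0
    # readline returns b"" at EOF, which ends the iteration.
    for line in iter(proc.stdout.readline, b""):
        text = line.decode().rstrip()
        with print_lock:
            for piece in textwrap.wrap(text, LINE_WRAP):
                print("[worker %d] %s" % (worker_id, piece))
        count += 1
    proc.stdout.close()
    proc.wait()
    return count


def run_all(workers=4):
    """Launch one child per worker thread and collect the line counts."""
    pool = Pool(workers)
    try:
        pending = [pool.apply_async(run_child, (i,)) for i in range(workers)]
        # get() re-raises any exception raised inside a worker thread.
        return [p.get(timeout=60) for p in pending]
    finally:
        pool.close()
        pool.join()
```

Calling run_all(4) returns one line count per worker. Keeping the AsyncResult objects and calling get() on them is a cheap way to rule out a hidden exception before suspecting a deadlock.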

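There is an insidious problem in subprocess.Popen() caused by I/O buffering of stdout (and possibly stderr). The pipe between parent and child holds only about 65536 bytes. If the child writes more output than that and nothing reads it, the child "hangs" waiting for the buffer to be drained, which is a deadlock. The authors of subprocess.py seem to regard this as a problem caused by the child process, although something like a subprocess.flush would be welcome. Anders Pearson has a simple solution, but you have to pay attention. As he puts it, "tempfile.TemporaryFile() is your friend." In my case, I run an application in a loop to batch-process a set of files, and the code for the solution is: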

# sp is the subprocess module (import subprocess as sp); cpto is the timeout in seconds.
with tempfile.TemporaryFile() as fout:
    sp.run(['gmat', '-m', '-ns', '-x', '-r', str(gmat_args)],
           timeout=cpto, check=True, stdout=fout, stderr=fout)
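If the captured output is needed afterwards, the temporary file can be read back once the child has exited. The following is a minimal sketch of that idea, assuming Python 3.5 or later; the helper name run_to_tempfile is hypothetical and not part of the original answer.

```python
import subprocess as sp
import sys
import tempfile


def run_to_tempfile(cmd, timeout=None):
    """Run cmd with stdout and stderr sent to a temp file; return the bytes written."""
    with tempfile.TemporaryFile() as fout:
        # The child writes straight to the file, so there is no pipe to fill up.
        sp.run(cmd, timeout=timeout, check=True, stdout=fout, stderr=fout)
        fout.seek(0)  # rewind before reading what the child wrote
        return fout.read()
```

For example, run_to_tempfile([sys.executable, "-c", "print('hello')"], timeout=60) returns the child's output as bytes, however large it is.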

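After processing about 20 files, the fix above still deadlocked. It is an improvement, but not good enough, because I need to batch-process hundreds of files. So I came up with the "crowbar" approach below.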

import logging
import subprocess as sp

# Run GMAT for each file in the batch (gmat_args and cpto are set by the batch loop).
# Arguments:
#   -m:  Start GMAT with a minimized interface.
#   -ns: Start GMAT without the splash screen showing.
#   -x:  Exit GMAT after running the specified script.
#   -r:  Automatically run the specified script after loading.
# Note: Popen()'s bufsize defaults to io.DEFAULT_BUFFER_SIZE (commonly 8192 bytes),
# and the OS pipe behind sp.PIPE holds about 65536 bytes on Linux. Once the pipe is
# full, the child hangs with its write pending until the parent reads from the pipe.
# https://thraxil.org/users/anders/posts/2008/03/13/Subprocess-Hanging-PIPE-is-your-enemy/
proc = sp.Popen(['gmat', '-m', '-ns', '-x', '-r', str(gmat_args)],
                stdout=sp.PIPE, stderr=sp.STDOUT)
try:
    # Time out after cpto seconds if the process does not complete.
    # timeout must be passed by keyword: the first positional argument is input.
    outs, errors = proc.communicate(timeout=cpto)
except sp.TimeoutExpired:
    logging.error('GMAT timed out in child process. Time allowed was %s secs, continuing', cpto)
    logging.info('Process %s being terminated.', proc.pid)
    # The child process is not killed by the system on timeout.
    proc.kill()
    # Drain the remaining output so the stdout pipe is emptied and closed.
    outs, errors = proc.communicate()

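The basic idea is to kill the process and flush the buffer on every timeout. I moved the TimeoutExpired handler into the batch loop so that, after the process is killed, the loop carries on with the next one. This is harmless as long as the timeout is long enough to let gmat finish, although it is slower. I found that the code would process 3 to 20 files before hitting a timeout.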
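The kill-and-drain step can be wrapped in a helper so that the batch loop simply moves on after a timeout. This is a sketch under stated assumptions: it targets Python 3, and the names run_with_timeout and run_batch are hypothetical, with arbitrary commands standing in for the gmat invocations.

```python
import subprocess as sp
import sys


def run_with_timeout(cmd, timeout):
    """Run cmd; on timeout kill it and drain its pipe.

    Returns (output_bytes, timed_out).
    """
    proc = sp.Popen(cmd, stdout=sp.PIPE, stderr=sp.STDOUT)
    try:
        outs, _ = proc.communicate(timeout=timeout)
        return outs, False
    except sp.TimeoutExpired:
        proc.kill()                    # the child is not killed automatically
        outs, _ = proc.communicate()   # collect leftover output and reap the child
        return outs, True


def run_batch(commands, timeout):
    """Run every command in turn, continuing past any that time out."""
    results = []
    for cmd in commands:
        outs, timed_out = run_with_timeout(cmd, timeout)
        results.append((timed_out, len(outs)))
    return results
```

Each entry of the result records whether that command timed out and how many bytes it produced, so a caller can log or retry the failures after the whole batch has run.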

This looks like a bug in subprocess.
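The blocking described in the answer can be reproduced without GMAT, which may help when deciding where the problem lies. The sketch below assumes Python 3; CHILD_CODE and demo_pipe_block are hypothetical names, and the child simply writes far more than a pipe is expected to hold. The parent first waits without reading, then drains the pipe.

```python
import subprocess as sp
import sys

# Hypothetical child: writes 2000000 bytes to stdout, far more than a pipe holds.
CHILD_CODE = "import sys; sys.stdout.write('x' * 2000000)"


def demo_pipe_block(wait_seconds=2):
    """Return (blocked, bytes_read) for a child whose output is not read at first."""
    proc = sp.Popen([sys.executable, "-c", CHILD_CODE], stdout=sp.PIPE)
    try:
        proc.wait(timeout=wait_seconds)  # the parent does not read here
        blocked = False
    except sp.TimeoutExpired:
        blocked = True  # the child has not exited within wait_seconds
    outs, _ = proc.communicate()  # reading the pipe lets the child finish
    return blocked, len(outs)
```

If the child is reported as blocked and then finishes as soon as communicate() reads the pipe, the stall comes from the unread pipe filling up rather than from the child program itself.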

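Now, multiprocessing.py has a completely different interface, more like threading.py, so I am not sure how to compare the two here. I agree that using multiprocessing could speed up the whole session. I have not had to dig into the innards of mp, so are you saying that mp has the same pipe deadlock problem? I see that your example uses Popen inside the multiprocessing pool interface, but your Popen call still uses standard pipes. Your example is more complex than mine, but it is where I am heading, so seeing it is very useful, and it is very disappointing that you have run into a deadlock.

Note that bufsize=1 means line buffered, so the stdout pipe cannot be emptied in a single communicate().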