
How to keep track of a Python multiprocessing pool and run a function after every X iterations?


I have some simple Python multiprocessing code that looks like this:

from multiprocessing import Pool

files = ['a.txt', 'b.txt', 'c.txt', etc..]

def convert_file(file):
    do_something(file)

mypool = Pool(number_of_workers)
mypool.map(convert_file, files)

I have 100,000 files to convert through convert_file, and I would like to run a function that uploads every 20 converted files to a server as they finish, rather than waiting for all of the files to be converted. How would I go about doing that?

You can use a shared variable across your processes to keep track of the converted files. You can find an example here.


The variable is automatically locked whenever a process wants to read or write it; while it is locked, every other process that wants to access the variable has to wait. So you can poll the variable in a main loop and check whether it has reached 20 while the conversion processes keep incrementing it. As soon as the value reaches 20, you reset it and upload that batch of files to the server.
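A minimal sketch of that idea, assuming a shared multiprocessing.Value counter that the workers increment and the parent polls (the init_worker helper, the worker count, and the 20-file threshold are illustrative, not from the question):

from multiprocessing import Pool, Value
import time

counter = None  # will hold the shared Value inside each worker process

def init_worker(shared_counter):
    # runs once in every worker; makes the shared counter visible there
    global counter
    counter = shared_counter

def convert_file(filename):
    # do_something(filename)  # the actual conversion would go here
    with counter.get_lock():  # the Value is locked while we update it
        counter.value += 1

def main():
    files = ['file{}.txt'.format(i) for i in range(100)]
    shared_counter = Value('i', 0)  # integer counter shared between processes

    with Pool(4, initializer=init_worker, initargs=(shared_counter,)) as pool:
        result = pool.map_async(convert_file, files)
        while not result.ready():
            with shared_counter.get_lock():
                if shared_counter.value >= 20:
                    shared_counter.value = 0
                    print('another 20 files converted, uploading batch to server')
            time.sleep(0.1)
        result.get()  # propagate any exception raised in a worker

if __name__ == '__main__':
    main()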

With multiprocessing you run into a slight problem with how to handle exceptions that occur in individual jobs. If you use one of the map variants, you need to be careful about how you poll the results, otherwise you may lose some of them if the map function is forced to raise an exception. Furthermore, unless you do something special about exceptions inside the job, you won't even know which job was the problem. If you use the apply variants, you don't need to be as careful when getting your results, but collating the results becomes a little trickier.
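
For contrast, a rough sketch of the apply_async route, where per-job success and failure are reported through the callback and error_callback hooks (the handler names and the stub convert_file are made up for illustration); collating results into batches of 20 would still have to be done by hand inside the callbacks:

from multiprocessing import Pool

def convert_file(filename):
    # stand-in for the real conversion
    return filename

def handle_result(filename):
    # runs in the parent process for every job that succeeds
    print('converted', filename)

def handle_error(exc):
    # runs in the parent process for every job that raises
    print('a job failed:', exc)

def main():
    files = ['file{}.txt'.format(i) for i in range(20)]
    with Pool() as pool:
        for f in files:
            pool.apply_async(convert_file, (f,),
                             callback=handle_result,
                             error_callback=handle_error)
        pool.close()
        pool.join()

if __name__ == '__main__':
    main()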

Overall, I think map is the easiest to get working.

First, you need a special exception. It cannot be created in your main module, otherwise Python will not be able to serialise and deserialise it correctly.

For example, custom_exceptions.py:
class FailedJob(Exception):
    pass

main.py:

from multiprocessing import Pool
import time
import random

from custom_exceptions import FailedJob


def convert_file(filename):
    # pseudo implementation to demonstrate what might happen
    if filename == 'file2.txt':
        time.sleep(0.5)
        raise Exception
    elif filename == 'file0.txt':
        time.sleep(0.3)
    else:
        time.sleep(random.random())
    return filename  # return filename, so we can identify the job that was completed


def job(filename):
    """Wraps any exception that occurs with FailedJob so we can identify which job failed 
    and why""" 
    try:
        return convert_file(filename)
    except Exception as ex:
        raise FailedJob(filename) from ex


def main():
    chunksize = 4  # number of jobs before dispatch
    total_jobs = 20
    files = list('file{}.txt'.format(i) for i in range(total_jobs))

    with Pool() as pool:
        # we use imap_unordered as we don't care about order, we want the result of the 
        # jobs as soon as they are done
        iter_ = pool.imap_unordered(job, files)
        while True:
            completed = []
            while len(completed) < chunksize:
                # collect results from iterator until we reach the dispatch threshold
                # or until all jobs have been completed
                try:
                    result = next(iter_)
                except StopIteration:
                    print('all child jobs completed')
                    # only break out of inner loop, might still be some completed
                    # jobs to dispatch
                    break
                except FailedJob as ex:
                    print('processing of {} job failed'.format(ex.args[0]))
                else:
                    completed.append(result)

            if completed:
                print('completed:', completed)
                # put your dispatch logic here

            if len(completed) < chunksize:
                print('all jobs completed and all job completion notifications'
                   ' dispatched to central server')
                return


if __name__ == '__main__':
    main()


Comments:

"Are you worried about the possibility of do_something raising an exception? If so, then you need to handle things more carefully."

"@Dunes Can you clarify that a bit further? I don't expect exceptions, but it's entirely possible. I looked at the example you provided and tried a few versions of it in my test case, but it's still not clear to me how to do this. I get the error: UnboundLocalError: local variable 'XXX' referenced before assignment."

"Thanks, I tried incorporating your code and ran it with a chunksize of 6 on 14 files. I saw two complete dispatches, but nothing after that... zombie processes?"

"Your convert_file function may be throwing an instance of BaseException, which can cause the pool to hang. Try catching BaseException instead of Exception in the job and see what happens."
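
If the pool really is hanging on an unusual exception, a minimal adjustment along the lines of that last comment would be to widen the wrapper's catch (a sketch, not part of the original answer):

def job(filename):
    """Same wrapper as above, but catching BaseException as suggested in the comments,
    so even non-Exception errors raised by convert_file surface as a FailedJob."""
    try:
        return convert_file(filename)
    except BaseException as ex:
        raise FailedJob(filename) from ex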