Multiprocessing in Python using pool.map_async


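Hi, I feel I have not quite understood multiprocessing in Python correctly.

I want to run a function called run_worker (which is just code that runs and manages a subprocess) 20 times in parallel and then wait for all of the calls to finish. Each run_worker should run on a separate core/thread. I don't care about the order in which the processes finish, so I use async, and there is no return value, so I use map.

I thought I should use: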

if __name__ == "__main__":
    num_workers = 20
    param_map = []
    for i in range(num_workers):
        param_map += [experiment_id]
        
    pool = mp.Pool(processes= num_workers)
    pool.map_async(run_worker, param_map)
    
    pool.close()
    pool.join()

However, this code exits straight away and doesn't appear to execute run_worker properly. Also, do I really have to create a param_map holding the same experiment_id 20 times to pass to the workers? That seems like a hack just to get the right number of run_worker calls. Ideally I would like to run a function with no parameters and no return value over multiple cores.

Note: I am using Windows Server 2019 on AWS.

Edit: added run_worker, which calls a subprocess that writes to a file:

def run_worker(experiment_id):
    hostname = socket.gethostname()
    experiment = conn.experiments(experiment_id).fetch()

    while experiment.progress.observation_count < experiment.observation_budget:
        suggestion = conn.experiments(experiment.id).suggestions().create()
        value = evaluate_model(suggestion.assignments)
        conn.experiments(experiment_id).observations().create(
            suggestion=suggestion.id,
            value=value,
            metadata=dict(hostname=hostname),
        )
        # Update the experiment object
        experiment = conn.experiments(experiment_id).fetch()

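It seems that for this simple purpose you would be better off using pool.map instead of pool.map_async. Both run the work in parallel, but pool.map blocks until all operations have finished.

pool.map_async is meant for situations like this: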

result = pool.map_async(func, iterable)
while not result.ready():
    # do some work while map_async is running
    pass
# blocking call to get the result
out = result.get()
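To make the pattern above concrete, here is a minimal runnable sketch. The square function, the input range and the helper name run_with_polling are placeholders of mine, not part of the question. The sketch only shows map_async returning immediately, ready() being polled, and get() handing back the results.

```python
import multiprocessing as mp
import time


def square(x):
    """Placeholder task (hypothetical; stands in for real work)."""
    return x * x


def run_with_polling(values, processes=4):
    """Submit work with map_async, poll until it is done, then collect results."""
    with mp.Pool(processes=processes) as pool:
        # map_async returns an AsyncResult right away instead of blocking.
        result = pool.map_async(square, values)
        polls = 0
        while not result.ready():
            # The parent process is free to do other work here.
            polls += 1
            time.sleep(0.01)
        # get() returns results in input order and re-raises worker errors.
        return result.get(), polls


if __name__ == "__main__":
    out, polls = run_with_polling(range(5))
    print(out)  # [0, 1, 4, 9, 16]
```

If nothing useful happens inside the polling loop, pool.map(square, values) gives the same list with less code.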

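Regarding your question about the parameters: the basic idea of a map operation is to map the values of one list/array/iterable to a new list of values of the same size. As far as I can see in the docs, multiprocessing does not provide a map-style method for running a function multiple times without parameters.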
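That said, map is not the only way to hand work to a pool. As a hedged sketch of one alternative (not something proposed in this thread): Pool.apply_async accepts a callable with no arguments, so a parameterless function can be submitted N times without building a dummy list. do_work and run_n_times below are hypothetical names, with do_work standing in for run_worker.

```python
import multiprocessing as mp
import os


def do_work():
    """Hypothetical parameterless task; returns the PID of the process that ran it."""
    return os.getpid()


def run_n_times(n, processes=4):
    """Submit do_work n times and wait for every call to finish."""
    with mp.Pool(processes=processes) as pool:
        # apply_async with no args tuple calls do_work() with no parameters.
        pending = [pool.apply_async(do_work) for _ in range(n)]
        # get() blocks per task and re-raises any exception from the worker.
        return [p.get() for p in pending]


if __name__ == "__main__":
    pids = run_n_times(8)
    print(len(pids), "calls completed")
```

Whether this is nicer than passing a list of identical arguments to map is a matter of taste. The pool still decides which worker process picks up each call.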

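If you would also share your run_worker function, that might help in giving a better answer to your question. It might also clarify why you want to run a function without any arguments or return values using a map operation in the first place.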

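Updated the code with this function.

In short: do res = pool.map_async(run_worker, param_map) and then res.get(), or just use pool.map(run_worker, param_map). Both should work; please let us know whether this works for you. Besides that, there might still be an issue with your run_worker function. That cannot be deduced from the code you provided.

So it does not work on Windows. I suspect there are other issues here, even when following the multiprocessing instructions about the if __name__ == "__main__" statement. run_worker works fine, but this only spawns one instance of the function.

Note that a single call, or multiple consecutive calls, to run_worker working as expected does not guarantee that it will work in parallel. Your function seems to rely on shared resources (socket and conn). Are you sure all operations are thread-safe?

Yes. In fact this code is an optimizer built for parallel operation and it works fine in a Linux environment, but there seem to be problems on Windows. At the moment the best solution I can think of is to run it on a Condor grid and let that manage the jobs, rather than using multiprocessing.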