Python: using multiprocessing to apply a method to a list of objects in parallel


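I created a class with a number of methods. One of them, my_process, is very time consuming, and I would like to run that method in parallel. I came across a similar question, but I am not sure how to apply it to my problem, or what effect it will have on the other methods of my class.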

class MyClass():
    def __init__(self, input):
        self.input = input
        self.result = int

    def my_process(self, multiply_by, add_to):
        self.result = self.input * multiply_by
        self._my_sub_process(add_to)
        return self.result

    def _my_sub_process(self, add_to):
        self.result += add_to

list_of_numbers = range(0, 5)
list_of_objects = [MyClass(i) for i in list_of_numbers]
list_of_results = [obj.my_process(100, 1) for obj in list_of_objects] # multi-process this for-loop

print list_of_numbers
print list_of_results

[0, 1, 2, 3, 4]
[1, 101, 201, 301, 401]

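If your class is not "huge", I think a process-oriented approach is better. I would suggest using Pool from multiprocessing.
Here is the tutorial ->

Then separate add_to from my_process. Since add_to is fast, you can wait until the last process has finished and apply it afterwards.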

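from multiprocessing import Pool

def my_process(input, multiply_by):
    # the slow part: runs in a worker process
    return input * multiply_by

def add_to(result, a_list):
    pass  # the fast part: fill in as needed, runs in the parent

if __name__ == "__main__":
    p = Pool(5)
    res = []
    for i in range(10):
        res.append(p.apply_async(my_process, (i, 5)))
    p.close()  # no more tasks will be submitted
    p.join()   # wait for the end of the last process
    for i in range(10):
        print(res[i].get())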

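If you do not necessarily need to stick with the multiprocessing module, this can be done very easily with the concurrent.futures library.

Here is the example code: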

from concurrent.futures import ThreadPoolExecutor, wait

MAX_WORKERS = 20

class MyClass():
    def __init__(self, input):
        self.input = input
        self.result = int

    def my_process(self, multiply_by, add_to):
        self.result = self.input * multiply_by
        self._my_sub_process(add_to)
        return self.result

    def _my_sub_process(self, add_to):
        self.result += add_to

list_of_numbers = range(0, 5)
list_of_objects = [MyClass(i) for i in list_of_numbers]

def on_finish(future):
    result = future.result()  # do stuff with your result

with ThreadPoolExecutor(MAX_WORKERS) as executor:
    for obj in list_of_objects:
        executor.submit(obj.my_process, 100, 1).add_done_callback(on_finish)

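Here the executor returns a future for every task it submits. Keep in mind that if you use add_done_callback(), the finished task from the thread returns to the main thread (which would block your main thread). If you really want true parallelism, you should wait for the future objects separately. Here is the code snippet for that: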

futures = []
with ThreadPoolExecutor(MAX_WORKERS) as executor:
    for obj in list_of_objects:
        futures.append(executor.submit(obj.my_process, 100, 1))
done, not_done = wait(futures)  # wait() returns two sets of futures

# work with your results here
for future in done:
    if future.exception() is None:
        print(future.result())
    else:
        print(future.exception())

Hope this helps.
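For completeness, here is a minimal Python 3 sketch of the same thread-based idea using as_completed from concurrent.futures, which yields each future as soon as it finishes instead of waiting for all of them. The helper name run_in_threads and the trimmed-down MyClass are assumptions made for illustration, not part of the original answer.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


class MyClass:
    # Trimmed-down copy of the class from the question.
    def __init__(self, input):
        self.input = input
        self.result = None

    def my_process(self, multiply_by, add_to):
        self.result = self.input * multiply_by + add_to
        return self.result


def run_in_threads(objects, multiply_by, add_to, max_workers=4):
    # Hypothetical helper: maps each object's input to its result.
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Remember which future belongs to which object.
        future_to_obj = {
            executor.submit(obj.my_process, multiply_by, add_to): obj
            for obj in objects
        }
        # Futures are yielded in completion order, not submission order.
        for future in as_completed(future_to_obj):
            obj = future_to_obj[future]
            results[obj.input] = future.result()  # re-raises worker errors
    return results


if __name__ == "__main__":
    objects = [MyClass(i) for i in range(5)]
    print(run_in_threads(objects, 100, 1))
    # Threads share memory, so the objects themselves were updated too.
    print([obj.result for obj in objects])
```

Because threads share memory with the main program, obj.result is set on the original objects here, which is not the case once the work moves to separate processes.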

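Usually the easiest way to run the same calculation in parallel is the map method of a multiprocessing.Pool (or the as_completed function from concurrent.futures in Python 3).

However, the map method applies a function that takes only one argument to an iterable of data, using multiple processes.

So this function cannot be a normal method, because a normal method takes at least two arguments: it must also include self! It could, however, be a static method. See also the linked answer for a more in-depth explanation.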
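A minimal sketch of the static-method idea, assuming Python 3 (where a static method looked up on its class can be pickled by its qualified name). The names run_default and run_all are hypothetical, and the fixed arguments 100 and 1 are simply the ones used in the question.

```python
from multiprocessing import Pool


class MyClass:
    # Trimmed-down copy of the class from the question.
    def __init__(self, input):
        self.input = input
        self.result = None

    def my_process(self, multiply_by, add_to):
        self.result = self.input * multiply_by + add_to
        return self.result

    @staticmethod
    def run_default(obj):
        # Hypothetical single-argument wrapper so that Pool.map can call it.
        return obj.my_process(100, 1)


def run_all(objects, processes=2):
    # Each object is pickled and sent to a worker, which works on a copy.
    with Pool(processes) as pool:
        return pool.map(MyClass.run_default, objects)


if __name__ == "__main__":
    print(run_all([MyClass(i) for i in range(5)]))  # [1, 101, 201, 301, 401]
```

The static method only exists to give map a one-argument callable; a plain module-level function would do the same job.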

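I am going to go against the grain here and suggest sticking to the simplest thing that could possibly work ;-) That is, Pool.map()-like functions are ideal for this, but they are restricted to passing a single argument. Rather than make heroic efforts to work around that, simply write a helper function that needs only a single argument: a tuple. Then everything is simple and clear.

Here is a complete program taking this approach, which prints what you want under Python 2, regardless of OS: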

class MyClass():
    def __init__(self, input):
        self.input = input
        self.result = int

    def my_process(self, multiply_by, add_to):
        self.result = self.input * multiply_by
        self._my_sub_process(add_to)
        return self.result

    def _my_sub_process(self, add_to):
        self.result += add_to

import multiprocessing as mp
NUM_CORE = 4  # set to the number of cores you want to use

def worker(arg):
    obj, m, a = arg
    return obj.my_process(m, a)

if __name__ == "__main__":
    list_of_numbers = range(0, 5)
    list_of_objects = [MyClass(i) for i in list_of_numbers]

    pool = mp.Pool(NUM_CORE)
    list_of_results = pool.map(worker, ((obj, 100, 1) for obj in list_of_objects))
    pool.close()
    pool.join()

    print list_of_numbers
    print list_of_results
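On the question's concern about the rest of the class: with a process pool, each worker operates on a pickled copy of the object, so attributes set inside my_process (such as self.result) are not updated on the objects held by the parent. Below is a minimal Python 3 sketch of one way to deal with that, by having the worker return the updated copy. The names worker_returning_obj and process_all are assumptions for illustration only.

```python
import multiprocessing as mp


class MyClass:
    # Trimmed-down copy of the class from the question.
    def __init__(self, input):
        self.input = input
        self.result = None

    def my_process(self, multiply_by, add_to):
        self.result = self.input * multiply_by + add_to
        return self.result


def worker_returning_obj(arg):
    # Runs in a child process on a copy of the object.
    obj, m, a = arg
    obj.my_process(m, a)
    return obj  # send the updated copy back to the parent


def process_all(objects, m, a, processes=2):
    with mp.Pool(processes) as pool:
        return pool.map(worker_returning_obj, [(obj, m, a) for obj in objects])


if __name__ == "__main__":
    originals = [MyClass(i) for i in range(5)]
    updated = process_all(originals, 100, 1)
    print([obj.result for obj in originals])  # still None: parent copies untouched
    print([obj.result for obj in updated])    # results live on the returned copies
```

Returning whole objects costs an extra pickle round trip, so for large objects it is usually cheaper to return just the result, as the program above does.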

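A bit of magic

I should point out that the very simple approach I suggest has many advantages. Beyond the fact that it "just works" on Python 2 and Python 3, requires no changes to your class, and is easy to understand, it also plays nicely with all of the Pool methods.

However, if you have multiple methods you want to run in parallel, writing a small helper function for each one can get a bit annoying. So here is a bit of "magic" to get around that. Change worker() like so: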

def worker(arg):
    obj, methname = arg[:2]
    return getattr(obj, methname)(*arg[2:])

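Now a single helper function suffices for any number of methods, with any number of arguments. In your specific case, just change one line to match: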

list_of_results = pool.map(worker, ((obj, "my_process", 100, 1) for obj in list_of_objects))

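More or less obvious generalizations can also cater to methods with keyword arguments. But in real life I usually stick to the original suggestion. At some point, catering to generalizations does more harm than good. Then again, I like obvious things ;-)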
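One possible shape of that keyword-argument generalization, as a hedged Python 3 sketch: the task tuple layout (obj, method name, positional args, keyword args) and the name kw_worker are assumptions, not something the answer spells out.

```python
import multiprocessing as mp


class MyClass:
    # Trimmed-down copy of the class from the question.
    def __init__(self, input):
        self.input = input
        self.result = None

    def my_process(self, multiply_by, add_to):
        self.result = self.input * multiply_by + add_to
        return self.result


def kw_worker(task):
    # task = (object, method name, positional args, keyword args)
    obj, methname, args, kwargs = task
    return getattr(obj, methname)(*args, **kwargs)


if __name__ == "__main__":
    objects = [MyClass(i) for i in range(5)]
    tasks = [(obj, "my_process", (100,), {"add_to": 1}) for obj in objects]
    with mp.Pool(2) as pool:
        print(pool.map(kw_worker, tasks))  # [1, 101, 201, 301, 401]
```

Everything in the task tuple must be picklable, which holds for plain tuples, dicts, and instances of module-level classes.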

Based on an earlier answer and your code:

Add the MyClass object to a simulation object:

import multiprocessing
import os
import sys

class simulation(multiprocessing.Process):
    def __init__(self, id, worker, *args, **kwargs):
        # must call this before anything else
        multiprocessing.Process.__init__(self)
        self.id = id
        self.worker = worker
        self.args = args
        self.kwargs = kwargs
        sys.stdout.write('[%d] created\n' % (self.id))

Run whatever you need inside the run function:

    def run(self):
        sys.stdout.write('[%d] running ...  process id: %s\n' % (self.id, os.getpid()))
        self.worker.my_process(*self.args, **self.kwargs)
        sys.stdout.write('[%d] completed\n' % (self.id))

Try this:

list_of_numbers = range(0, 5)
list_of_objects = [MyClass(i) for i in list_of_numbers]
list_of_sim = [simulation(id=k, worker=obj, multiply_by=100*k, add_to=10*k) \
    for k, obj in enumerate(list_of_objects)]  

for sim in list_of_sim:
    sim.start()
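Note that run() executes in the child process, so the value returned by my_process does not come back to the parent on its own. A minimal Python 3 sketch of one way to collect results, assuming a multiprocessing.Queue is passed to each process; the class name Simulation, the sim_id parameter, and the queue argument are hypothetical additions to the code above.

```python
import multiprocessing


class MyClass:
    # Trimmed-down copy of the class from the question.
    def __init__(self, input):
        self.input = input
        self.result = None

    def my_process(self, multiply_by, add_to):
        self.result = self.input * multiply_by + add_to
        return self.result


class Simulation(multiprocessing.Process):
    def __init__(self, sim_id, worker, result_queue, *args, **kwargs):
        super().__init__()
        self.sim_id = sim_id
        self.worker = worker
        self.result_queue = result_queue
        self.args = args
        self.kwargs = kwargs

    def run(self):
        # Executed in the child process after start() is called.
        result = self.worker.my_process(*self.args, **self.kwargs)
        self.result_queue.put((self.sim_id, result))


if __name__ == "__main__":
    result_queue = multiprocessing.Queue()
    sims = [Simulation(k, MyClass(k), result_queue, 100, 1) for k in range(5)]
    for sim in sims:
        sim.start()
    # Drain the queue before joining so no child blocks on a full pipe.
    results = dict(result_queue.get() for _ in sims)
    for sim in sims:
        sim.join()
    print([results[k] for k in range(5)])  # [1, 101, 201, 301, 401]
```

Results arrive in completion order, which is why each one is tagged with its sim_id before being put on the queue.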

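Keep in mind that in the standard implementation of Python, only one thread at a time can execute Python bytecode. On that implementation, threads do not improve the performance of CPU-bound work. For CPU-bound tasks, just swap ThreadPoolExecutor for ProcessPoolExecutor. You take a small hit when the processes start up, but after that the workers can execute concurrently. Note that data returned from the child processes needs to be picklable.

If NUM_CORE is not set, won't it use however many cores are available?

Sure. That is up to you. However, for CPU-bound tasks it is common to ask for fewer cores than actually exist, so that the OS also gets some cycles to run other things. But, again, it is up to you. mp.cpu_count() returns the number of cores that exist.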
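A minimal Python 3 sketch combining the two comments above: ProcessPoolExecutor in place of ThreadPoolExecutor, with a worker count one below mp.cpu_count(). The helper names pick_worker_count, call_my_process, and run_in_processes are hypothetical, and leaving exactly one core free is just one reasonable choice.

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor


class MyClass:
    # Trimmed-down copy of the class from the question.
    def __init__(self, input):
        self.input = input
        self.result = None

    def my_process(self, multiply_by, add_to):
        self.result = self.input * multiply_by + add_to
        return self.result


def pick_worker_count(reserve=1):
    # Leave `reserve` cores for the OS, but always use at least one worker.
    return max(1, mp.cpu_count() - reserve)


def call_my_process(arg):
    # Module-level helper so it can be pickled and sent to a worker process.
    obj, m, a = arg
    return obj.my_process(m, a)


def run_in_processes(objects, m, a):
    with ProcessPoolExecutor(max_workers=pick_worker_count()) as executor:
        # executor.map yields results in the order the tasks were given.
        return list(executor.map(call_my_process, [(obj, m, a) for obj in objects]))


if __name__ == "__main__":
    print(run_in_processes([MyClass(i) for i in range(5)], 100, 1))
```

Both the arguments and the return values cross a process boundary here, so they must all be picklable.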