Python: dynamic use of the multiprocessing module

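I am trying to develop a wrapper that uses the multiprocessing module dynamically. I have many functions in different modules that need to be managed properly. I need to be able to pass a function originating from any module, along with its arguments, to my wrapper. Neither the data nor the function is known before run time, because both depend on the user.

Here is an example of what I am trying to do: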

import sys
from multiprocessing import Process, Queue, cpu_count

def dyn():
    pass

class mp():
    def __init__(self, data, func, n_procs = cpu_count()):
        self.data    = data
        self.q       = Queue()
        self.n_procs = n_procs

        # Replace module-level 'dyn' function with the provided function
        setattr(sys.modules[__name__], 'dyn', func)
        # Calling dyn(...) at this point will produce the same output as
        # calling func(...)

    def _worker(self, *items):
        data = []
        for item in items:
            data.append(dyn(item))
        self.q.put(data)

    def compute(self):
        for item in self.data:
            Process(target=getattr(self, '_worker'), args=item).start()

    def items(self):
        queue_count = self.n_procs
        while queue_count > 0:
            queue_count -= 1
            yield self.q.get()

if __name__ == '__main__':  
    def work(x):
        return x ** 2

    # Create workers
    data = [range(10)] * cpu_count()
    workers = mp(data, work)

    # Make the workers work
    workers.compute()

    # Get the data from the workers
    workers_data = []
    for item in workers.items():
        workers_data.append(item)
    print workers_data

For this example, the output should have the following form:

[[0, 1, 4, 9, 16, 25, 36, 49, 64, 81] * n_procs]

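If you try to run this code, you get an exception saying that too many arguments were passed to dyn. I think the problem is that dyn is overwritten in this instance, but by the time Process is called the change no longer exists.

How can I get around this problem?

Note: this code needs to run on Windows. I am using Python 2.7.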

Update

Based on the comments I received, I decided to do something "messy". Here is my working solution:

import sys, re, uuid, os
from cStringIO import StringIO
from multiprocessing import Process, Queue, cpu_count

class mp():
    def __init__(self, data, func, n_procs = cpu_count()):
        self.data    = data
        self.q       = Queue()
        self.n_procs = n_procs
        self.module  = 'm' + str(uuid.uuid1()).replace('-', '')
        self.file    = self.module + '.py'

        # Build external module
        self.__func_to_module(func)

    def __func_to_module(self, func):   
        with open(self.file, 'wb') as f:
            for line in StringIO(func):
                if 'def' in line:
                    f.write(re.sub(r'def .*\(', 'def work(', line))
                else:
                    f.write(line)

    def _worker(self, q, module, *items):
        exec('from %s import work' % module)
        data = []
        for item in items[0]:
            data.append(work(item))
        q.put(data)

    def compute(self):
        for item in self.data:
            Process(target=getattr(self, '_worker'),
                args=(self.q, self.module, item)).start()

    def items(self):
        queue_count = self.n_procs
        while queue_count > 0:
            queue_count -= 1
            yield self.q.get()
        os.remove(self.file)
        os.remove(self.file + 'c')

if __name__ == '__main__':  
    func = '''def func(x):
        return x ** 2'''

    # Create workers
    data = [range(10)] * cpu_count()
    workers = mp(data, func)

    # Make the workers work
    workers.compute()

    # Get the data from the workers
    workers_data = []
    for item in workers.items():
        workers_data.append(item)
    print workers_data

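On Windows, the module is re-imported every time a new process starts, so the replaced definition of dyn is lost. You can, however, pass the function through the queue, or pass it as an argument to the process's target function: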

def _worker(func, q, *items):
    # Note: this had better be a module-level function, not a method.
    # func and q come first because Python 2.7 does not allow
    # keyword-only arguments after *items.
    data = []
    for item in items:
        data.append(func(item))
    q.put(data)

#...
Process(target=_worker, args=(dyn, q) + tuple(item)).start()

The only error I get is

AttributeError: mp instance has no attribute 'n_procs'

After fixing that, the script outputs:

[[0, 1, 4, 9, 16, 25, 36, 49, 64, 81], [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

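@beeta Are you on Linux?

@beetea - Thanks for spotting that. I updated the code, but it still does not solve my problem.

@dano Yes, Python 2.7 on Ubuntu 12.04.

@beeta This is a Windows-only problem. It happens because Windows has no os.fork, so the child process has to re-import the module from scratch and then receive everything else through pickle. If you have Python 3.4, you can reproduce it with ctx = multiprocessing.get_context("spawn"), followed by queue = ctx.Queue() and ctx.Process(…. Also, it is better to pass the queue to the target function as an argument than to inherit it through self.

Could you provide an example? Passing the function as an argument results in an AttributeError.

A function passed through the queue still needs to be defined at the top level of the module, though. Because you define work inside the if __name__ == "__main__": guard, it is not accessible from the child.

@dano - That is what I thought, and it explains the AttributeError I was getting.

You can pass a method if you first define a function whose only purpose is to call that method:

def (obj, meth, *a, **kw): return getattr(obj, meth)(obj, *a, **kw)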
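The helper's name is missing from the comment above, so the sketch below uses a hypothetical one, call_method, together with a hypothetical Squarer class. It assumes the object is an instance of a class defined at module top level, so that it can be pickled. It also looks the method up on the instance, which already binds self, so obj is not passed a second time as it is in the comment's version.

```python
from multiprocessing import Process, Queue


def call_method(obj, meth, *a, **kw):
    # Hypothetical helper name. A plain module-level function can be sent
    # to a child process by reference; the method is then looked up by
    # name on the unpickled object inside the child.
    return getattr(obj, meth)(*a, **kw)


class Squarer(object):
    # Hypothetical example class, defined at module top level so that
    # its instances can be pickled and rebuilt in the child process.
    def work(self, q, items):
        q.put([x ** 2 for x in items])


if __name__ == '__main__':
    q = Queue()
    p = Process(target=call_method,
                args=(Squarer(), 'work', q, list(range(10))))
    p.start()
    print(q.get())
    p.join()
```

The queue is passed as a separate argument rather than stored on the object, in line with the earlier advice not to inherit it through self.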