Python: Efficient workaround for multiprocessing a function that is a data member of a class, from within that class


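I'm aware of the limitations of the multiprocessing module when dealing with functions that are data members of a class (due to pickling problems).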

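But is there another module, or any kind of workaround in multiprocessing, that allows something like the following (specifically without forcing the definition of the function to be applied in parallel to exist outside of the class)?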
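A minimal sketch of the pattern being asked for, modeled on the MyClass that appears in the answers. The `map_func` parameter is a hypothetical hook added for illustration; the built-in serial `map` stands in for the pool's `map` that the question would like to use at that point.

```python
class MyClass:
    def __init__(self):
        self.my_args = [1, 2, 3, 4]
        self.output = {}

    def my_single_function(self, arg):
        return arg ** 2

    def my_parallelized_function(self, map_func=map):
        # map_func is a hypothetical hook: the serial built-in map by default.
        # The question wants a pool's map here, applied to a bound method.
        results = map_func(self.my_single_function, self.my_args)
        # Key each result by the argument that produced it.
        self.output = dict(zip(self.my_args, results))


foo = MyClass()
foo.my_parallelized_function()
print(foo.output)
```

The difficulty discussed below arises only when `map_func` has to send `self.my_single_function` to another process, because the bound method must then be pickled.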

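Note: I can easily do this by moving my_single_function outside of the class and passing something like foo.my_args to the map or map_async commands. But this pushes the parallelized execution of the function outside of instances of MyClass.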
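For comparison, the workaround described in the note might look roughly like this. It is a sketch only: `run_outside_class` is a hypothetical helper name, and the pool size of 2 is an arbitrary choice.

```python
import multiprocessing as mp


def my_single_function(arg):
    # Module-level function: pickled by reference, so Pool.map accepts it.
    return arg ** 2


class MyClass:
    def __init__(self):
        self.my_args = [1, 2, 3, 4]
        self.output = {}


def run_outside_class(instance, processes=2):
    # Hypothetical helper. The parallel step lives outside the instance,
    # which is exactly the drawback the note objects to.
    with mp.Pool(processes) as pool:
        results = pool.map(my_single_function, instance.my_args)
    instance.output = dict(zip(instance.my_args, results))


if __name__ == "__main__":
    foo = MyClass()
    run_outside_class(foo)
    print(foo.output)
```

The helper has to reach into the instance for `my_args` and write back to `output`, so it is tied to the internals of the class even though it is defined outside it.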

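For my application (parallelizing a large data query that retrieves, joins, and cleans monthly cross-sections of data, and then appends them into a long time series of such cross-sections), it is very important to have this functionality inside the class. Different users of my program will instantiate different instances of the class with different time intervals, different time increments, different subsets of data to gather, and so on, and all of that should be associated with that instance.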

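Thus, I want the parallelized work to be done by the instance as well, since it owns all the data relevant to the parallelized query. It would be silly to write some hacky wrapper function that binds to some arguments and lives outside of the class, especially since such a function would be non-general: it would need all kinds of specifics from inside the class.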

Steven Bethard has posted a way to allow methods to be pickled/unpickled. You could use it like this:

import multiprocessing as mp
import copy_reg
import types

def _pickle_method(method):
    # Author: Steven Bethard
    # http://bytes.com/topic/python/answers/552476-why-cant-you-pickle-instancemethods
    func_name = method.im_func.__name__
    obj = method.im_self
    cls = method.im_class
    cls_name = ''
    if func_name.startswith('__') and not func_name.endswith('__'):
        cls_name = cls.__name__.lstrip('_')
    if cls_name:
        func_name = '_' + cls_name + func_name
    return _unpickle_method, (func_name, obj, cls)

def _unpickle_method(func_name, obj, cls):
    # Author: Steven Bethard
    # http://bytes.com/topic/python/answers/552476-why-cant-you-pickle-instancemethods
    for cls in cls.mro():
        try:
            func = cls.__dict__[func_name]
        except KeyError:
            pass
        else:
            break
    return func.__get__(obj, cls)

# This call to copy_reg.pickle allows you to pass methods as the first arg to
# mp.Pool methods. If you comment out this line, `pool.map(self.foo, ...)` results in
# PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup
# __builtin__.instancemethod failed

copy_reg.pickle(types.MethodType, _pickle_method, _unpickle_method)

class MyClass(object):

    def __init__(self):
        self.my_args = [1,2,3,4]
        self.output  = {}

    def my_single_function(self, arg):
        return arg**2

    def my_parallelized_function(self):
        # Use map or map_async to map my_single_function onto the
        # list of self.my_args, and append the return values into
        # self.output, using each arg in my_args as the key.

        # The result should make self.output become
        # {1:1, 2:4, 3:9, 4:16}
        self.output = dict(zip(self.my_args,
                               pool.map(self.my_single_function, self.my_args)))

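if __name__ == '__main__':
    # Driver, absent from the listing above: my_parallelized_function looks up
    # `pool` as a global, so create it before calling the method.
    pool = mp.Pool()
    foo = MyClass()
    foo.my_parallelized_function()
    print foo.output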
# {1: 1, 2: 4, 3: 9, 4: 16}
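The renaming step in `_pickle_method` exists because of name mangling for private methods. The sketch below (written for Python 3, with a hypothetical `Demo` class and `mangled_name` helper) illustrates why a lookup by the function's own `__name__` would miss such a method in the class `__dict__`.

```python
class Demo:
    def __helper(self):
        # Two leading underscores and no trailing ones: the name is mangled
        # when it is stored in the class namespace.
        return "private"

    def public(self):
        return self.__helper()


def mangled_name(cls, func_name):
    # Hypothetical helper that mirrors the renaming step in _pickle_method.
    if func_name.startswith("__") and not func_name.endswith("__"):
        return "_" + cls.__name__.lstrip("_") + func_name
    return func_name


stored = mangled_name(Demo, "__helper")
print(stored)
# The class dict holds the mangled key, not the name as written in the def.
print(stored in Demo.__dict__, "__helper" in Demo.__dict__)
```

Since `_unpickle_method` looks the function up in `cls.__dict__`, it needs the mangled key; dunder names such as `__init__` are left alone because they are never mangled.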

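I think there is a more elegant solution. Add the following code to the module that does multiprocessing with the class, and you can still pass the method through the pool. The code should go above the class.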

import copy_reg
import types

def _reduce_method(meth):
    return (getattr, (meth.__self__, meth.__func__.__name__))
copy_reg.pickle(types.MethodType, _reduce_method)

For more on how methods are pickled, see below
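To make the mechanism concrete, here is a small sketch of what a reduce function like `_reduce_method` hands to pickle: a callable plus the arguments that rebuild the object. It is written for Python 3 and applies the pair by hand rather than going through pickle. As far as I know, on Python 3 the module is named `copyreg` and bound methods already reduce this way, so the registration is usually needed only on Python 2.

```python
class MyClass:
    def __init__(self):
        self.my_args = [1, 2, 3, 4]

    def my_single_function(self, arg):
        return arg ** 2


def reduce_method(meth):
    # Same shape as _reduce_method above: (callable, argument tuple).
    return (getattr, (meth.__self__, meth.__func__.__name__))


foo = MyClass()
rebuild, args = reduce_method(foo.my_single_function)

# Unpickling effectively calls rebuild(*args), that is
# getattr(foo, "my_single_function"), which yields a fresh bound method.
restored = rebuild(*args)
print(restored(3))
```

Note that the instance itself is part of the argument tuple, so the whole instance must be picklable and is sent along with every method that is shipped to a worker.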

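If you use a fork of multiprocessing called pathos.multiprocessing, you can use classes and class methods directly in multiprocessing's map functions. This is because dill is used instead of pickle or cPickle, and dill can serialize almost anything in Python. pathos.multiprocessing also provides an asynchronous map function, and it can map functions with multiple arguments (e.g. map(math.pow, [1,2,3], [4,5,6])).

See:

and: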

So I believe you can do exactly what you wanted:
Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> 
>>> class MyClass():
...   def __init__(self):
...     self.my_args = [1,2,3,4]
...     self.output = {}
...   def my_single_function(self, arg):
...     return arg**2
...   def my_parallelized_function(self):
...     res = p.map(self.my_single_function, self.my_args)
...     self.output = dict(zip(self.my_args, res))
... 
>>> from pathos.multiprocessing import ProcessingPool as Pool
>>> p = Pool()
>>> 
>>> foo = MyClass()
>>> foo.my_parallelized_function()
>>> foo.output
{1: 1, 2: 4, 3: 9, 4: 16}
>>>

Get the code here:
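+1 for the method pickling. I was going to suggest Cython to do multithreading and avoid the GIL, but for what he's trying to do this will be easier to implement. You may want to mention, though, that for small data sets these processes have a large communication overhead.

Just to make sure I understand: the call to copy_reg.pickle defines how MethodType is handled, by passing the pickle and unpickle functions as arguments? So once that is taken care of, pickling instance methods no longer fails and the parallelization works as you'd expect?

@Pyrce +1 for the Cython idea; I have considered it, but because it doesn't integrate easily with the existing work on this project, I don't think it's the best route for me. For many scientific algorithms, though, I would want that (or just use mpi4py, which I prefer). This is a great solution.

@EMS: Yes, copy_reg.pickle teaches pickle how to pickle/unpickle methods.

Thanks. This is exactly what I was looking for.

This is the basis of what dill does for you under the covers, and it is leveraged by pathos.multiprocessing (see my answer).

Too bad pathos doesn't support Python 3.0.

You can get most of what you want (I assume) from multiprocess and/or ppft, which are compatible with Python 3.

Can I avoid conflicts with them?

@user1700890: Yes. The only part of pathos that isn't Python 3 is the part related to ssh. pathos.pp is ppft, and pathos.multiprocessing is multiprocess.

Thanks again! I'm trying to compile multiprocess. Compiling Python packages on Windows is a disaster for ordinary users (especially on 3.x).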
Example session with pathos.multiprocessing, mapping over several argument lists and over methods:

>>> from pathos.multiprocessing import ProcessingPool as Pool
>>> 
>>> p = Pool(4)
>>> 
>>> def add(x,y):
...   return x+y
... 
>>> x = [0,1,2,3]
>>> y = [4,5,6,7]
>>> 
>>> p.map(add, x, y)
[4, 6, 8, 10]
>>> 
>>> class Test(object):
...   def plus(self, x, y): 
...     return x+y
... 
>>> t = Test()
>>> 
>>> p.map(Test.plus, [t]*4, x, y)
[4, 6, 8, 10]
>>> 
>>> p.map(t.plus, x, y)
[4, 6, 8, 10]
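The multi-argument calls in the session above follow the same convention as the built-in `map`, which pairs its iterables element by element. A minimal serial sketch of that convention, using the `math.pow` example mentioned in the answer (no pool involved; `itertools.starmap` is shown only as an equivalent spelling over explicit argument tuples):

```python
import math
from itertools import starmap

x = [1, 2, 3]
y = [4, 5, 6]

# Built-in map pairs the iterables: pow(1, 4), pow(2, 5), pow(3, 6).
paired = list(map(math.pow, x, y))

# The same computation expressed over explicit (x, y) argument tuples.
tupled = list(starmap(math.pow, zip(x, y)))

print(paired)
print(paired == tupled)
```

In the session, `p.map(Test.plus, [t]*4, x, y)` uses the same pairing, with the instance supplied as the first argument list.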
