Python: preserving custom attributes when pickling a subclass of a numpy array
I have created a subclass of numpy's ndarray. In particular, I have modified the example code provided (linked above).

I am using Python's multiprocessing to manipulate instances of this class in a parallel loop. As far as I can tell, the way scope is essentially "copied" over to the worker processes is via pickle.

The problem I have now run into concerns the way numpy arrays are pickled. I can't find any comprehensive documentation about this, but some suggestions indicate that I should be looking at `__reduce__`.

Can anyone shed some more light on this? The minimal working example is really just the numpy example code I linked above, copied here for completeness:
import numpy as np

class RealisticInfoArray(np.ndarray):

    def __new__(cls, input_array, info=None):
        # Input array is an already formed ndarray instance
        # We first cast to be our class type
        obj = np.asarray(input_array).view(cls)
        # add the new attribute to the created instance
        obj.info = info
        # Finally, we must return the newly created object:
        return obj

    def __array_finalize__(self, obj):
        # see InfoArray.__array_finalize__ for comments
        if obj is None: return
        self.info = getattr(obj, 'info', None)
And now here is the problem:
import pickle

obj = RealisticInfoArray([1, 2, 3], info='foo')
print(obj.info)  # 'foo'

pickle_str = pickle.dumps(obj)
new_obj = pickle.loads(pickle_str)
print(new_obj.info)  # raises AttributeError
Thanks.

`np.ndarray` uses `__reduce__` to pickle itself. We can take a look at what that function actually returns when you call it, to get an idea of what's going on:
>>> obj = RealisticInfoArray([1, 2, 3], info='foo')
>>> obj.__reduce__()
(<built-in function _reconstruct>, (<class 'pick.RealisticInfoArray'>, (0,), 'b'), (1, (3,), dtype('int64'), False, '\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00'))
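The piece the usage below depends on is the actual override: append `self.info` to the state tuple that `ndarray.__reduce__` returns, and strip it back off in `__setstate__` before delegating to the parent. A minimal runnable sketch of that idea (the class body repeats the example from the question):

```python
import pickle
import numpy as np

class RealisticInfoArray(np.ndarray):
    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.info = getattr(obj, 'info', None)

    def __reduce__(self):
        # Start from the (reconstructor, args, state) tuple ndarray gives us ...
        pickled_state = super(RealisticInfoArray, self).__reduce__()
        # ... and append our attribute to the state element
        new_state = pickled_state[2] + (self.info,)
        return (pickled_state[0], pickled_state[1], new_state)

    def __setstate__(self, state):
        self.info = state[-1]  # restore our attribute
        # hand the remaining items back to ndarray
        super(RealisticInfoArray, self).__setstate__(state[0:-1])

obj = RealisticInfoArray([1, 2, 3], info='foo')
new_obj = pickle.loads(pickle.dumps(obj))
print(new_obj.info)  # foo
```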
Usage:
>>> obj = pick.RealisticInfoArray([1, 2, 3], info='foo')
>>> pickle_str = pickle.dumps(obj)
>>> pickle_str
"cnumpy.core.multiarray\n_reconstruct\np0\n(cpick\nRealisticInfoArray\np1\n(I0\ntp2\nS'b'\np3\ntp4\nRp5\n(I1\n(I3\ntp6\ncnumpy\ndtype\np7\n(S'i8'\np8\nI0\nI1\ntp9\nRp10\n(I3\nS'<'\np11\nNNNI-1\nI-1\nI0\ntp12\nbI00\nS'\\x01\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x02\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x03\\x00\\x00\\x00\\x00\\x00\\x00\\x00'\np13\nS'foo'\np14\ntp15\nb."
>>> new_obj = pickle.loads(pickle_str)
>>> new_obj.info
'foo'
I'm the `dill` author; `dill` was pickling a `numpy.array` before `numpy` could do it itself.

@dano's explanation is pretty accurate. Personally, I would just use `dill` and let it do the job for you. With `dill`, you don't need `__reduce__`, as `dill` has several ways that it grabs subclassed attributes... one of which is storing the `__dict__` of any class object. `pickle` doesn't do this, b/c it usually works with classes by name reference and doesn't store the class object itself... so you have to work with `__reduce__` to make `pickle` work for you. No need, in most cases, with `dill`:
>>> import numpy as np
>>>
>>> class RealisticInfoArray(np.ndarray):
...     def __new__(cls, input_array, info=None):
...         # Input array is an already formed ndarray instance
...         # We first cast to be our class type
...         obj = np.asarray(input_array).view(cls)
...         # add the new attribute to the created instance
...         obj.info = info
...         # Finally, we must return the newly created object:
...         return obj
...     def __array_finalize__(self, obj):
...         # see InfoArray.__array_finalize__ for comments
...         if obj is None: return
...         self.info = getattr(obj, 'info', None)
...
>>> import dill as pickle
>>> obj = RealisticInfoArray([1, 2, 3], info='foo')
>>> print(obj.info)  # 'foo'
foo
>>>
>>> pickle_str = pickle.dumps(obj)
>>> new_obj = pickle.loads(pickle_str)
>>> print(new_obj.info)
foo
`dill` can extend itself into `pickle` (essentially by `copy_reg`-ing everything it knows about), so you can then use all of the `dill` types in anything that uses `pickle`. Now, if you are going to use `multiprocessing`, you're somewhat out of luck, since it uses `cPickle`. There is, however, the `pathos` fork of `multiprocessing` (called `pathos.multiprocessing`), where basically the only change is that it uses `dill` instead of `cPickle`... and thus can serialize a heck of a lot more in a `Pool.map`. I think that (currently) if you want to work with your subclass of `numpy.array` in `multiprocessing` (or `pathos.multiprocessing`), you may have to do something like @dano suggests -- but I'm not sure, as I haven't thought of a good case to test your subclass on. If you are interested, get `pathos` here.
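The `multiprocessing` point can be sanity-checked without spawning any workers: `multiprocessing` serializes `Pool` arguments with `ForkingPickler`, a thin subclass of the stdlib pickler, so the same `__reduce__` override from @dano's answer is what lets the subclass survive the round trip. A sketch under that assumption:

```python
import numpy as np
from multiprocessing.reduction import ForkingPickler

class RealisticInfoArray(np.ndarray):
    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.info = getattr(obj, 'info', None)

    def __reduce__(self):
        pickled_state = super(RealisticInfoArray, self).__reduce__()
        return (pickled_state[0], pickled_state[1],
                pickled_state[2] + (self.info,))

    def __setstate__(self, state):
        self.info = state[-1]
        super(RealisticInfoArray, self).__setstate__(state[0:-1])

obj = RealisticInfoArray([1, 2, 3], info='foo')
# Round-trip through the same pickler a Pool uses for its task arguments
data = bytes(ForkingPickler.dumps(obj))
new_obj = ForkingPickler.loads(data)
print(new_obj.info)  # foo
```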
Here is a slight improvement on @dano's answer and @Gabriel's comment. Leveraging the `__dict__` attribute for serialization works for me, even with subclasses:
def __reduce__(self):
    # Get the parent's __reduce__ tuple
    pickled_state = super(RealisticInfoArray, self).__reduce__()
    # Create our own tuple to pass to __setstate__, but append the __dict__ rather than individual members.
    new_state = pickled_state[2] + (self.__dict__,)
    # Return a tuple that replaces the parent's __setstate__ tuple with our own
    return (pickled_state[0], pickled_state[1], new_state)

def __setstate__(self, state):
    self.__dict__.update(state[-1])  # Update the internal dict from state
    # Call the parent's __setstate__ with the other tuple elements.
    super(RealisticInfoArray, self).__setstate__(state[0:-1])
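A self-contained sketch combining the two methods above with the original class (the second attribute `extra` is purely illustrative -- it shows that anything placed in the instance `__dict__` survives, not just `info`):

```python
import pickle
import numpy as np

class RealisticInfoArray(np.ndarray):
    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.info = getattr(obj, 'info', None)

    def __reduce__(self):
        pickled_state = super(RealisticInfoArray, self).__reduce__()
        # Append the whole __dict__ instead of individual attributes
        new_state = pickled_state[2] + (self.__dict__,)
        return (pickled_state[0], pickled_state[1], new_state)

    def __setstate__(self, state):
        self.__dict__.update(state[-1])  # restore every custom attribute
        super(RealisticInfoArray, self).__setstate__(state[0:-1])

obj = RealisticInfoArray([1, 2, 3], info='foo')
obj.extra = 'bar'  # ad-hoc attribute, also preserved via __dict__
new_obj = pickle.loads(pickle.dumps(obj))
print(new_obj.info, new_obj.extra)  # foo bar
```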
A complete example is here.

Clever answer, thank you very much. The reason I'm not accepting this one is that @dano's suggestion doesn't require an extra package. -- I think `dill` is always better than `pickle`, and would strongly consider `pathos` in the future. -- @Gabriel, in this case I'd also take @dano's answer over mine, but I figure the more information the better. :) -- Great, this has solved it. Thanks also for the very clear code example. I'm actually transferring the whole `__dict__` object across to make it more generic; luckily, `np.ndarray` doesn't seem to use it, so I'm free to use it for my own purposes.