Python: preserving custom attributes when pickling a subclass of a numpy array
I have created a subclass of numpy's ndarray. In particular, I have modified the example code provided (linked above).

I am using Python's multiprocessing to manipulate instances of this class in a parallel loop. As far as I can tell, the way scope is essentially "copied" over to the worker processes is via pickle.

The problem I have now run into concerns the way numpy arrays are pickled. I can't find any comprehensive documentation about this, but some suggestions indicate that I should be looking at `__reduce__`.

Can anyone shed some more light on this? The minimal working example is really just the numpy example code I linked above, copied here for completeness:
import numpy as np

class RealisticInfoArray(np.ndarray):

    def __new__(cls, input_array, info=None):
        # Input array is an already formed ndarray instance
        # We first cast to be our class type
        obj = np.asarray(input_array).view(cls)
        # add the new attribute to the created instance
        obj.info = info
        # Finally, we must return the newly created object:
        return obj

    def __array_finalize__(self, obj):
        # see InfoArray.__array_finalize__ for comments
        if obj is None: return
        self.info = getattr(obj, 'info', None)
And now here is the problem:
import pickle

obj = RealisticInfoArray([1, 2, 3], info='foo')
print(obj.info)  # 'foo'

pickle_str = pickle.dumps(obj)
new_obj = pickle.loads(pickle_str)
print(new_obj.info)  # raises AttributeError
Thanks.

`np.ndarray` uses `__reduce__` to pickle itself. We can take a look at what that function actually returns when you call it, to get an idea of what's going on:
>>> obj = RealisticInfoArray([1, 2, 3], info='foo')
>>> obj.__reduce__()
(<built-in function _reconstruct>, (<class 'pick.RealisticInfoArray'>, (0,), 'b'), (1, (3,), dtype('int64'), False, '\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00'))
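The piece the usage below depends on is the actual override: append `self.info` to the state tuple that `ndarray.__reduce__` returns, and strip it back off in `__setstate__` before delegating to the parent. A minimal runnable sketch of that idea (the class body repeats the example from the question):

```python
import pickle
import numpy as np

class RealisticInfoArray(np.ndarray):
    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.info = getattr(obj, 'info', None)

    def __reduce__(self):
        # Start from the (reconstructor, args, state) tuple ndarray gives us ...
        pickled_state = super(RealisticInfoArray, self).__reduce__()
        # ... and append our attribute to the state element
        new_state = pickled_state[2] + (self.info,)
        return (pickled_state[0], pickled_state[1], new_state)

    def __setstate__(self, state):
        self.info = state[-1]  # restore our attribute
        # hand the remaining items back to ndarray
        super(RealisticInfoArray, self).__setstate__(state[0:-1])

obj = RealisticInfoArray([1, 2, 3], info='foo')
new_obj = pickle.loads(pickle.dumps(obj))
print(new_obj.info)  # foo
```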
Usage:
>>> obj = pick.RealisticInfoArray([1, 2, 3], info='foo')
>>> pickle_str = pickle.dumps(obj)
>>> pickle_str
"cnumpy.core.multiarray\n_reconstruct\np0\n(cpick\nRealisticInfoArray\np1\n(I0\ntp2\nS'b'\np3\ntp4\nRp5\n(I1\n(I3\ntp6\ncnumpy\ndtype\np7\n(S'i8'\np8\nI0\nI1\ntp9\nRp10\n(I3\nS'<'\np11\nNNNI-1\nI-1\nI0\ntp12\nbI00\nS'\\x01\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x02\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x03\\x00\\x00\\x00\\x00\\x00\\x00\\x00'\np13\nS'foo'\np14\ntp15\nb."
>>> new_obj = pickle.loads(pickle_str)
>>> new_obj.info
'foo'
I'm the `dill` author; `dill` was pickling a `numpy.array` before `numpy` could do it itself.

@dano's explanation is pretty accurate. Personally, I would just use `dill` and let it do the job for you. With `dill`, you don't need `__reduce__`, as `dill` has several ways that it grabs subclassed attributes... one of which is storing the `__dict__` of any class object. `pickle` doesn't do this, b/c it usually works with classes by name reference and doesn't store the class object itself... so you have to work with `__reduce__` to make `pickle` work for you. No need, in most cases, with `dill`:
>>> import numpy as np
>>>
>>> class RealisticInfoArray(np.ndarray):
...     def __new__(cls, input_array, info=None):
...         # Input array is an already formed ndarray instance
...         # We first cast to be our class type
...         obj = np.asarray(input_array).view(cls)
...         # add the new attribute to the created instance
...         obj.info = info
...         # Finally, we must return the newly created object:
...         return obj
...     def __array_finalize__(self, obj):
...         # see InfoArray.__array_finalize__ for comments
...         if obj is None: return
...         self.info = getattr(obj, 'info', None)
...
>>> import dill as pickle
>>> obj = RealisticInfoArray([1, 2, 3], info='foo')
>>> print(obj.info)  # 'foo'
foo
>>>
>>> pickle_str = pickle.dumps(obj)
>>> new_obj = pickle.loads(pickle_str)
>>> print(new_obj.info)
foo
`dill` can extend itself into `pickle` (essentially by `copy_reg`-ing everything it knows about), so you can then use all of the `dill` types in anything that uses `pickle`. Now, if you are going to use `multiprocessing`, you're somewhat out of luck, since it uses `cPickle`. There is, however, the `pathos` fork of `multiprocessing` (called `pathos.multiprocessing`), where basically the only change is that it uses `dill` instead of `cPickle`... and thus can serialize a heck of a lot more in a `Pool.map`. I think that (currently) if you want to work with your subclass of `numpy.array` in `multiprocessing` (or `pathos.multiprocessing`), you may have to do something like @dano suggests -- but I'm not sure, as I haven't thought of a good case to test your subclass on. If you are interested, get `pathos` here.
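The `multiprocessing` point can be sanity-checked without spawning any workers: `multiprocessing` serializes `Pool` arguments with `ForkingPickler`, a thin subclass of the stdlib pickler, so the same `__reduce__` override from @dano's answer is what lets the subclass survive the round trip. A sketch under that assumption:

```python
import numpy as np
from multiprocessing.reduction import ForkingPickler

class RealisticInfoArray(np.ndarray):
    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.info = getattr(obj, 'info', None)

    def __reduce__(self):
        pickled_state = super(RealisticInfoArray, self).__reduce__()
        return (pickled_state[0], pickled_state[1],
                pickled_state[2] + (self.info,))

    def __setstate__(self, state):
        self.info = state[-1]
        super(RealisticInfoArray, self).__setstate__(state[0:-1])

obj = RealisticInfoArray([1, 2, 3], info='foo')
# Round-trip through the same pickler a Pool uses for its task arguments
data = bytes(ForkingPickler.dumps(obj))
new_obj = ForkingPickler.loads(data)
print(new_obj.info)  # foo
```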
Here is a slight improvement on @dano's answer and @Gabriel's comment. Leveraging the `__dict__` attribute for serialization works for me, even with subclasses:
def __reduce__(self):
    # Get the parent's __reduce__ tuple
    pickled_state = super(RealisticInfoArray, self).__reduce__()
    # Create our own tuple to pass to __setstate__, but append the __dict__ rather than individual members.
    new_state = pickled_state[2] + (self.__dict__,)
    # Return a tuple that replaces the parent's __setstate__ tuple with our own
    return (pickled_state[0], pickled_state[1], new_state)

def __setstate__(self, state):
    self.__dict__.update(state[-1])  # Update the internal dict from state
    # Call the parent's __setstate__ with the other tuple elements.
    super(RealisticInfoArray, self).__setstate__(state[0:-1])
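A self-contained sketch combining the two methods above with the original class (the second attribute `extra` is purely illustrative -- it shows that anything placed in the instance `__dict__` survives, not just `info`):

```python
import pickle
import numpy as np

class RealisticInfoArray(np.ndarray):
    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.info = getattr(obj, 'info', None)

    def __reduce__(self):
        pickled_state = super(RealisticInfoArray, self).__reduce__()
        # Append the whole __dict__ instead of individual attributes
        new_state = pickled_state[2] + (self.__dict__,)
        return (pickled_state[0], pickled_state[1], new_state)

    def __setstate__(self, state):
        self.__dict__.update(state[-1])  # restore every custom attribute
        super(RealisticInfoArray, self).__setstate__(state[0:-1])

obj = RealisticInfoArray([1, 2, 3], info='foo')
obj.extra = 'bar'  # ad-hoc attribute, also preserved via __dict__
new_obj = pickle.loads(pickle.dumps(obj))
print(new_obj.info, new_obj.extra)  # foo bar
```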
A complete example is here.

Clever answer, thank you very much. The reason I'm not accepting this one is that @dano's suggestion doesn't require an extra package. -- I think `dill` is always better than `pickle`, and would strongly consider `pathos` in the future. -- @Gabriel, in this case I'd also take @dano's answer over mine, but I figure the more information the better. :) -- Great, this has solved it. Thanks also for the very clear code example. I'm actually transferring the whole `__dict__` object across to make it more generic; luckily, `np.ndarray` doesn't seem to use it, so I'm free to use it for my own purposes.