Python: Theano's scan function copies non_sequences shared variables
I am trying to implement a custom convolutional layer for a CNN in Theano, and to do this I am using the scan function. The idea is to apply the new convolution mask to every pixel.

The scan function compiles correctly, but for some reason I get an out-of-memory error. Debugging (see below) indicates that the non_sequences variables are copied for every instance of the loop (for every pixel), which of course kills my GPU memory:
def convolve_location(index, input, bias):
    hsize = self.W.shape / 2
    t = T.switch(index[0]-hsize[0] < 0, 0, index[0]-hsize[0])
    l = T.switch(index[1]-hsize[1] < 0, 0, index[1]-hsize[1])
    b = T.switch(index[0]+hsize[0] >= input.shape[2], input.shape[2]-1, index[0]+hsize[0])
    r = T.switch(index[1]+hsize[1] >= input.shape[3], input.shape[3]-1, index[1]+hsize[1])
    r_image = (input[:, :, t:b, l:r] - input[:, :, index[0], index[1]][:, :, None, None]) ** 2
    r_delta = (bias[:, :, t:b, l:r] - bias[:, :, index[0], index[1]][:, :, None, None]) ** 2
    return T.sum(r_image * r_delta)
# Define cost function over all pixels
self.inds = theano.shared(np.array([(i, j) for i in range(self.image_shape[2]) for j in range(self.image_shape[3])], dtype='int32'), borrow=True)
self.cost = T.sum(theano.scan(
    fn=convolve_location,
    outputs_info=None,
    sequences=[self.inds],
    non_sequences=[self.input, self.b],
    n_steps=np.prod(self.image_shape[-2:])
)[0])
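To make the per-step computation concrete, here is the same per-pixel term sketched in plain NumPy (a hypothetical sketch with made-up shapes and an explicit hsize argument, not part of the original Theano code):

```python
import numpy as np

def convolve_location_np(index, input, bias, hsize):
    """NumPy sketch of one scan step: take a window around `index`,
    clamped at the image borders, and sum the product of the squared
    differences of image and bias against the centre pixel."""
    i, j = index
    t = max(i - hsize[0], 0)
    l = max(j - hsize[1], 0)
    b = min(i + hsize[0], input.shape[2] - 1)
    r = min(j + hsize[1], input.shape[3] - 1)
    r_image = (input[:, :, t:b, l:r] - input[:, :, i, j][:, :, None, None]) ** 2
    r_delta = (bias[:, :, t:b, l:r] - bias[:, :, i, j][:, :, None, None]) ** 2
    return np.sum(r_image * r_delta)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 8, 8))   # (batch, channels, rows, cols)
g = rng.standard_normal((2, 3, 8, 8))
cost = sum(convolve_location_np((i, j), x, g, (2, 2))
           for i in range(8) for j in range(8))
```

Both squared factors are non-negative, so each per-pixel term is too; the outer sum mirrors what T.sum accumulates over the scan outputs.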
A symbolic Alloc with that shape may be created when the scan is first constructed, or at some point during the optimization process. However, it should be optimized away later in the optimization process.
There is a recently known issue related to this, which should now be fixed in the development ("bleeding-edge") version of Theano. In fact, I just tried your snippet (slightly edited) with the latest development version and had no memory errors. Moreover, there is no 5D tensor anywhere in the computational graph, which suggests that the bug has indeed been fixed.
Finally, note that operations such as convolution, when expressed with scan rather than with one of the existing convolution operations, are likely to be much slower. In particular, scan is not able to parallelize efficiently even when the iterations of the loop do not depend on one another.

You don't show how self.input and self.b are defined. Are they shared variables? Also, giving your Theano variables names could help with debugging.

Thanks cfh, I have edited the post. Both variables are indeed shared. Naming them would be somewhat confusing, though, since every layer in the network generates its own version of these variables.
MemoryError: alloc failed
Apply node that caused the error: Alloc(TensorConstant{0.0}, TensorConstant{1025}, TensorConstant{2000}, TensorConstant{3}, TensorConstant{32}, TensorConstant{32})
Inputs types: [TensorType(float32, scalar), TensorType(int64, scalar), TensorType(int64, scalar), TensorType(int64, scalar), TensorType(int64, scalar), TensorType(int64, scalar)]
Inputs shapes: [(), (), (), (), (), ()]
Inputs strides: [(), (), (), (), (), ()]
Inputs values: [array(0.0, dtype=float32), array(1025), array(2000), array(3), array(32), array(32)]

Debugprint of the apply node:
Alloc [@A] <TensorType(float32, 5D)> ''
 |TensorConstant{0.0} [@B] <TensorType(float32, scalar)>
 |TensorConstant{1025} [@C] <TensorType(int64, scalar)>
 |TensorConstant{2000} [@D] <TensorType(int64, scalar)>
 |TensorConstant{3} [@E] <TensorType(int64, scalar)>
 |TensorConstant{32} [@F] <TensorType(int64, scalar)>
 |TensorConstant{32} [@F] <TensorType(int64, scalar)>
Storage map footprint:
- CudaNdarrayConstant{[[[[ 0.]]]]}, Shape: (1, 1, 1, 1), ElemSize: 4 Byte(s), TotalSize: 4 Byte(s)
- Constant{18}, Shape: (1,), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- TensorConstant{(1, 1) of 0}, Shape: (1, 1), ElemSize: 1 Byte(s), TotalSize: 1 Byte(s)
- Constant{1024}, Shape: (1,), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- Constant{-1}, Shape: (1,), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- TensorConstant{32}, Shape: (1,), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- Subtensor{:int64:}.0, Shape: (1024,), ElemSize: 4 Byte(s), TotalSize: 4096 Byte(s)
- Constant{34}, Shape: (1,), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- Constant{2}, Shape: (1,), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- TensorConstant{[2000 3.. 32 32]}, Shape: (4,), ElemSize: 8 Byte(s), TotalSize: 32 Byte(s)
- Reshape{4}.0, Shape: (2000, 3, 32, 32), ElemSize: 4 Byte(s), TotalSize: 24576000 Byte(s)
- TensorConstant{(1, 1, 1, 1) of 0}, Shape: (1, 1, 1, 1), ElemSize: 1 Byte(s), TotalSize: 1 Byte(s)
- CudaNdarrayConstant{[[[[ 0.1]]]]}, Shape: (1, 1, 1, 1), ElemSize: 4 Byte(s), TotalSize: 4 Byte(s)
- <TensorType(float32, matrix)>, Shape: (50000, 3072), ElemSize: 4 Byte(s), TotalSize: 614400000 Byte(s)
self.b = theano.shared(np.zeros(image_shape, dtype=theano.config.floatX), borrow=True)