Memory error with Theano scan in Python


I have the following function (it is part of a custom Keras layer, but that is not important here):

self.encoder_step performs some recurrent computation.

If I run the function with moderately sized parameters (bucket_size=128, self.batch_size=64, self.max_len=128, self.hidden_dim=256), I get a CNMEM_STATUS_OUT_OF_MEMORY error. The error log (with exception_verbosity=high) shows that Theano allocates a float32 tensor of shape (X, 64, 128, 512) for 'forall_inplace,gpu,scan_fn}.0'. It seems that Theano still stores the scan output values of every step, even though I never use them.
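As a sanity check, plain arithmetic on the shapes reported in the log confirms that the failed allocation is exactly a float32 buffer holding one (64, 128, 512) slice per scan step:

```python
# Shape reported in the storage map: (225, 64, 128, 512), float32 (4 bytes).
steps, batch, max_len, hidden = 225, 64, 128, 512
elem_size = 4  # bytes per float32

total = steps * batch * max_len * hidden * elem_size
print(total)  # 3774873600 — exactly the TotalSize in the storage map

# The failing GpuAlloc asks for one step less (note array(224) in "Inputs values"):
alloc = (steps - 1) * batch * max_len * hidden * elem_size
print(alloc)  # 3758096384 — exactly the byte count in the MemoryError
```

So the ~3.7 GB is not a leak: it is the per-step scan output, materialized across all steps.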

Example log:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/NLP_RL/train_char.py", line 229, in run_training_RL
    loss1 = encoder.train_on_batch(batch[0], batch[1])
  File "/usr/local/lib/python3.4/dist-packages/keras/engine/training.py", line 1239, in train_on_batch
    outputs = self.train_function(ins)
  File "/usr/local/lib/python3.4/dist-packages/keras/backend/theano_backend.py", line 792, in __call__
    return self.function(*inputs)
  File "/usr/local/lib/python3.4/dist-packages/theano/compile/function_module.py", line 871, in __call__
    storage_map=getattr(self.fn, 'storage_map', None))
  File "/usr/local/lib/python3.4/dist-packages/theano/gof/link.py", line 314, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)
  File "/usr/local/lib/python3.4/dist-packages/six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.4/dist-packages/theano/compile/function_module.py", line 859, in __call__
    outputs = self.fn()
MemoryError: Error allocating 3758096384 bytes of device memory (CNMEM_STATUS_OUT_OF_MEMORY).
Apply node that caused the error: GpuAlloc{memset_0=True}(CudaNdarrayConstant{[[[[ 0.]]]]}, Elemwise{sub,no_inplace}.0, Shape_i{0}.0, Shape_i{1}.0, Shape_i{2}.0)
Toposort index: 121
Inputs types: [CudaNdarrayType(float32, (True, True, True, True)), TensorType(int64, scalar), TensorType(int64, scalar), TensorType(int64, scalar), TensorType(int64, scalar)]
Inputs shapes: [(1, 1, 1, 1), (), (), (), ()]
Inputs strides: [(0, 0, 0, 0), (), (), (), ()]
Inputs values: [b'CudaNdarray([[[[ 0.]]]])', array(224), array(64), array(128), array(512)]
Outputs clients: [[GpuIncSubtensor{InplaceInc;int64}(GpuAlloc{memset_0=True}.0, GpuIncSubtensor{InplaceInc;::, int64, int64::}.0, Constant{-1})]]
...
Storage map footprint:
 - forall_inplace,gpu,scan_fn}.0, Shape: (225, 64, 128, 512), ElemSize: 4 Byte(s), TotalSize: 3774873600 Byte(s)
 - GpuAlloc{memset_0=True}.0, Shape: (225, 64, 128, 512), ElemSize: 4 Byte(s), TotalSize: 3774873600 Byte(s)
 - GpuElemwise{Add}[(0, 0)].0, Shape: (64, 128, 512), ElemSize: 4 Byte(s), TotalSize: 16777216 Byte(s)
 - <CudaNdarrayType(float32, 3D)>, Shared Input, Shape: (64, 128, 512), ElemSize: 4 Byte(s), TotalSize: 16777216 Byte(s)
 - forall_inplace,gpu,scan_fn}.1, Shape: (225, 64, 128), ElemSize: 4 Byte(s), TotalSize: 7372800 Byte(s)
 - forall_inplace,gpu,scan_fn}.2, Shape: (225, 64, 128), ElemSize: 4 Byte(s), TotalSize: 7372800 Byte(s)
 - input_8, Input, Shape: (64, 128, 83), ElemSize: 4 Byte(s), TotalSize: 2719744 Byte(s)
 - GpuReshape{2}.0, Shape: (8192, 83), ElemSize: 4 Byte(s), TotalSize: 2719744 Byte(s)
...
How do I use scan correctly so that the intermediate results are not stored?
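For illustration, here is the conceptual difference in plain Python (a sketch, not the Theano API; the hypothetical step() stands in for self.encoder_step): a scan that returns a full output sequence keeps one slice alive per step, while threading only the carried state keeps a single slice alive.

```python
def step(state):
    # Hypothetical recurrent update; the details don't matter
    # for the memory argument.
    return [x + 1 for x in state]

def scan_keep_all(state, n_steps):
    """Mimics scan storing the output of every step: O(n_steps) live slices."""
    outputs = []
    for _ in range(n_steps):
        state = step(state)
        outputs.append(state)   # one retained copy per step
    return outputs

def scan_keep_last(state, n_steps):
    """Only the carried state survives: O(1) live slices."""
    for _ in range(n_steps):
        state = step(state)     # the previous state becomes garbage
    return state

all_out = scan_keep_all([0, 0], 4)
last = scan_keep_last([0, 0], 4)
assert last == all_out[-1]      # same final result, far less retained memory
```

In Theano itself, a commonly suggested approach is to take only `result[-1]` of the output of `theano.scan`, which lets the optimizer avoid materializing the full output buffer for the forward pass; however, intermediate states can still be kept when they are needed for the gradient, so I have not verified that this helps in the exact setup above.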
