Memory error with Theano scan in Python


I have the following function (it is part of a custom Keras layer, but that is not important here):

self.encoder_step performs some recurrent computation.

If I run the function with moderately sized parameters (bucket_size=128, self.batch_size=64, self.max_len=128, self.hidden_dim=256), I get a CNMEM_STATUS_OUT_OF_MEMORY error. The error log (with exception_verbosity=high) shows that Theano allocates a float32 tensor of shape (X, 64, 128, 512) for 'forall_inplace,gpu,scan_fn}.0'. It seems that Theano still stores the scan output values of every step, even though I never use them.
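As a sanity check, plain arithmetic on the shapes reported in the log confirms that the failed allocation is exactly a float32 buffer holding one (64, 128, 512) slice per scan step:

```python
# Shape reported in the storage map: (225, 64, 128, 512), float32 (4 bytes).
steps, batch, max_len, hidden = 225, 64, 128, 512
elem_size = 4  # bytes per float32

total = steps * batch * max_len * hidden * elem_size
print(total)  # 3774873600 — exactly the TotalSize in the storage map

# The failing GpuAlloc asks for one step less (note array(224) in "Inputs values"):
alloc = (steps - 1) * batch * max_len * hidden * elem_size
print(alloc)  # 3758096384 — exactly the byte count in the MemoryError
```

So the ~3.7 GB is not a leak: it is the per-step scan output, materialized across all steps.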

Example log:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/NLP_RL/train_char.py", line 229, in run_training_RL
    loss1 = encoder.train_on_batch(batch[0], batch[1])
  File "/usr/local/lib/python3.4/dist-packages/keras/engine/training.py", line 1239, in train_on_batch
    outputs = self.train_function(ins)
  File "/usr/local/lib/python3.4/dist-packages/keras/backend/theano_backend.py", line 792, in __call__
    return self.function(*inputs)
  File "/usr/local/lib/python3.4/dist-packages/theano/compile/function_module.py", line 871, in __call__
    storage_map=getattr(self.fn, 'storage_map', None))
  File "/usr/local/lib/python3.4/dist-packages/theano/gof/link.py", line 314, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)
  File "/usr/local/lib/python3.4/dist-packages/six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.4/dist-packages/theano/compile/function_module.py", line 859, in __call__
    outputs = self.fn()
MemoryError: Error allocating 3758096384 bytes of device memory (CNMEM_STATUS_OUT_OF_MEMORY).
Apply node that caused the error: GpuAlloc{memset_0=True}(CudaNdarrayConstant{[[[[ 0.]]]]}, Elemwise{sub,no_inplace}.0, Shape_i{0}.0, Shape_i{1}.0, Shape_i{2}.0)
Toposort index: 121
Inputs types: [CudaNdarrayType(float32, (True, True, True, True)), TensorType(int64, scalar), TensorType(int64, scalar), TensorType(int64, scalar), TensorType(int64, scalar)]
Inputs shapes: [(1, 1, 1, 1), (), (), (), ()]
Inputs strides: [(0, 0, 0, 0), (), (), (), ()]
Inputs values: [b'CudaNdarray([[[[ 0.]]]])', array(224), array(64), array(128), array(512)]
Outputs clients: [[GpuIncSubtensor{InplaceInc;int64}(GpuAlloc{memset_0=True}.0, GpuIncSubtensor{InplaceInc;::, int64, int64::}.0, Constant{-1})]]
...
Storage map footprint:
 - forall_inplace,gpu,scan_fn}.0, Shape: (225, 64, 128, 512), ElemSize: 4 Byte(s), TotalSize: 3774873600 Byte(s)
 - GpuAlloc{memset_0=True}.0, Shape: (225, 64, 128, 512), ElemSize: 4 Byte(s), TotalSize: 3774873600 Byte(s)
 - GpuElemwise{Add}[(0, 0)].0, Shape: (64, 128, 512), ElemSize: 4 Byte(s), TotalSize: 16777216 Byte(s)
 - <CudaNdarrayType(float32, 3D)>, Shared Input, Shape: (64, 128, 512), ElemSize: 4 Byte(s), TotalSize: 16777216 Byte(s)
 - forall_inplace,gpu,scan_fn}.1, Shape: (225, 64, 128), ElemSize: 4 Byte(s), TotalSize: 7372800 Byte(s)
 - forall_inplace,gpu,scan_fn}.2, Shape: (225, 64, 128), ElemSize: 4 Byte(s), TotalSize: 7372800 Byte(s)
 - input_8, Input, Shape: (64, 128, 83), ElemSize: 4 Byte(s), TotalSize: 2719744 Byte(s)
 - GpuReshape{2}.0, Shape: (8192, 83), ElemSize: 4 Byte(s), TotalSize: 2719744 Byte(s)
...
How do I use scan correctly so that the intermediate results are not stored?
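For illustration, here is the conceptual difference in plain Python (a sketch, not the Theano API; the hypothetical step() stands in for self.encoder_step): a scan that returns a full output sequence keeps one slice alive per step, while threading only the carried state keeps a single slice alive.

```python
def step(state):
    # Hypothetical recurrent update; the details don't matter
    # for the memory argument.
    return [x + 1 for x in state]

def scan_keep_all(state, n_steps):
    """Mimics scan storing the output of every step: O(n_steps) live slices."""
    outputs = []
    for _ in range(n_steps):
        state = step(state)
        outputs.append(state)   # one retained copy per step
    return outputs

def scan_keep_last(state, n_steps):
    """Only the carried state survives: O(1) live slices."""
    for _ in range(n_steps):
        state = step(state)     # the previous state becomes garbage
    return state

all_out = scan_keep_all([0, 0], 4)
last = scan_keep_last([0, 0], 4)
assert last == all_out[-1]      # same final result, far less retained memory
```

In Theano itself, a commonly suggested approach is to take only `result[-1]` of the output of `theano.scan`, which lets the optimizer avoid materializing the full output buffer for the forward pass; however, intermediate states can still be kept when they are needed for the gradient, so I have not verified that this helps in the exact setup above.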
