Installing GANet in Python
I am trying to use the following NN: When I do not use apex, this error occurs:
Traceback (most recent call last):
File "predict.py", line 187, in <module>
test(leftname, rightname, savename)
File "predict.py", line 158, in test
prediction = model(input1, input2)
File "/home/master/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/master/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 141, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/master/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/master/Work/Project01/GANet/models/GANet_deep.py", line 396, in forward
g = self.conv_start(x)
File "/home/master/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/master/anaconda3/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
input = module(input)
File "/home/master/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/master/Work/Project01/GANet/models/GANet_deep.py", line 42, in forward
x = self.bn(x)
File "/home/master/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/master/Work/Project01/GANet/libs/sync_bn/modules/sync_bn.py", line 121, in forward
self.activation, self.slope).view(input_shape)
File "/home/master/Work/Project01/GANet/libs/sync_bn/functions/sync_bn.py", line 95, in forward
y = sync_bn_gpu.batchnorm_forward(x, _ex, _exs, gamma, beta, ctx.eps)
RuntimeError: cudaGetLastError() == cudaSuccess ASSERT FAILED at src/gpu/sync_bn_cuda.cu:289, please report a bug to PyTorch. (BatchNorm_Forward_CUDA at src/gpu/sync_bn_cuda.cu:289)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7fca7ac1bcc5 in /home/master/anaconda3/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: BatchNorm_Forward_CUDA(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, float) + 0x2be (0x7fca76e56ff1 in /home/master/Work/Project01/GANet/libs/sync_bn/build/lib/sync_bn_gpu.cpython-37m-x86_64-linux-gnu.so)
frame #2: <unknown function> + 0x22ab3 (0x7fca76e54ab3 in /home/master/Work/Project01/GANet/libs/sync_bn/build/lib/sync_bn_gpu.cpython-37m-x86_64-linux-gnu.so)
frame #3: <unknown function> + 0x201a5 (0x7fca76e521a5 in /home/master/Work/Project01/GANet/libs/sync_bn/build/lib/sync_bn_gpu.cpython-37m-x86_64-linux-gnu.so)
<omitting python frames>
frame #11: THPFunction_apply(_object*, _object*) + 0x5a1 (0x7fcaab0b4bf1 in /home/master/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
When I use apex (and replace every BatchNorm2d and BatchNorm3d with apex.parallel.SyncBatchNorm), the NN runs, but it returns an all-zero result matrix.
Any suggestions?
I am working on Ubuntu 18.04
with conda,
python 3.7.4,
torch 1.0.0,
cuda 10.2,
gcc 7.5
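The versions listed above can also be collected from inside the environment, which makes it easier to compare the CUDA version PyTorch was built against with the local toolkit that compiles extensions such as sync_bn. The following is only a minimal sketch: the helper names `tool_version` and `collect_versions` are made up for illustration, and a version mismatch is one possible cause of the assertion failure, not a confirmed one.

```python
import platform
import shutil
import subprocess


def tool_version(cmd):
    """Return the output of `<cmd> --version`, or None if cmd is not on PATH."""
    if shutil.which(cmd) is None:
        return None
    result = subprocess.run(
        [cmd, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        timeout=10,
    )
    return result.stdout.strip()


def collect_versions():
    """Gather the interpreter, compiler, and PyTorch versions in one dict."""
    info = {
        "python": platform.python_version(),
        "gcc": tool_version("gcc"),
        "nvcc": tool_version("nvcc"),  # local CUDA toolkit used to build extensions
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this interpreter.
        info["torch"] = None
        info["torch_cuda"] = None
    else:
        info["torch"] = torch.__version__
        # CUDA version the PyTorch binary was built with (None for CPU-only builds).
        info["torch_cuda"] = torch.version.cuda
    return info


if __name__ == "__main__":
    for name, value in collect_versions().items():
        print(name, "->", value)
```

If `torch_cuda` and the release reported by `nvcc` differ, rebuilding the extension against a matching toolkit may be worth trying before anything else.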
pip uninstall apex
rm -rf apex
git clone https://github.com/ptrblck/apex.git
cd apex
git checkout apex_no_distributed
pip install -v --no-cache-dir ./
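After the reinstall, it may help to confirm that the package is visible to the same interpreter that runs predict.py. This is a small sketch using only the standard library; `module_available` and `check_apex` are hypothetical helper names, and the check only tells you whether the modules can be located, not whether the all-zero output is fixed.

```python
import importlib.util


def module_available(name):
    """Return True if the module can be located by the current interpreter."""
    try:
        return importlib.util.find_spec(name) is not None
    except Exception:
        # find_spec imports parent packages for dotted names, so a missing or
        # broken parent (for example apex without a working torch) ends up here.
        return False


def check_apex():
    """Report whether apex and its apex.parallel subpackage can be found."""
    return {name: module_available(name) for name in ("apex", "apex.parallel")}


if __name__ == "__main__":
    for name, found in check_apex().items():
        print(name, "->", "found" if found else "missing")
```

Saving and running the script outside the cloned apex directory avoids finding the source folder instead of the installed package.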