Python: Running TensorFlow/Keras on a GPU with CUDA, cuDNN, Anaconda, and an RTX 3060 Ti
I am trying to use my new RTX 3060 Ti to train a neural network for the first time, but I have run into a difficult error. Here is the error message:
2020-12-17 12:45:09.600373: E tensorflow/stream_executor/cuda/cuda_blas.cc:428] failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
File "marge.py", line 365, in <module>
MARGE(*sys.argv[1:])
File "marge.py", line 357, in MARGE
filters, filt2um)
File "lib\NN.py", line 712, in driver
nn.train(train_batches, valid_batches, epochs, patience)
File "lib\NN.py", line 335, in train
model_checkpoint])
File "C:\Users\Nick\anaconda3\envs\marge\lib\site-packages\keras\engine\training.py", line 1039, in fit
validation_steps=validation_steps)
File "C:\Users\Nick\anaconda3\envs\marge\lib\site-packages\keras\engine\training_arrays.py", line 154, in fit_loop
outs = f(ins)
File "C:\Users\Nick\anaconda3\envs\marge\lib\site-packages\keras\backend\tensorflow_backend.py", line 2715, in __call__
return self._call(inputs)
File "C:\Users\Nick\anaconda3\envs\marge\lib\site-packages\keras\backend\tensorflow_backend.py", line 2675, in _call
fetched = self._callable_fn(*array_vals)
File "C:\Users\Nick\anaconda3\envs\marge\lib\site-packages\tensorflow\python\client\session.py", line 1458, in __call__
run_metadata_ptr)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
(0) Internal: Blas GEMM launch failed : a.shape=(256, 7), b.shape=(7, 4096), m=256, n=4096, k=7
[[{{node dense_1/MatMul}}]]
[[loss/mul/_125]]
(1) Internal: Blas GEMM launch failed : a.shape=(256, 7), b.shape=(7, 4096), m=256, n=4096, k=7
[[{{node dense_1/MatMul}}]]
0 successful operations.
0 derived errors ignored.
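One way to read the failure line above is to pull out the matrix dimensions and check that they describe a valid product. The sketch below does that with a regular expression; the function name `parse_gemm_failure` is made up for illustration, and the pattern only assumes the message format shown in this traceback.

```python
import re

# Matches the dimension fields of TensorFlow's "Blas GEMM launch failed" message
# as it appears in the traceback above.
GEMM_RE = re.compile(
    r"a\.shape=\((\d+), (\d+)\), b\.shape=\((\d+), (\d+)\), "
    r"m=(\d+), n=(\d+), k=(\d+)"
)


def parse_gemm_failure(line):
    """Return the matmul dimensions from a GEMM failure line, or None if absent."""
    match = GEMM_RE.search(line)
    if match is None:
        return None
    a_rows, a_cols, b_rows, b_cols, m, n, k = map(int, match.groups())
    return {
        "a_shape": (a_rows, a_cols),
        "b_shape": (b_rows, b_cols),
        "m": m,
        "n": n,
        "k": k,
        # A valid product needs a's column count to equal b's row count.
        "shapes_compatible": a_cols == b_rows,
    }


line = ("(0) Internal: Blas GEMM launch failed : a.shape=(256, 7), "
        "b.shape=(7, 4096), m=256, n=4096, k=7")
info = parse_gemm_failure(line)
print(info)
```

Here the shapes are compatible (a batch of 256 samples with 7 features multiplied by a 7 x 4096 weight matrix), which suggests the model definition is probably not at fault and the problem more likely lies in the GPU library stack.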
Things I have tried:
- Installing NVIDIA's build of TensorFlow, but the command `pip install --user nvidia-pyindex` and …
- Adding the following to the top of my code: `config = tf.ConfigProto()`; `config.gpu_options.allow_growth = True`; `session = tf.Session(config=config)`
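The `ConfigProto`/`Session` calls in the last item belong to the TensorFlow 1.x API. For anyone following along on TensorFlow 2.x, a rough equivalent of `allow_growth` is sketched below. It assumes TensorFlow 2.1 or later and must run before any GPU has been initialized; the helper name `enable_memory_growth` is hypothetical, and the import is guarded so the snippet still runs where TensorFlow is not installed.

```python
def enable_memory_growth():
    """Turn on memory growth for every visible GPU.

    Returns the number of GPUs configured, or -1 if TensorFlow is not installed.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return -1

    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # Allocate GPU memory on demand instead of reserving it all at start-up.
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)


configured = enable_memory_growth()
print("GPUs configured:", configured)
```

Memory growth only changes how memory is allocated, so it is unlikely to help if the underlying cause is a CUDA version that does not support the card.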
2020-12-17 12:27:24.445007: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce RTX 3060 Ti major: 8 minor: 6 memoryClockRate(GHz): 1.71
pciBusID: 0000:06:00.0
2020-12-17 12:27:24.445413: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2020-12-17 12:27:24.445639: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2020-12-17 12:27:24.445772: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-12-17 12:27:24.445881: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2020-12-17 12:27:24.445971: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2020-12-17 12:27:24.446221: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6712 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3060 Ti, pci bus id: 0000:06:00.0, compute capability: 8.6)
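The device log above reports compute capability 8.6 for the card. A small sketch for extracting that value and comparing it with what an older toolkit covers is shown below. The threshold is an assumption: CUDA 10.x toolkits are documented as supporting GPUs up to compute capability 7.5, which should be verified against NVIDIA's own tables, and the function name is made up for illustration.

```python
import re

# Assumption: CUDA 10.x supports GPUs up to compute capability 7.5.
MAX_CC_CUDA_10 = (7, 5)


def compute_capability(log_line):
    """Extract (major, minor) from a 'compute capability: X.Y' entry, or None."""
    match = re.search(r"compute capability: (\d+)\.(\d+)", log_line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


log_line = (
    "Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 "
    "with 6712 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3060 Ti, "
    "pci bus id: 0000:06:00.0, compute capability: 8.6)"
)
cc = compute_capability(log_line)
# Tuples compare element by element, so (8, 6) > (7, 5).
newer_than_cuda10 = cc is not None and cc > MAX_CC_CUDA_10
print(cc, newer_than_cuda10)
```

If the installed TensorFlow build targets CUDA 10.x, a card newer than that threshold can be detected and listed yet still fail when a kernel is actually launched, which would be consistent with the cuBLAS error above.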
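Is it possible that the RTX 3060 Ti is simply not supported for this purpose?
Please let me know if there is any additional information I can provide. Thanks in advance for your help.
Edit:
I also tried the suggested steps (in hindsight, installing CUDA and cuDNN seems important). I also ran the following commands: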
tf.test.is_built_with_cuda()
tf.test.is_gpu_available(cuda_only=False, min_cuda_compute_capability=None)
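Both return "True". However, I still get the "failed to run cuBLAS routine" error. I also noticed that the 3060 Ti does not appear on NVIDIA's list of CUDA-compatible GPUs, so maybe I am just out of luck...

I ran into difficulties similar to yours when running on my GPU. I don't remember the details very well, so I'm afraid I can't help much. I see that you are using a TensorFlow version below 2. You have to install the CUDA and cuDNN versions that match the TensorFlow 1.14 you are using. Here are the things you should try installing:
- The driver for your GPU:
- The correct CUDA version:
- The correct cuDNN version (you need to create an account; download version 7.4):
I hope this works. Let me know if it does, and good luck!

Thanks for the reply. I am starting to think my problem may be related to the software versions I have installed. I read elsewhere that I need to install TF 2.4, so I have done that. According to the Reddit thread, TF 2.4 with 3000-series cards needs CUDA 11.0, whereas I have CUDA 11.1 installed. Next I will try downgrading my CUDA installation.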
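The version matching discussed in this thread can be sketched as a small lookup. The table below covers only the two TensorFlow releases mentioned here and uses the pairings from TensorFlow's published tested build configurations (1.14 with CUDA 10.0 and cuDNN 7.4, 2.4 with CUDA 11.0 and cuDNN 8.0); treat those values as something to confirm against the official table. The names `TESTED_BUILDS` and `check_cuda` are made up for illustration.

```python
# TensorFlow major.minor -> (CUDA version, cuDNN version).
# Assumption: values taken from TensorFlow's tested build configurations.
TESTED_BUILDS = {
    "1.14": ("10.0", "7.4"),
    "2.4": ("11.0", "8.0"),
}


def check_cuda(tf_version, installed_cuda):
    """Compare an installed CUDA version with the one expected for tf_version.

    Returns (ok, message), where ok is True, False, or None if the
    TensorFlow version is not in the table.
    """
    key = ".".join(tf_version.split(".")[:2])
    if key not in TESTED_BUILDS:
        return None, f"No entry for TensorFlow {key}; check the official table."
    expected_cuda, expected_cudnn = TESTED_BUILDS[key]
    if installed_cuda == expected_cuda:
        return True, f"CUDA {installed_cuda} matches; use cuDNN {expected_cudnn}."
    return False, (
        f"TensorFlow {key} expects CUDA {expected_cuda} and cuDNN "
        f"{expected_cudnn}, but CUDA {installed_cuda} is installed."
    )


# The situation described in the last comment: TF 2.4 with CUDA 11.1 installed.
ok, message = check_cuda("2.4.0", "11.1")
print(ok, message)
```

A mismatch reported by a check like this does not prove that a given setup will fail, but it is a cheap first thing to rule out before reinstalling drivers.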