Python 3.x: moving to TensorFlow 2.0, training stalls after the third step
Recently I decided to upgrade from TensorFlow 1.14 (GPU variant) to the current version, 2.0. My current setup is:
- TensorFlow (GPU variant) 2.0
- cuDNN 7.6.4
- CUDA 10
- Python 3.6
- IDE: Visual Studio 2019
I expected some pain, but this caught me off guard. When I ran one of my 1.14 projects (now adapted to 2.0), the model built without any issues and training started smoothly, then stopped completely after the third step. The same project runs fine on the CPU variant of TensorFlow 2.0, but training all of the models takes several…
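Things I have already tried:
- Changing the hyperparameters
- Reinstalling CUDA
- Reinstalling TensorFlow
- Reinstalling cuDNN
- Disabling validation
- Checking the PATH variables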
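The "check the PATH variables" step can be scripted so it is repeatable. Below is a minimal sketch, assuming Windows and using only the standard library: it looks in every PATH directory for the DLLs that TensorFlow opens in the log further down (cudart64_100.dll, cublas64_100.dll, cudnn64_7.dll, cupti64_100.dll). The helper name `find_dlls_on_path` is hypothetical, and a DLL being present on PATH does not prove it is the build TensorFlow expects.

```python
import os
from typing import Dict, List, Optional

# DLL names taken from the TensorFlow 2.0 log in this question (CUDA 10.0 / cuDNN 7).
TF20_GPU_DLLS = [
    "cudart64_100.dll",
    "cublas64_100.dll",
    "cudnn64_7.dll",
    "cupti64_100.dll",
]


def find_dlls_on_path(dll_names: List[str], path_value: Optional[str] = None) -> Dict[str, List[str]]:
    """Map each DLL name to every PATH directory that contains it (hypothetical helper)."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    hits: Dict[str, List[str]] = {name: [] for name in dll_names}
    # os.pathsep is ';' on Windows and ':' elsewhere.
    for directory in path_value.split(os.pathsep):
        directory = directory.strip().strip('"')
        if not directory or not os.path.isdir(directory):
            continue  # empty or stale PATH entry
        try:
            # Windows file names are case-insensitive, so compare in lower case.
            entries = {entry.lower() for entry in os.listdir(directory)}
        except OSError:
            continue  # unreadable directory: skip it
        for name in dll_names:
            if name.lower() in entries:
                hits[name].append(directory)
    return hits


if __name__ == "__main__":
    for name, dirs in find_dlls_on_path(TF20_GPU_DLLS).items():
        # A DLL found in more than one directory may point to conflicting installs.
        status = ", ".join(dirs) if dirs else "NOT FOUND"
        print(f"{name}: {status}")
```

A DLL listed under two different directories is worth a closer look, since Windows loads whichever copy it finds first.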
The warning that stands out is:
Invoking ptxas not supported on Windows
Relying on driver to perform ptx compilation. This message will be only logged once.
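I never got this with TF 1.14, so I'm a bit confused.
I know CUDA itself works, because I compiled and ran several of NVIDIA's samples. So the only remaining suspect is TensorFlow or the way it handles the GPU.
But I don't know how to proceed from here.
Here is the log: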
2019-11-27 01:03:57.910895: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\pandas\core\frame.py:4117: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
errors=errors,
2019-11-27 01:04:02.247959: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2019-11-27 01:04:02.277414: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.835
pciBusID: 0000:0a:00.0
2019-11-27 01:04:02.282378: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-11-27 01:04:02.286653: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-11-27 01:04:02.289629: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-11-27 01:04:02.295084: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.835
pciBusID: 0000:0a:00.0
2019-11-27 01:04:02.299843: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-11-27 01:04:02.303965: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-11-27 01:04:03.043700: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-11-27 01:04:03.047132: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2019-11-27 01:04:03.049453: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2019-11-27 01:04:03.052642: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6382 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:0a:00.0, compute capability: 6.1)
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding (Embedding)        (None, 154, 64)           896000
_________________________________________________________________
conv1d (Conv1D)              (None, 150, 64)           20544
_________________________________________________________________
flatten (Flatten)            (None, 9600)              0
_________________________________________________________________
dense (Dense)                (None, 300)               2880300
_________________________________________________________________
dense_1 (Dense)              (None, 150)               45150
_________________________________________________________________
dense_2 (Dense)              (None, 70)                10570
_________________________________________________________________
dense_3 (Dense)              (None, 10)                710
_________________________________________________________________
dense_4 (Dense)              (None, 2)                 22
=================================================================
Total params: 3,853,296
Trainable params: 3,853,296
Non-trainable params: 0
_________________________________________________________________
Train for 10 steps, validate for 50 steps
Epoch 1/40
2019-11-27 01:04:06.199581: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2019-11-27 01:04:06.430358: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2019-11-27 01:04:07.180709: W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: Invoking ptxas not supported on Windows
Relying on driver to perform ptx compilation. This message will be only logged once.
2019-11-27 01:04:07.425377: I tensorflow/core/profiler/lib/profiler_session.cc:184] Profiler session started.
2019-11-27 01:04:07.431736: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cupti64_100.dll
1/10 [==>...........................] - ETA: 32s - loss: 0.6933 - accuracy: 0.4375 - categorical_accuracy: 0.4375 - precision: 0.4375 - recall: 0.4375
2019-11-27 01:04:07.655586: I tensorflow/core/platform/default/device_tracer.cc:588] Collecting 148 kernel records, 21 memcpy records.
WARNING: Logging before flag parsing goes to stderr.
W1127 01:04:07.730274 5696 callbacks.py:244] Method (on_train_batch_end) is slow compared to the batch update (0.138531). Check your callbacks.
3/10 [========>.....................] - ETA: 9s - loss: 0.6167 - accuracy: 0.7000 - categorical_accuracy: 0.7000 - precision: 0.7000 - recall: 0.7000
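I'm affected by the same problem. In my case the culprit was the driver. I first tried tensorflow-gpu with CUDA 10 and the latest NVIDIA driver; training would randomly get stuck, and I saw the same ptxas warning you are showing. Next, I switched TensorFlow from 2.0 to 1.15 and to 1.14, adjusting the Python version accordingly, but that didn't help.
After uninstalling the driver and installing an older one (432.00), the problem went away, although I still see the ptxas warning. Linux seems to be much better supported. There is no proper solution yet.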
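To confirm which driver is actually installed before and after a downgrade like this, one option is to ask `nvidia-smi` directly. This is a sketch under a few assumptions: `nvidia-smi` is on PATH (with older Windows drivers it may live under C:\Program Files\NVIDIA Corporation\NVSMI instead), it accepts `--query-gpu=name,driver_version --format=csv,noheader`, and the helper names are hypothetical. The 432.00 constant is only the version this answer reports as working, not a documented cut-off. It uses `stdout=PIPE` and `universal_newlines` rather than `capture_output`/`text` so it also runs on Python 3.6.

```python
import shutil
import subprocess
from typing import List, Optional, Tuple

# The driver this answer reports as working; an anecdote, not a documented cut-off.
KNOWN_GOOD = (432, 0)


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn a driver string such as '432.00' into (432, 0) for comparisons."""
    return tuple(int(part) for part in version.strip().split("."))


def parse_driver_versions(csv_output: str) -> List[Tuple[str, str]]:
    """Parse 'name, driver_version' lines printed by nvidia-smi with csv,noheader."""
    gpus = []
    for line in csv_output.splitlines():
        if not line.strip():
            continue
        # Split on the last comma so the version is always the final field.
        name, version = line.rsplit(",", 1)
        gpus.append((name.strip(), version.strip()))
    return gpus


def query_driver_versions() -> Optional[List[Tuple[str, str]]]:
    """Return (GPU name, driver version) pairs, or None if nvidia-smi is not on PATH."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None
    # stdout=PIPE + universal_newlines keeps this compatible with Python 3.6.
    result = subprocess.run(
        [exe, "--query-gpu=name,driver_version", "--format=csv,noheader"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return parse_driver_versions(result.stdout)


if __name__ == "__main__":
    gpus = query_driver_versions()
    if gpus is None:
        print("nvidia-smi not found on PATH")
    else:
        for name, version in gpus:
            note = " (newer than 432.00)" if version_tuple(version) > KNOWN_GOOD else ""
            print(f"{name}: driver {version}{note}")
```

Parsing is kept separate from the subprocess call so the parsing logic can be checked without a GPU or driver installed.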