Python 3.x: Moving to TensorFlow 2.0, training stops after the third step

python-3.x tensorflow

Recently I decided to upgrade from TensorFlow 1.14 (GPU variant) to the current 2.0 release.

My current setup is:

TensorFlow (GPU variant) 2.0
cuDNN 7.6.4
CUDA 10
Python 3.6
IDE: Visual Studio 2019
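A setup list like the one above can also be collected from inside the interpreter, which rules out the case where the IDE runs a different Python or TensorFlow build than expected. The following is a minimal sketch, not part of the original post; the helper name `describe_environment` is made up for illustration, and it assumes the TensorFlow 2.0 location of the device-listing call (`tf.config.experimental`).

```python
import platform


def describe_environment():
    """Collect version facts worth quoting in a bug report."""
    info = {
        "python": platform.python_version(),
        "os": platform.platform(),
    }
    try:
        import tensorflow as tf  # optional: only reported when installed
    except ImportError:
        info["tensorflow"] = None
        return info
    info["tensorflow"] = tf.__version__
    # True only for builds compiled with CUDA support (the GPU variant).
    info["built_with_cuda"] = tf.test.is_built_with_cuda()
    # In TF 2.0 the device listing lives under tf.config.experimental.
    gpus = tf.config.experimental.list_physical_devices("GPU")
    info["gpus"] = [gpu.name for gpu in gpus]
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `gpus` comes back empty while `built_with_cuda` is true, the problem is more likely in the driver or library setup than in the model code.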

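I expected some pain, but this caught me off guard.

When I try to run one of my (now adapted) 1.14 projects, the model builds with no issue and training starts normally. After the third step it stops completely. The same project runs fine on the CPU variant of TensorFlow 2.0, but training all the models takes orders of magnitude longer.

Here is what I have tried so far: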

Changing the hyperparameters
Reinstalling CUDA
Reinstalling tensorflow
Reinstalling cuDNN
Disabling validation
Checking the PATH variables
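The PATH check in the list above can be made repeatable with a few lines of Python. This is a rough sketch under stated assumptions: the DLL names are copied from the "Successfully opened dynamic library" lines of the session log, the helper `locate_on_path` is hypothetical, and Windows also searches locations other than PATH when loading DLLs, so a miss here is a hint rather than proof.

```python
import os
from pathlib import Path

# DLL names taken from the "Successfully opened dynamic library" log lines.
EXPECTED_DLLS = [
    "cudart64_100.dll",
    "cublas64_100.dll",
    "cudnn64_7.dll",
    "cupti64_100.dll",
]


def locate_on_path(names, path_value=None):
    """Map each file name to the first PATH directory containing it, or None."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    directories = [Path(p) for p in path_value.split(os.pathsep) if p]
    found = {}
    for name in names:
        found[name] = next(
            (str(d) for d in directories if (d / name).is_file()), None
        )
    return found


if __name__ == "__main__":
    for name, directory in locate_on_path(EXPECTED_DLLS).items():
        print(f"{name}: {directory or 'NOT FOUND on PATH'}")
```

Because only the first matching directory is reported, the output also reveals when an older CUDA install shadows the intended one.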

None of these measures helped. My only clue is this warning message:

 Invoking ptxas not supported on Windows
Relying on driver to perform ptx compilation. This message will be only logged once.

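I never got this warning with TF 1.14, so I'm a bit puzzled. I know CUDA works, because I compiled and ran several of NVIDIA's samples. So the only real options left are related to TensorFlow itself or to how it handles the GPU.

But I don't know how to proceed.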
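One way to narrow down a silent stall like this is to have Python dump the stack of every thread if the process is still running after a timeout. The sketch below uses the standard-library `faulthandler` module; the wrapper names `arm_hang_watchdog` and `disarm_hang_watchdog` and the 120-second timeout are illustrative assumptions, not something from the original post. It only shows Python frames, so it indicates which call is blocked, not what the GPU is doing.

```python
import faulthandler
import sys


def arm_hang_watchdog(timeout_seconds, stream=None):
    """Dump all thread stacks every timeout_seconds until disarmed."""
    if stream is None:
        stream = sys.stderr
    # The dump is written by a watchdog thread, so it still fires
    # when the main thread is blocked inside native code.
    faulthandler.dump_traceback_later(timeout_seconds, repeat=True, file=stream)


def disarm_hang_watchdog():
    """Cancel the pending dump once training has finished normally."""
    faulthandler.cancel_dump_traceback_later()


# Hypothetical usage around a training call:
#   arm_hang_watchdog(120)
#   model.fit(...)
#   disarm_hang_watchdog()
```

If the dump shows the main thread sitting inside a callback rather than inside the training step itself, that points away from the CUDA kernels and toward whatever the callback does.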

The session log follows:

2019-11-27 01:03:57.910895: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\pandas\core\frame.py:4117: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,
2019-11-27 01:04:02.247959: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2019-11-27 01:04:02.277414: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.835
pciBusID: 0000:0a:00.0
2019-11-27 01:04:02.282378: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-11-27 01:04:02.286653: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-11-27 01:04:02.289629: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-11-27 01:04:02.295084: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.835
pciBusID: 0000:0a:00.0
2019-11-27 01:04:02.299843: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-11-27 01:04:02.303965: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-11-27 01:04:03.043700: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-11-27 01:04:03.047132: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0
2019-11-27 01:04:03.049453: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N
2019-11-27 01:04:03.052642: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6382 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:0a:00.0, compute capability: 6.1)
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding (Embedding)        (None, 154, 64)           896000
_________________________________________________________________
conv1d (Conv1D)              (None, 150, 64)           20544
_________________________________________________________________
flatten (Flatten)            (None, 9600)              0
_________________________________________________________________
dense (Dense)                (None, 300)               2880300
_________________________________________________________________
dense_1 (Dense)              (None, 150)               45150
_________________________________________________________________
dense_2 (Dense)              (None, 70)                10570
_________________________________________________________________
dense_3 (Dense)              (None, 10)                710
_________________________________________________________________
dense_4 (Dense)              (None, 2)                 22
=================================================================
Total params: 3,853,296
Trainable params: 3,853,296
Non-trainable params: 0
_________________________________________________________________
Train for 10 steps, validate for 50 steps
Epoch 1/40
2019-11-27 01:04:06.199581: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2019-11-27 01:04:06.430358: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2019-11-27 01:04:07.180709: W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: Invoking ptxas not supported on Windows
Relying on driver to perform ptx compilation. This message will be only logged once.
2019-11-27 01:04:07.425377: I tensorflow/core/profiler/lib/profiler_session.cc:184] Profiler session started.
2019-11-27 01:04:07.431736: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cupti64_100.dll
 1/10 [==>...........................] - ETA: 32s - loss: 0.6933 - accuracy: 0.4375 - categorical_accuracy: 0.4375 - precision: 0.4375 - recall: 0.4375
2019-11-27 01:04:07.655586: I tensorflow/core/platform/default/device_tracer.cc:588] Collecting 148 kernel records, 21 memcpy records.
WARNING: Logging before flag parsing goes to stderr.
W1127 01:04:07.730274  5696 callbacks.py:244] Method (on_train_batch_end) is slow compared to the batch update (0.138531). Check your callbacks.
 3/10 [========>.....................] - ETA: 9s - loss: 0.6167 - accuracy: 0.7000 - categorical_accuracy: 0.7000 - precision: 0.7000 - recall: 0.7000
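In a log this long, the few non-informational entries are easy to miss among the `I` lines. The following sketch filters a saved copy of the log down to warnings, errors, and fatal messages. The regular expression is an assumption based on the line shape visible above (timestamp, one-letter level, source file, line number, message); it does not match the differently formatted `W1127 ...` line or the Keras progress bars.

```python
import re

# Matches lines such as:
# 2019-11-27 01:04:07.180709: W tensorflow/.../redzone_allocator.cc:312] Internal: ...
LOG_LINE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+): "
    r"(?P<level>[IWEF]) (?P<source>\S+):(?P<line>\d+)\] (?P<message>.*)$"
)


def non_info_lines(log_text):
    """Return (level, source, message) for every entry that is not level I."""
    hits = []
    for raw in log_text.splitlines():
        match = LOG_LINE.match(raw.strip())
        if match and match.group("level") != "I":
            hits.append(
                (match.group("level"), match.group("source"), match.group("message"))
            )
    return hits
```

Applied to the log above, the only entry this pattern would flag is the `redzone_allocator.cc` ptxas warning, which matches the poster's observation that it is the only clue.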

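I was affected by the same problem. In my case, the cause was the driver.

I first tried tensorflow-gpu with CUDA 10 and the latest NVIDIA driver. Training got stuck at random points, and I also saw the ptxas warning you are showing.

Next, I changed the TensorFlow version from 2.0 to 1.15 and 1.14 and tried different Python versions; none of that helped.

After uninstalling the driver and installing an older one (432.00), the problem went away, although I still see the ptxas warning.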
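To confirm which driver is actually active before and after such a rollback, the version can be read from `nvidia-smi`. This is a small sketch that assumes `nvidia-smi` is reachable on PATH (on some Windows installs it is not, in which case the function returns None); the helper names are made up for illustration.

```python
import shutil
import subprocess


def parse_driver_version(text):
    """Turn a version string such as '432.00' into a comparable tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def installed_driver_version():
    """Ask nvidia-smi for the driver version; None if the tool is unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--query-gpu=driver_version", "--format=csv,noheader"],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,  # Python 3.6 compatible way to get text output
    )
    if result.returncode != 0 or not result.stdout.strip():
        return None
    # One line per GPU; the driver version is the same for all of them.
    return result.stdout.splitlines()[0].strip()


if __name__ == "__main__":
    version = installed_driver_version()
    print("driver:", version or "nvidia-smi not found")
    if version and parse_driver_version(version) > parse_driver_version("432.00"):
        print("newer than 432.00, the version reported to work in this answer")
```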

Support seems to be much better on Linux. There is no fix for this at the moment.