
Python 3.x: After moving to TensorFlow 2.0, training stalls after the third step


Recently, I decided to upgrade from TensorFlow 1.14 (GPU variant) to the current version, 2.0.

My current setup is:

  • TensorFlow (GPU variant) 2.0
  • cuDNN 7.6.4
  • CUDA 10
  • Python 3.6
  • IDE: Visual Studio 2019
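To report a setup like this one precisely, a short script can print the versions that TensorFlow itself sees. The following is a minimal sketch that assumes TensorFlow 2.0 is installed. It uses `tf.config.experimental.list_physical_devices`, the TF 2.0 spelling; later releases also offer `tf.config.list_physical_devices`. The helper name `collect_environment` is hypothetical.

```python
import platform


def collect_environment():
    """Return a dict describing the Python/TensorFlow/GPU setup.

    Hypothetical helper: TensorFlow is imported lazily, so the function
    still reports the Python version when TensorFlow is not installed.
    """
    info = {"python": platform.python_version(), "tensorflow": None,
            "built_with_cuda": None, "gpus": []}
    try:
        import tensorflow as tf
    except ImportError:
        return info
    info["tensorflow"] = tf.__version__
    info["built_with_cuda"] = tf.test.is_built_with_cuda()
    # TF 2.0 exposes device listing under tf.config.experimental.
    gpus = tf.config.experimental.list_physical_devices("GPU")
    info["gpus"] = [gpu.name for gpu in gpus]
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print("{}: {}".format(key, value))
```

An empty `gpus` list on a GPU build would already point to a driver or library problem rather than to the model.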
I expected some pain, but this caught me off guard.

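When I tried to run one of my (now adapted) 1.14 projects, the model was built without issue and training started smoothly, only to stop completely after the third step. The same project runs fine on the CPU variant of TensorFlow 2.0, but training all the models there takes orders of magnitude longer.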
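You can also compare CPU and GPU runs without switching packages. Hide the GPU from the GPU build before TensorFlow is imported. This is a minimal sketch that relies on the standard CUDA environment variable `CUDA_VISIBLE_DEVICES`: setting it to `-1` leaves no CUDA device visible. The helper name `hide_gpus` is hypothetical.

```python
import os
import sys
import warnings


def hide_gpus():
    """Make CUDA devices invisible to TensorFlow (hypothetical helper).

    CUDA reads CUDA_VISIBLE_DEVICES when it initializes, so the safe
    point to call this is before TensorFlow is imported.
    """
    if "tensorflow" in sys.modules:
        warnings.warn("TensorFlow is already imported; the GPU may stay visible.")
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"


if __name__ == "__main__":
    hide_gpus()
    # import tensorflow as tf  # then build and fit the same model on the CPU
```

Suppose the same script trains to completion with `hide_gpus()` but hangs without it. Then the model and data are probably fine, and the problem lies in the GPU path (driver, CUDA, or cuDNN).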

Here is what I have tried so far:

  • Changing the hyperparameters
  • Reinstalling CUDA
  • Reinstalling TensorFlow
  • Reinstalling cuDNN
  • Disabling validation
  • Checking the PATH variables
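The "checking the PATH variables" step can be made concrete. Search `PATH` for the CUDA DLLs that the session log in this question shows TensorFlow 2.0 opening: `cudart64_100.dll`, `cublas64_100.dll`, `cudnn64_7.dll`, and `cupti64_100.dll`. This is a minimal sketch with a hypothetical helper name, `find_on_path`. Finding a DLL on `PATH` only shows that it can be located, not that its version matches.

```python
import os

# DLL names taken from the TensorFlow 2.0 session log in this question.
CUDA_DLLS = ["cudart64_100.dll", "cublas64_100.dll", "cudnn64_7.dll", "cupti64_100.dll"]


def find_on_path(names, path=None):
    """Map each file name to the first PATH directory containing it, or None.

    Hypothetical helper; `path` defaults to the PATH environment variable.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    dirs = [d for d in path.split(os.pathsep) if d]
    found = {}
    for name in names:
        # Mimic PATH lookup order: the first matching directory wins.
        found[name] = next(
            (d for d in dirs if os.path.isfile(os.path.join(d, name))), None
        )
    return found


if __name__ == "__main__":
    for name, where in find_on_path(CUDA_DLLS).items():
        print("{}: {}".format(name, where or "NOT FOUND"))
```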
None of these helped. My only clue is this warning message:

 Invoking ptxas not supported on Windows
Relying on driver to perform ptx compilation. This message will be only logged once.
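I never got this message with TF 1.14, so I am a bit confused. I know CUDA itself works, because I compiled and ran several of NVIDIA's samples. So the only real option left seems to be something related to TensorFlow or to how it handles the GPU.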

But I don't know how to proceed.

Here is the session log:

019-11-27 01:03:57.910895: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\pandas\core\frame.py:4117: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,
2019-11-27 01:04:02.247959: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2019-11-27 01:04:02.277414: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.835
pciBusID: 0000:0a:00.0
2019-11-27 01:04:02.282378: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-11-27 01:04:02.286653: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-11-27 01:04:02.289629: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-11-27 01:04:02.295084: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.835
pciBusID: 0000:0a:00.0
2019-11-27 01:04:02.299843: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-11-27 01:04:02.303965: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-11-27 01:04:03.043700: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-11-27 01:04:03.047132: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0
2019-11-27 01:04:03.049453: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N
2019-11-27 01:04:03.052642: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6382 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:0a:00.0, compute capability: 6.1)
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding (Embedding)        (None, 154, 64)           896000
_________________________________________________________________
conv1d (Conv1D)              (None, 150, 64)           20544
_________________________________________________________________
flatten (Flatten)            (None, 9600)              0
_________________________________________________________________
dense (Dense)                (None, 300)               2880300
_________________________________________________________________
dense_1 (Dense)              (None, 150)               45150
_________________________________________________________________
dense_2 (Dense)              (None, 70)                10570
_________________________________________________________________
dense_3 (Dense)              (None, 10)                710
_________________________________________________________________
dense_4 (Dense)              (None, 2)                 22
=================================================================
Total params: 3,853,296
Trainable params: 3,853,296
Non-trainable params: 0
_________________________________________________________________
Train for 10 steps, validate for 50 steps
Epoch 1/40
2019-11-27 01:04:06.199581: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2019-11-27 01:04:06.430358: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2019-11-27 01:04:07.180709: W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: Invoking ptxas not supported on Windows
Relying on driver to perform ptx compilation. This message will be only logged once.
2019-11-27 01:04:07.425377: I tensorflow/core/profiler/lib/profiler_session.cc:184] Profiler session started.
2019-11-27 01:04:07.431736: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cupti64_100.dll
 1/10 [==>...........................] - ETA: 32s - loss: 0.6933 - accuracy: 0.4375 - categorical_accuracy: 0.4375 - precision: 0.4375 - recall: 0.4375
2019-11-27 01:04:07.655586: I tensorflow/core/platform/default/device_tracer.cc:588] Collecting 148 kernel records, 21 memcpy records.
WARNING: Logging before flag parsing goes to stderr.
W1127 01:04:07.730274  5696 callbacks.py:244] Method (on_train_batch_end) is slow compared to the batch update (0.138531). Check your callbacks.
 3/10 [========>.....................] - ETA: 9s - loss: 0.6167 - accuracy: 0.7000 - categorical_accuracy: 0.7000 - precision: 0.7000 - recall: 0.7000
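The last progress line in the log (`3/10`) marks where training stalls. When you compare runs, for example GPU against CPU or one driver against another, it can help to extract that step automatically. This is a minimal sketch that parses Keras progress-bar lines in the format shown above. The regular expressions are assumptions based only on this log, and `last_progress` is a hypothetical helper.

```python
import re

# Matches progress lines such as " 3/10 [====>....] - ETA: 9s - loss: 0.6167 - ..."
PROGRESS_RE = re.compile(r"^\s*(\d+)/(\d+) \[")
# "name: number" pairs; the lookahead skips values with units, such as "ETA: 9s".
METRIC_RE = re.compile(r"(\w+): (\d+(?:\.\d+)?)(?=\s|$)")


def last_progress(log_text):
    """Return (step, total, metrics) for the last progress line, or None."""
    result = None
    for line in log_text.splitlines():
        match = PROGRESS_RE.match(line)
        if not match:
            continue
        metrics = {name: float(value) for name, value in METRIC_RE.findall(line)}
        result = (int(match.group(1)), int(match.group(2)), metrics)
    return result
```

For the log above, this would report step 3 of 10 as the last step reached before the hang.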

I was affected by the same problem. In my case, the culprit was the driver.

I first tried tensorflow-gpu with CUDA 10 and the latest NVIDIA driver. Training got stuck at random points, and I saw the same ptxas warning you are showing.

Next, I switched TensorFlow from 2.0 to 1.15 or 1.14, adjusting the Python version accordingly, but that did not help.


After uninstalling the driver and installing an older one (432.00), the problem went away, although I still see the ptxas warning.
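Downgrading the NVIDIA driver to 432.00 is what helped here, so it is worth checking which driver is installed. This is a minimal sketch with three assumptions:

- `nvidia-smi`, which ships with the NVIDIA driver, is on `PATH`.
- The driver version is read with its `--query-gpu=driver_version --format=csv,noheader` query.
- 432.00 is simply the version this answer reported as working, not a known cutoff.

`parse_version` and `installed_driver_version` are hypothetical helpers. The code avoids `capture_output` so it also runs on Python 3.6.

```python
import shutil
import subprocess


def parse_version(text):
    """Turn a driver version string like '432.00' into a comparable tuple."""
    return tuple(int(part) for part in text.strip().split("."))


def installed_driver_version():
    """Return the driver version reported by nvidia-smi, or None if unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None
    try:
        out = subprocess.run(
            [exe, "--query-gpu=driver_version", "--format=csv,noheader"],
            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
            universal_newlines=True,
        )
    except OSError:
        return None
    if out.returncode != 0 or not out.stdout.strip():
        return None
    # One line per GPU; all GPUs share the same driver, so take the first.
    return out.stdout.strip().splitlines()[0]


if __name__ == "__main__":
    version = installed_driver_version()
    if version is None:
        print("nvidia-smi not found or failed")
    elif parse_version(version) > parse_version("432.00"):
        print("Driver {} is newer than 432.00, the version reported to work here".format(version))
    else:
        print("Driver {}".format(version))
```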

Linux seems to be much better supported. So far there is no real fix.