Python: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only. Relying on driver to perform ptx compilation

python tensorflow

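I am trying to use the GPU with TensorFlow 2.3.0 on a simple MNIST model. I have CUDA 10.1 and cuDNN 7.6.5 installed. It appears to work (the model trains faster than before, at 2 seconds per epoch), although Task Manager gives the impression that the GPU is not being used at all, which suggests it could be much faster on the GPU. I have seen other questions about this warning, but the answers all point to the use of data generators, which I am not using. I tried the solution mentioned in the comments here, but it did not help. My Jupyter notebook output is as follows: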

[I 17:06:09.421 NotebookApp] JupyterLab extension loaded from C:\Users\jsmith\Anaconda3\lib\site-packages\jupyterlab
[I 17:06:09.421 NotebookApp] JupyterLab application directory is C:\Users\jsmith\Anaconda3\share\jupyter\lab
[I 17:06:09.423 NotebookApp] Serving notebooks from local directory: C:\Users\jsmith
[I 17:06:09.424 NotebookApp] The Jupyter Notebook is running at:
[I 17:06:09.424 NotebookApp] http://localhost:8888/?token=4da96629c2f5d3e118e50a083d16b21990572a21f3bf04ad
[I 17:06:09.424 NotebookApp]  or http://127.0.0.1:8888/?token=4da96629c2f5d3e118e50a083d16b21990572a21f3bf04ad
[I 17:06:09.424 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 17:06:09.460 NotebookApp]

    To access the notebook, open this file in a browser:
        file:///C:/Users/jsmith/AppData/Roaming/jupyter/runtime/nbserver-13024-open.html
    Or copy and paste one of these URLs:
        http://localhost:8888/?token=4da96629c2f5d3e118e50a083d16b21990572a21f3bf04ad
     or http://127.0.0.1:8888/?token=4da96629c2f5d3e118e50a083d16b21990572a21f3bf04ad
[I 17:06:17.565 NotebookApp] Kernel started: 02385ecf-f682-496e-a056-9442356a7642
[I 17:06:30.004 NotebookApp] Starting buffering for 02385ecf-f682-496e-a056-9442356a7642:10507d95e443431392e5aa3a711b2952
[I 17:06:30.237 NotebookApp] Kernel restarted: 02385ecf-f682-496e-a056-9442356a7642
[I 17:06:30.824 NotebookApp] Restoring connection for 02385ecf-f682-496e-a056-9442356a7642:10507d95e443431392e5aa3a711b2952
[I 17:06:30.824 NotebookApp] Replaying 3 buffered messages
2021-01-11 17:06:31.365107: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2021-01-11 17:06:33.450527: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library nvcuda.dll
2021-01-11 17:06:33.480765: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1650 Ti computeCapability: 7.5
coreClock: 1.485GHz coreCount: 16 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 178.84GiB/s
2021-01-11 17:06:33.480922: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2021-01-11 17:06:33.485959: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2021-01-11 17:06:33.489200: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2021-01-11 17:06:33.490830: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2021-01-11 17:06:33.494440: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2021-01-11 17:06:33.496906: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2021-01-11 17:06:33.510665: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2021-01-11 17:06:33.510964: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-01-11 17:06:33.511780: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-01-11 17:06:33.520884: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1ac3b7d9710 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-01-11 17:06:33.520974: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-01-11 17:06:33.521624: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1650 Ti computeCapability: 7.5
coreClock: 1.485GHz coreCount: 16 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 178.84GiB/s
2021-01-11 17:06:33.522001: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2021-01-11 17:06:33.522330: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2021-01-11 17:06:33.522841: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2021-01-11 17:06:33.523498: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2021-01-11 17:06:33.523780: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2021-01-11 17:06:33.525973: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2021-01-11 17:06:33.526177: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2021-01-11 17:06:33.527458: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-01-11 17:06:34.029958: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-01-11 17:06:34.030108: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0
2021-01-11 17:06:34.031482: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N
2021-01-11 17:06:34.032610: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2905 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1650 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2021-01-11 17:06:34.037750: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1ac659c5460 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-01-11 17:06:34.037828: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 1650 Ti, Compute Capability 7.5
2021-01-11 17:06:34.213785: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1650 Ti computeCapability: 7.5
coreClock: 1.485GHz coreCount: 16 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 178.84GiB/s
2021-01-11 17:06:34.214061: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2021-01-11 17:06:34.215834: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2021-01-11 17:06:34.219921: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2021-01-11 17:06:34.220345: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2021-01-11 17:06:34.220737: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2021-01-11 17:06:34.221081: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2021-01-11 17:06:34.221561: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2021-01-11 17:06:34.221816: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-01-11 17:06:34.222153: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-01-11 17:06:34.222309: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0
2021-01-11 17:06:34.222660: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N
2021-01-11 17:06:34.222923: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2905 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1650 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2021-01-11 17:06:35.001255: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2021-01-11 17:06:35.220686: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2021-01-11 17:06:36.204198: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only
Relying on driver to perform ptx compilation.
Modify $PATH to customize ptxas location.
This message will be only logged once.
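
The log above already shows whether TensorFlow registered the GPU and which warnings it raised. As a rough sketch only (the helper name `summarize_tf_log` and the two regular expressions are illustrative and not part of the original post), the relevant lines can be picked out of such a dump like this:

```python
import re

# Matches lines such as:
#   ... Created TensorFlow device (/job:.../device:GPU:0 with 2905 MB memory)
#       -> physical GPU (device: 0, name: GeForce GTX 1650 Ti, ...)
DEVICE_RE = re.compile(
    r"Created TensorFlow device \((?P<device>\S+) with (?P<mem_mb>\d+) MB memory\)"
    r".*?name: (?P<name>[^,]+),"
)
# TensorFlow log lines carry a one-letter severity after the timestamp (I, W, E, F).
WARNING_RE = re.compile(r"^\S+ \S+ W (?P<source>\S+)\] (?P<message>.*)$")


def summarize_tf_log(log_text):
    """Return (devices, warnings) found in a TensorFlow log dump."""
    devices, warnings = [], []
    for line in log_text.splitlines():
        match = DEVICE_RE.search(line)
        if match:
            devices.append(
                (match["device"], int(match["mem_mb"]), match["name"].strip())
            )
            continue
        match = WARNING_RE.match(line)
        if match:
            warnings.append(match["message"])
    return devices, warnings
```

A non-empty `devices` list means TensorFlow created a GPU device; `warnings` collects only the first line of each W-level message, so continuation lines such as "Relying on driver to perform ptx compilation." are not captured.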

Here is my code for loading the data:

import numpy as np
from tensorflow import keras

num_classes = 10
input_shape = (28, 28, 1)

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Scale images to the [0, 1] range
x_train = x_train.astype("float32") / 255
x_test = x_test.astype("float32") / 255
# Make sure images have shape (28, 28, 1)
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)
print("x_train shape:", x_train.shape)
print(x_train.shape[0], "train samples")
print(x_test.shape[0], "test samples")


# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

Here is my model:

import tensorflow as tf
from tensorflow.keras import layers

model = keras.Sequential(
    [
        keras.Input(shape=input_shape),
        layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ]
)

batch_size = 128
epochs = 60

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
# Note: the log above reports a single GPU, registered as device 0.
with tf.device('/GPU:1'):
    model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.1)

This is Task Manager. You can see that almost all of the GPU memory is in use, yet GPU utilization shows only 4%, while the CPU is at 45%.

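On that page, select the drop-down next to Video Encode and change it to CUDA. You will then see TensorFlow's GPU activity. This was not obvious to me either, but basically you are just looking at the wrong part of the GPU activity.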
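
GPU load can also be read outside Task Manager. The sketch below is an alternative that the answer itself does not mention: it assumes the NVIDIA driver's `nvidia-smi` tool is on PATH, and the helper names `parse_gpu_csv` and `read_gpu_usage` are made up for illustration.

```python
import shutil
import subprocess

# Fields requested from nvidia-smi: GPU utilization (%), memory used/total (MiB).
QUERY = "utilization.gpu,memory.used,memory.total"


def parse_gpu_csv(text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output into dicts.

    Assumes every field is numeric; some devices may report "[N/A]" instead,
    in which case int() raises ValueError.
    """
    rows = []
    for line in text.strip().splitlines():
        util, used, total = (field.strip() for field in line.split(","))
        rows.append(
            {
                "util_pct": int(util),
                "mem_used_mib": int(used),
                "mem_total_mib": int(total),
            }
        )
    return rows


def read_gpu_usage():
    """Return per-GPU usage, or None if nvidia-smi is not available."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_csv(result.stdout)
```

Calling `read_gpu_usage()` in a second terminal while `model.fit` is running gives a reading that does not depend on which Task Manager graph is selected.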

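We should not use Task Manager to check whether the GPU is being used by TensorFlow. For more details, see (link not preserved). Thanks.

@TFer2 Thanks, it turns out that tf.test.is_built_with_cuda() returns True, which means it is running on the GPU. Do you know, though, what might be causing this warning (Invoking GPU asm compilation is supported on Cuda non-Windows platforms only. Relying on driver to perform ptx compilation.)? Thanks for the information.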
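
Note that `tf.test.is_built_with_cuda()` only reports whether the installed TensorFlow binary was compiled with CUDA support. A somewhat more direct check, sketched here under the assumption of TensorFlow 2.1 or later and with an illustrative helper name (`gpu_report`), is to ask which GPUs TensorFlow can see and where a small operation was actually placed:

```python
def gpu_report():
    """Describe TensorFlow's view of the GPU.

    Returns None if TensorFlow cannot be imported in the current environment.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None

    gpus = tf.config.list_physical_devices("GPU")
    # Run a tiny matmul eagerly and record the device it was placed on.
    a = tf.random.uniform((64, 64))
    product = tf.matmul(a, a)
    return {
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "visible_gpus": [gpu.name for gpu in gpus],
        "matmul_device": product.device,
    }
```

If `visible_gpus` is non-empty and `matmul_device` ends in a GPU device name, operations are being placed on the GPU. Calling `tf.debugging.set_log_device_placement(True)` before building the model prints the placement of every op, which is more verbose but covers the real training graph.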