TensorFlow: How to fix the "Non-OK-status: GpuLaunchKernel status: Internal: invalid configuration argument" error?

I first tried TensorFlow 2.3, but it did not work, so I downgraded to TensorFlow 2.2, and it still shows the same error. What am I doing wrong? Below are my TensorFlow, CUDA, and cuDNN versions.

bash-4.2$ pip3 list

... 

tensorboard            2.2.2
tensorboard-plugin-wit 1.7.0
tensorflow             2.2.0
tensorflow-estimator   2.2.0

...
bash-4.2$ cd usr/local

bash-4.2$ ls

bin    build  cuda-10.0  cuda-10.2  cuda-7.0  cuda-8.0  cuda-9.1  etc    include  lib64    LICENSE  python3    sbin   src
boost  cuda   cuda-10.1  cuda-11.0  cuda-7.5  cuda-9.0  cuda-9.2  games  lib      libexec  NOTICE   Readme.md  share
bash-4.2$ cd cuda-10.1/lib64

bash-4.2$ ls

...
libcudnn.so    libcudnn.so.7    libcudnn.so.7.5.0    libcudnn.so.7.6.4    libcudnn.so.7.6.5
...
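
One way to double-check a listing like this is to pull the version numbers out of the directory and library names programmatically. The sketch below is only an illustration: the helper names `cuda_toolkit_versions` and `cudnn_versions` are hypothetical, and the input strings are copied from the `ls` output in the question. The prebuilt TensorFlow 2.2 and 2.3 wheels target CUDA 10.1 and cuDNN 7.6, so those are the values it looks for.

```python
import re


def cuda_toolkit_versions(listing):
    """Return sorted (major, minor) tuples for every cuda-X.Y entry in the listing."""
    found = re.findall(r"\bcuda-(\d+)\.(\d+)\b", listing)
    return sorted({(int(major), int(minor)) for major, minor in found})


def cudnn_versions(listing):
    """Return sorted (major, minor, patch) tuples for fully versioned libcudnn files."""
    found = re.findall(r"\blibcudnn\.so\.(\d+)\.(\d+)\.(\d+)\b", listing)
    return sorted({(int(a), int(b), int(c)) for a, b, c in found})


# Entries copied from the question's directory listings (non-CUDA entries omitted).
usr_local = (
    "cuda-10.0 cuda-10.2 cuda-7.0 cuda-8.0 cuda-9.1 "
    "cuda cuda-10.1 cuda-11.0 cuda-7.5 cuda-9.0 cuda-9.2"
)
lib64 = "libcudnn.so libcudnn.so.7 libcudnn.so.7.5.0 libcudnn.so.7.6.4 libcudnn.so.7.6.5"

toolkits = cuda_toolkit_versions(usr_local)
cudnn = cudnn_versions(lib64)

# Assumption: the installed TensorFlow wheel expects CUDA 10.1 and cuDNN 7.6.
has_cuda_10_1 = (10, 1) in toolkits
has_cudnn_7_6 = any(version[:2] == (7, 6) for version in cudnn)

print("CUDA toolkits:", toolkits)
print("cuDNN builds:", cudnn)
print("CUDA 10.1 present:", has_cuda_10_1, "| cuDNN 7.6 present:", has_cudnn_7_6)
```

A check like this only confirms that matching files exist on disk. It says nothing about which copies the dynamic loader actually picks up at run time.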
I need to run the Python file from_keras.py.

The file from_keras.py:

import tvm
from tvm import te
import tvm.relay as relay
from tvm.contrib.download import download_testdata
import keras
import numpy as np

######################################################################
# Load pretrained keras model
# ----------------------------
# We load a pretrained resnet-50 classification model provided by keras.
weights_url = "".join(
    [
        "https://github.com/fchollet/deep-learning-models/releases/",
        "download/v0.2/resnet50_weights_tf_dim_ordering_tf_kernels.h5",
    ]
)
weights_file = "resnet50_weights.h5"
weights_path = download_testdata(weights_url, weights_file, module="keras")
keras_resnet50 = keras.applications.resnet50.ResNet50(
    include_top=True, weights=None, input_shape=(224, 224, 3), classes=1000
)
keras_resnet50.load_weights(weights_path)

######################################################################
# Load a test image
# ------------------
# A single cat dominates the examples!
from PIL import Image
from matplotlib import pyplot as plt
from keras.applications.resnet50 import preprocess_input

img_url = "https://github.com/dmlc/mxnet.js/blob/main/data/cat.png?raw=true"
img_path = download_testdata(img_url, "cat.png", module="data")
img = Image.open(img_path).resize((224, 224))
plt.imshow(img)
plt.show()
# input preprocess
data = np.array(img)[np.newaxis, :].astype("float32")
data = preprocess_input(data).transpose([0, 3, 1, 2])
print("input_1", data.shape)

######################################################################
# Compile the model with Relay
# ----------------------------
# convert the keras model(NHWC layout) to Relay format(NCHW layout).
shape_dict = {"input_1": data.shape}
mod, params = relay.frontend.from_keras(keras_resnet50, shape_dict)
# compile the model
target = "cuda"
ctx = tvm.gpu(0)
with tvm.transform.PassContext(opt_level=3):
    executor = relay.build_module.create_executor("graph", mod, ctx, target)

######################################################################
# Execute on TVM
# ---------------
dtype = "float32"
tvm_out = executor.evaluate()(tvm.nd.array(data.astype(dtype)), **params)
top1_tvm = np.argmax(tvm_out.asnumpy()[0])

#####################################################################
# Look up synset name
# -------------------
# Look up prediction top 1 index in 1000 class synset.
synset_url = "".join(
    [
        "https://gist.githubusercontent.com/zhreshold/",
        "4d0b62f3d01426887599d4f7ede23ee5/raw/",
        "596b27d23537e5a1b5751d2b0481ef172f58b539/",
        "imagenet1000_clsid_to_human.txt",
    ]
)
synset_name = "imagenet1000_clsid_to_human.txt"
synset_path = download_testdata(synset_url, synset_name, module="data")
with open(synset_path) as f:
    synset = eval(f.read())
print("Relay top-1 id: {}, class name: {}".format(top1_tvm, synset[top1_tvm]))
# confirm correctness with keras output
keras_out = keras_resnet50.predict(data.transpose([0, 2, 3, 1]))
top1_keras = np.argmax(keras_out)
print("Keras top-1 id: {}, class name: {}".format(top1_keras, synset[top1_keras]))
bash-4.2$ python3 from_keras.py

File /uac/y16/jhchoi6/.tvm_test_data/keras/resnet50_weights.h5 exists, skip.
2020-10-16 13:37:06.154190: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-10-16 13:37:06.263238: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:18:00.0 name: TITAN Xp computeCapability: 6.1
coreClock: 1.582GHz coreCount: 30 deviceMemorySize: 11.91GiB deviceMemoryBandwidth: 510.07GiB/s
2020-10-16 13:37:06.264335: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-10-16 13:37:06.270718: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-10-16 13:37:06.276767: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-10-16 13:37:06.277829: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-10-16 13:37:06.284514: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-10-16 13:37:06.287641: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-10-16 13:37:06.298349: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-10-16 13:37:06.301762: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-10-16 13:37:06.302407: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-10-16 13:37:06.325261: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 2200000000 Hz
2020-10-16 13:37:06.325619: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5d73580 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-10-16 13:37:06.325678: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-10-16 13:37:06.498867: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5de6040 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-10-16 13:37:06.498934: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): TITAN Xp, Compute Capability 6.1
2020-10-16 13:37:06.500142: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:18:00.0 name: TITAN Xp computeCapability: 6.1
coreClock: 1.582GHz coreCount: 30 deviceMemorySize: 11.91GiB deviceMemoryBandwidth: 510.07GiB/s
2020-10-16 13:37:06.500270: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-10-16 13:37:06.500314: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-10-16 13:37:06.500356: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-10-16 13:37:06.500397: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-10-16 13:37:06.500437: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-10-16 13:37:06.500478: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-10-16 13:37:06.500519: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-10-16 13:37:06.502436: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-10-16 13:37:06.503788: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-10-16 13:37:06.503830: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]      0
2020-10-16 13:37:06.503869: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0:   N
2020-10-16 13:37:06.505899: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11324 MB memory) -> physical GPU (device: 0, name: TITAN Xp, pci bus id: 0000:18:00.0, compute capability: 6.1)
2020-10-16 13:37:07.195397: F ./tensorflow/core/kernels/random_op_gpu.h:232] Non-OK-status: GpuLaunchKernel(FillPhiloxRandomKernelLaunch<Distribution>, num_blocks, block_size, 0, d.stream(), gen, data, size, dist) status: Internal: invalid configuration argument
Aborted (core dumped)
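
When reading a log like this, it can help to separate what loaded successfully from the line that actually aborted the process. The following sketch is an illustration under stated assumptions, not part of TensorFlow: `summarize_tf_log` is a hypothetical helper, and the sample lines are a subset of the output in the question. It relies only on the log format visible here, where each entry carries a severity letter (`I`, `W`, `E`, or `F`) after the timestamp.

```python
import re

# One log entry: "<date> <time>: <severity> <source file>:<line>] <message>"
ENTRY = re.compile(r"^\S+ \S+: ([IWEF]) (\S+?):(\d+)\] (.*)$")
LOADED = re.compile(r"Successfully opened dynamic library (\S+)")


def summarize_tf_log(text):
    """Return (loaded_libraries, fatal_entries) found in a TensorFlow log dump."""
    libraries, fatal = [], []
    for line in text.splitlines():
        match = ENTRY.match(line)
        if match is None:
            continue  # not a log entry, e.g. "Aborted (core dumped)"
        severity, source, lineno, message = match.groups()
        loaded = LOADED.search(message)
        if loaded and loaded.group(1) not in libraries:
            libraries.append(loaded.group(1))
        if severity == "F":
            fatal.append((source, int(lineno), message))
    return libraries, fatal


# A subset of the lines from the question's log.
sample = """\
2020-10-16 13:37:06.264335: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-10-16 13:37:06.298349: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-10-16 13:37:07.195397: F ./tensorflow/core/kernels/random_op_gpu.h:232] Non-OK-status: GpuLaunchKernel(FillPhiloxRandomKernelLaunch<Distribution>, num_blocks, block_size, 0, d.stream(), gen, data, size, dist) status: Internal: invalid configuration argument
Aborted (core dumped)
"""

libraries, fatal = summarize_tf_log(sample)
print("Loaded:", libraries)
for source, lineno, message in fatal:
    print("Fatal at {}:{}".format(source, lineno))
```

In this log every CUDA library loads, and the only fatal entry is the kernel launch in `random_op_gpu.h`. The failure therefore occurs when TensorFlow first launches the GPU random-number fill kernel, not during library discovery.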