Python: Error polling for event status: failed to query event: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure


I have been struggling with this problem for five days and have read through the related StackOverflow posts, but I still cannot find a clear lead toward a solution. The only advice seems to be to keep trying different NVIDIA driver versions until you find one that matches your CUDA version (mostly 10.1) and your GPU card.

I have an NVIDIA GeForce GTX 1050 Ti on one desktop (Windows 10, 64-bit) and an NVIDIA GeForce RTX 2080 Ti on another (Windows 10, 64-bit). Following the listed hardware requirements, I installed the GPU drivers (tried versions 418.81 and 457.09 for the 1050 Ti, and 432.00 and 457.30 for the 2080 Ti), the CUDA Toolkit (10.1 on both desktops) and cuDNN (7.6.0 on both desktops), and finally updated the PATH environment variable. The TensorFlow version is 2.3.0 and the Python version is 3.7.9.
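To double-check the installed combination on each machine, a small snippet like the one below can print what the installed TensorFlow wheel was actually built against (this assumes TF 2.3+, where tf.sysconfig.get_build_info() is available):

import tensorflow as tf

print('TF version :', tf.__version__)                        # expect 2.3.0
print('GPUs found :', tf.config.list_physical_devices('GPU'))

# build info reports the CUDA/cuDNN versions this wheel was compiled for
build_info = tf.sysconfig.get_build_info()
print('built for CUDA :', build_info.get('cuda_version'))
print('built for cuDNN:', build_info.get('cudnn_version'))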

Training on the MNIST dataset provided on the TensorFlow website works without any problem. But when I run some custom code, both desktops crash with the error quoted at the end of this post (I have a custom model that inherits from keras.Model).
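For reference, the kind of Keras-API MNIST training that runs without problems on both machines is roughly the standard beginner example (a sketch; my actual script may differ slightly):

import tensorflow as tf

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10)
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=2)   # trains on the GPU without any CUDA error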

I am not using TensorFlow for conventional neural-network training; I am using its automatic-differentiation machinery in an optimization problem.

I don't think there is anything wrong with my custom code, because it runs fine on Google Colab. The same code also runs fine on my friend's Linux system.

The code that reproduces the error (it runs without problems on Google Colab) is posted below.

Can anyone tell me how to solve this problem? Is blindly trying different driver versions really the only way forward?


What is strange is that if I run neural-network training with the Keras API on these PCs, no such error appears. And if I write some very simple code with GradientTape to compute gradients, there is no error either... so the driver installation seems to be correct, which is very confusing.
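By "very simple code with GradientTape" I mean something on the order of the sketch below (not the exact snippet I tested), and it runs without any CUDA error:

import tensorflow as tf

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x ** 2 + 2.0 * x
print(tape.gradient(y, x).numpy())   # expect 2*x + 2 = 8.0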

Comments:
- Check which tensorflow version you are running and which drivers it needs; you have to find a compatible combination of the tensorflow version and your hardware. (SajanGohil)
- @SajanGohil I followed the rule that CUDA 10.1 needs driver 418.x or newer. But the problem is that there are so many driver versions, and according to other people's experience only specific versions are compatible with specific GPUs. I don't know which one to choose for my GPUs.
- Yes, a specific GPU will have some specific drivers that support it. You have to find the best match between tensorflow, the NVIDIA graphics driver, the CUDA version, the cuDNN version, and so on. Also, how much data are you using? (SajanGohil)
- @SajanGohil Hi, I have edited the post and included the complete code. Please run it on your local machine and see whether the same error appears. Thanks~
- Sorry, I can't run it locally (I don't have an NVIDIA GPU), but you are right that the code itself is most likely correct, since it runs elsewhere. The problem is probably the tf/driver installation on Windows. If you just want autodiff, I think JAX is another option if you can't resolve this (see the JAX sketch after the error log at the end of this post). (SajanGohil)
# -*- coding: utf-8 -*-
## This code runs well in the Google Colab GPU runtime
## Yuanhang Zhang & Zheyuan Zhu, 12/1/2020, CREOL, UCF, Copyright reserved
## please contact yuanhangzhang@knights.ucf.edu if you want to use the code for research or publications
## all length units are in mm

import tensorflow as tf
import numpy as np
print('tensorflow version:',tf.__version__)

#%% ASM method
dx=np.float32(5e-3) # pixel size
N_obj= 64 # 512 

def tf_fft2d(x):
    with tf.name_scope('tf_fft2d'): # add name_scope, check in tensorboard
      x_shift = tf.signal.ifftshift(x)
      x_fft=tf.signal.fft2d(x_shift)
      y = tf.signal.fftshift(x_fft)
      return y

def tf_ifft2d(x):
    with tf.name_scope('tf_ifft2d'):
      x_shift = tf.signal.ifftshift(x)
      x_ifft=tf.signal.ifft2d(x_shift)
      y = tf.signal.fftshift(x_ifft)
      return y

# angular spectrum method (ASM), not band-limited
# @tf.function
def prop_ASM(Ein,z,wavelength,N_obj,dx):
    freq_obj = np.arange(-N_obj//2,N_obj//2,1)*(1/(dx*N_obj))
    kx = 2*np.pi*freq_obj
    ky = kx.copy()
    KX,KY = np.meshgrid(kx,ky)
    k0 = 2*np.pi/wavelength
    KZ_square = k0**2-KX**2-KY**2
    KZ_square[KZ_square<0] = 0
    Q = np.exp(-1j*z*np.sqrt(KZ_square)) # transfer function of freespace
    with tf.name_scope('prop_ASM'):
      FFT_obj = tf_fft2d(Ein)
      Q_tf = tf.constant(Q,dtype=tf.complex64)
      Eout = tf_ifft2d(FFT_obj*Q_tf)
      return Eout

print('N_obj:',N_obj)

import matplotlib.pyplot as plt
import shutil
shutil.rmtree('__pycache__',ignore_errors=True) # Delete an entire directory tree
import os
os.environ["CUDA_VISIBLE_DEVICES"]='0' 

save_model_path='./models' 
save_mat_folder='./results' 
log_path='./tensorboard_log' # path to log training process
load_model_path = save_model_path

#%% inputs/outputs for the optimization
x = (np.arange(N_obj,dtype = np.float32)-N_obj/2)*dx
y = (np.arange(N_obj,dtype = np.float32)-N_obj/2)*dx
x_c, y_c = np.meshgrid(x,y)

# input: Gaussian mode
e_in = np.zeros((N_obj, N_obj),dtype = np.float32)  # initialize input field
w_in = np.float32(5e-2)   # beam width

e = np.exp(-((x_c)**2+(y_c)**2)/w_in**2) # Gaussian beam spots array
I = np.sum(np.abs(e)**2)
e_in = e/np.sqrt(I) # normalize power

fig, ax = plt.subplots()
im=ax.imshow(e_in)
cbar=plt.colorbar(im)  
print('e_in shape:',e_in.shape)

# output: Hermite mode
e_out = np.zeros((N_obj, N_obj),dtype = np.float32)
w_out = np.float32(5e-2) # 30e-2
c = np.array([[0,0],[0,1]])
e = np.polynomial.hermite.hermgrid2d(np.sqrt(2)*x/w_out, np.sqrt(2)*y/w_out, c)*np.exp(-(x_c**2+y_c**2)/w_out**2)
e = np.float32(e)
I = np.sum(np.abs(e)**2)
e_out = e/np.sqrt(I) # power normalized

fig, ax = plt.subplots()
im=ax.imshow(e_out)
cbar=plt.colorbar(im)

print('e_out shape:',e_out.shape)

#%% optimization by GradientTape
z = 20 # propagating distance
lambda_design_list = np.array([1.550e-3],dtype = np.float32)

Ein = tf.constant(e_in, name = 'Ein', dtype = tf.complex64) # a 2D tensor
Eout = tf.constant(e_out, name = 'Eout', dtype = tf.complex64)

phi1 = tf.Variable(np.float32(np.ones((N_obj,N_obj))),name='phi1') # dtype: float32
phi2 = tf.Variable(np.float32(np.ones((N_obj,N_obj))),name='phi2')


def forward_propagate(Ein,z,lambda_design_list,N_obj,dx):
    E1_1 = prop_ASM(Ein,z,lambda_design_list[0],N_obj,dx) # used tf.signal.fft2d
    E1_mod_1 = E1_1*tf.exp(tf.complex(real=tf.zeros_like(phi1,dtype='float32'),imag=phi1))
    # E1_mod_1 = tf.math.multiply(E1_1,tf.exp(1j*phi1)) # element-wise multiply ?? not working !!
    E2_1 = prop_ASM(E1_mod_1,z,lambda_design_list[0],N_obj,dx)
    E2_mod_1 = E2_1*tf.exp(tf.complex(real=tf.zeros_like(phi2,dtype='float32'),imag=phi2)) 
    E_out = prop_ASM(E2_mod_1,z,lambda_design_list[0],N_obj,dx)
    # E_out = tf.math.multiply(E2_1,tf.exp(1j*phi2))
    return E_out

def loss_single(E_out, Eout): 
    coupling_eff = tf.sqrt(
        (tf.square(tf.reduce_sum(tf.math.real(E_out)*tf.math.real(Eout)+tf.math.imag(E_out)*tf.math.imag(Eout))) +
         tf.square(tf.reduce_sum(tf.math.imag(E_out)*tf.math.real(Eout)-tf.math.real(E_out)*tf.math.imag(Eout))) ))
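    # note: the expression above is just |sum(E_out * conj(Eout))|, i.e. it
    # equals tf.abs(tf.reduce_sum(E_out * tf.math.conj(Eout)))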
    # or something simpler:
    # coupling_eff = tf.abs(tf.reduce_sum((tf.math.multiply(E_out,Eout))))
    loss = - coupling_eff
    return loss

variables = [phi1, phi2] # write variables in a list to optimize

# define optimizer
optimizer =  tf.keras.optimizers.Adam(learning_rate= 1e-2)
epoch_num = 20

for ii in tf.range(epoch_num):
  with tf.GradientTape() as tape:
    # this forward_propagate() function must be in the tape context! otherwise grads is None !!
    # the tape needs to record the complete forward propagation
    E_out = forward_propagate(Ein,z,lambda_design_list,N_obj,dx) 
    loss = loss_single(E_out, Eout)  
    tf.print('ii =:',ii,'coupling_eff =:',-loss)
    # print('watched variables in tape:',[var.name for var in tape.watched_variables()])

  # print("\n ===== calculate gradients now ====ERROR in NEXT LINE!!======\n\n")
  grads = tape.gradient(loss, variables) ## auto-differentiation
  # print(grads)

  # TensorFlow will update parameters automatically
  optimizer.apply_gradients(grads_and_vars=zip(grads, variables))
The error output on both machines:

2020-11-29 20:41:57.457271: E tensorflow/stream_executor/cuda/cuda_event.cc:29] Error polling for event status: failed to query event: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2020-11-29 20:41:57.457480: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:220] Unexpected Event status: 1
[I 20:42:05.512 NotebookApp] KernelRestarter: restarting kernel (1/5), keep random ports
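Regarding the JAX suggestion in the comments: if only automatic differentiation is needed, a phase-optimization loop similar in spirit to the code above can be written with jax.grad. The sketch below is hypothetical (random placeholder fields instead of the Gaussian/Hermite modes, plain gradient descent instead of Adam) and only shows the autodiff mechanics:

import jax
import jax.numpy as jnp

def loss_fn(phi, e_in, e_out):
    # apply a phase mask to the input field and measure overlap with the target
    field = e_in * jnp.exp(1j * phi)
    coupling = jnp.abs(jnp.sum(field * jnp.conj(e_out)))
    return -coupling

N = 64
e_in = jax.random.normal(jax.random.PRNGKey(0), (N, N)) + 0j   # placeholder input field
e_out = jax.random.normal(jax.random.PRNGKey(1), (N, N)) + 0j  # placeholder target field
phi = jnp.ones((N, N))                                         # phase mask to optimize

grad_fn = jax.jit(jax.grad(loss_fn))   # reverse-mode autodiff, JIT-compiled
for i in range(20):
    phi = phi - 1e-2 * grad_fn(phi, e_in, e_out)   # plain gradient-descent update
    print('step', i, 'coupling:', -loss_fn(phi, e_in, e_out))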