Python: Error polling for event status: failed to query event: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure


I have been struggling with this problem for five days and have read through the related StackOverflow posts, but I still cannot find a clear lead toward a solution. The only advice seems to be to keep trying different NVIDIA driver versions until you find one that matches your CUDA version (mostly 10.1) and your GPU card.

I have an NVIDIA GeForce GTX 1050 Ti on one desktop (Windows 10, 64-bit) and an NVIDIA GeForce RTX 2080 Ti on another (Windows 10, 64-bit). Following the listed hardware requirements, I installed the GPU drivers (tried versions 418.81 and 457.09 for the 1050 Ti, and 432.00 and 457.30 for the 2080 Ti), the CUDA Toolkit (10.1 on both desktops) and cuDNN (7.6.0 on both desktops), and finally updated the PATH environment variable. The TensorFlow version is 2.3.0 and the Python version is 3.7.9.
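To double-check the installed combination on each machine, a small snippet like the one below can print what the installed TensorFlow wheel was actually built against (this assumes TF 2.3+, where tf.sysconfig.get_build_info() is available):

import tensorflow as tf

print('TF version :', tf.__version__)                        # expect 2.3.0
print('GPUs found :', tf.config.list_physical_devices('GPU'))

# build info reports the CUDA/cuDNN versions this wheel was compiled for
build_info = tf.sysconfig.get_build_info()
print('built for CUDA :', build_info.get('cuda_version'))
print('built for cuDNN:', build_info.get('cudnn_version'))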

Training on the MNIST dataset provided on the TensorFlow website works without any problem. But when I run some custom code, both desktops crash with the error quoted at the end of this post (I have a custom model that inherits from keras.Model).
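For reference, the kind of Keras-API MNIST training that runs without problems on both machines is roughly the standard beginner example (a sketch; my actual script may differ slightly):

import tensorflow as tf

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10)
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=2)   # trains on the GPU without any CUDA error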

I am not using TensorFlow for conventional neural-network training; I am using its automatic-differentiation machinery in an optimization problem.

I don't think there is anything wrong with my custom code, because it runs fine on Google Colab. The same code also runs fine on my friend's Linux system.

The code that reproduces the error (it runs without problems on Google Colab) is posted below.

Can anyone tell me how to solve this problem? Is blindly trying different driver versions really the only way forward?


What is strange is that if I run neural-network training with the Keras API on these PCs, no such error appears. And if I write some very simple code with GradientTape to compute gradients, there is no error either... so the driver installation seems to be correct, which is very confusing.
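By "very simple code with GradientTape" I mean something on the order of the sketch below (not the exact snippet I tested), and it runs without any CUDA error:

import tensorflow as tf

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x ** 2 + 2.0 * x
print(tape.gradient(y, x).numpy())   # expect 2*x + 2 = 8.0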

Comments:
- Check which tensorflow version you are running and which drivers it needs; you have to find a compatible combination of the tensorflow version and your hardware. (SajanGohil)
- @SajanGohil I followed the rule that CUDA 10.1 needs driver 418.x or newer. But the problem is that there are so many driver versions, and according to other people's experience only specific versions are compatible with specific GPUs. I don't know which one to choose for my GPUs.
- Yes, a specific GPU will have some specific drivers that support it. You have to find the best match between tensorflow, the NVIDIA graphics driver, the CUDA version, the cuDNN version, and so on. Also, how much data are you using? (SajanGohil)
- @SajanGohil Hi, I have edited the post and included the complete code. Please run it on your local machine and see whether the same error appears. Thanks~
- Sorry, I can't run it locally (I don't have an NVIDIA GPU), but you are right that the code itself is most likely correct, since it runs elsewhere. The problem is probably the tf/driver installation on Windows. If you just want autodiff, I think JAX is another option if you can't resolve this (see the JAX sketch after the error log at the end of this post). (SajanGohil)
# -*- coding: utf-8 -*-
## This code runs well in the Google Colab GPU runtime
## Yuanhang Zhang & Zheyuan Zhu, 12/1/2020, CREOL, UCF, Copyright reserved
## please contact yuanhangzhang@knights.ucf.edu if you want to use the code for research or publications
## all length units are in mm

import tensorflow as tf
import numpy as np
print('tensorflow version:',tf.__version__)

#%% ASM method
dx=np.float32(5e-3) # pixel size
N_obj= 64 # 512 

def tf_fft2d(x):
    with tf.name_scope('tf_fft2d'): # add name_scope, check in tensorboard
      x_shift = tf.signal.ifftshift(x)
      x_fft=tf.signal.fft2d(x_shift)
      y = tf.signal.fftshift(x_fft)
      return y

def tf_ifft2d(x):
    with tf.name_scope('tf_ifft2d'):
      x_shift = tf.signal.ifftshift(x)
      x_ifft=tf.signal.ifft2d(x_shift)
      y = tf.signal.fftshift(x_ifft)
      return y

# angular spectrum method (ASM), not band-limited
# @tf.function
def prop_ASM(Ein,z,wavelength,N_obj,dx):
    freq_obj = np.arange(-N_obj//2,N_obj//2,1)*(1/(dx*N_obj))
    kx = 2*np.pi*freq_obj
    ky = kx.copy()
    KX,KY = np.meshgrid(kx,ky)
    k0 = 2*np.pi/wavelength
    KZ_square = k0**2-KX**2-KY**2
    KZ_square[KZ_square<0] = 0
    Q = np.exp(-1j*z*np.sqrt(KZ_square)) # transfer function of freespace
    with tf.name_scope('prop_ASM'):
      FFT_obj = tf_fft2d(Ein)
      Q_tf = tf.constant(Q,dtype=tf.complex64)
      Eout = tf_ifft2d(FFT_obj*Q_tf)
      return Eout

print('N_obj:',N_obj)

import matplotlib.pyplot as plt
import shutil
shutil.rmtree('__pycache__',ignore_errors=True) # Delete an entire directory tree
import os
os.environ["CUDA_VISIBLE_DEVICES"]='0' 

save_model_path='./models' 
save_mat_folder='./results' 
log_path='./tensorboard_log' # path to log training process
load_model_path = save_model_path

#%% inputs/outputs for the optimization
x = (np.arange(N_obj,dtype = np.float32)-N_obj/2)*dx
y = (np.arange(N_obj,dtype = np.float32)-N_obj/2)*dx
x_c, y_c = np.meshgrid(x,y)

# input: Gaussian mode
e_in = np.zeros((N_obj, N_obj),dtype = np.float32)  # initialize input field
w_in = np.float32(5e-2)   # beam width

e = np.exp(-((x_c)**2+(y_c)**2)/w_in**2) # Gaussian beam spots array
I = np.sum(np.abs(e)**2)
e_in = e/np.sqrt(I) # normalize power

fig, ax = plt.subplots()
im=ax.imshow(e_in)
cbar=plt.colorbar(im)  
print('e_in shape:',e_in.shape)

# output: Hermite mode
e_out = np.zeros((N_obj, N_obj),dtype = np.float32)
w_out = np.float32(5e-2) # 30e-2
c = np.array([[0,0],[0,1]])
e = np.polynomial.hermite.hermgrid2d(np.sqrt(2)*x/w_out, np.sqrt(2)*y/w_out, c)*np.exp(-(x_c**2+y_c**2)/w_out**2)
e = np.float32(e)
I = np.sum(np.abs(e)**2)
e_out = e/np.sqrt(I) # power normalized

fig, ax = plt.subplots()
im=ax.imshow(e_out)
cbar=plt.colorbar(im)

print('e_out shape:',e_out.shape)

#%% optimization by GradientTape
z = 20 # propagating distance
lambda_design_list = np.array([1.550e-3],dtype = np.float32)

Ein = tf.constant(e_in, name = 'Ein', dtype = tf.complex64) # a 2D tensor
Eout = tf.constant(e_out, name = 'Eout', dtype = tf.complex64)

phi1 = tf.Variable(np.float32(np.ones((N_obj,N_obj))),name='phi1') # dtype: float32
phi2 = tf.Variable(np.float32(np.ones((N_obj,N_obj))),name='phi2')


def forward_propagate(Ein,z,lambda_design_list,N_obj,dx):
    E1_1 = prop_ASM(Ein,z,lambda_design_list[0],N_obj,dx) # used tf.signal.fft2d
    E1_mod_1 = E1_1*tf.exp(tf.complex(real=tf.zeros_like(phi1,dtype='float32'),imag=phi1))
    # E1_mod_1 = tf.math.multiply(E1_1,tf.exp(1j*phi1)) # element-wise multiply ?? not working !!
    E2_1 = prop_ASM(E1_mod_1,z,lambda_design_list[0],N_obj,dx)
    E2_mod_1 = E2_1*tf.exp(tf.complex(real=tf.zeros_like(phi2,dtype='float32'),imag=phi2)) 
    E_out = prop_ASM(E2_mod_1,z,lambda_design_list[0],N_obj,dx)
    # E_out = tf.math.multiply(E2_1,tf.exp(1j*phi2))
    return E_out

def loss_single(E_out, Eout): 
    coupling_eff = tf.sqrt(
        (tf.square(tf.reduce_sum(tf.math.real(E_out)*tf.math.real(Eout)+tf.math.imag(E_out)*tf.math.imag(Eout))) +
         tf.square(tf.reduce_sum(tf.math.imag(E_out)*tf.math.real(Eout)-tf.math.real(E_out)*tf.math.imag(Eout))) ))
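    # note: the expression above is just |sum(E_out * conj(Eout))|, i.e. it
    # equals tf.abs(tf.reduce_sum(E_out * tf.math.conj(Eout)))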
    # or something simpler:
    # coupling_eff = tf.abs(tf.reduce_sum((tf.math.multiply(E_out,Eout))))
    loss = - coupling_eff
    return loss

variables = [phi1, phi2] # write variables in a list to optimize

# define optimizer
optimizer =  tf.keras.optimizers.Adam(learning_rate= 1e-2)
epoch_num = 20

for ii in tf.range(epoch_num):
  with tf.GradientTape() as tape:
    # this forward_propagate() function must be in the tape context! otherwise grads is None !!
    # the tape needs to record the complete forward propagation
    E_out = forward_propagate(Ein,z,lambda_design_list,N_obj,dx) 
    loss = loss_single(E_out, Eout)  
    tf.print('ii =:',ii,'coupling_eff =:',-loss)
    # print('watched variables in tape:',[var.name for var in tape.watched_variables()])

  # print("\n ===== calculate gradients now ====ERROR in NEXT LINE!!======\n\n")
  grads = tape.gradient(loss, variables) ## auto-differentiation
  # print(grads)

  # TensorFlow will update parameters automatically
  optimizer.apply_gradients(grads_and_vars=zip(grads, variables))
The error output on both machines:

2020-11-29 20:41:57.457271: E tensorflow/stream_executor/cuda/cuda_event.cc:29] Error polling for event status: failed to query event: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2020-11-29 20:41:57.457480: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:220] Unexpected Event status: 1
[I 20:42:05.512 NotebookApp] KernelRestarter: restarting kernel (1/5), keep random ports
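Regarding the JAX suggestion in the comments: if only automatic differentiation is needed, a phase-optimization loop similar in spirit to the code above can be written with jax.grad. The sketch below is hypothetical (random placeholder fields instead of the Gaussian/Hermite modes, plain gradient descent instead of Adam) and only shows the autodiff mechanics:

import jax
import jax.numpy as jnp

def loss_fn(phi, e_in, e_out):
    # apply a phase mask to the input field and measure overlap with the target
    field = e_in * jnp.exp(1j * phi)
    coupling = jnp.abs(jnp.sum(field * jnp.conj(e_out)))
    return -coupling

N = 64
e_in = jax.random.normal(jax.random.PRNGKey(0), (N, N)) + 0j   # placeholder input field
e_out = jax.random.normal(jax.random.PRNGKey(1), (N, N)) + 0j  # placeholder target field
phi = jnp.ones((N, N))                                         # phase mask to optimize

grad_fn = jax.jit(jax.grad(loss_fn))   # reverse-mode autodiff, JIT-compiled
for i in range(20):
    phi = phi - 1e-2 * grad_fn(phi, e_in, e_out)   # plain gradient-descent update
    print('step', i, 'coupling:', -loss_fn(phi, e_in, e_out))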