Python CUDA GPU processing: TypeError: compile_kernel() got an unexpected keyword argument 'boundscheck'

Tags: python, cuda, gpu, numba

Today I started working with CUDA and GPU processing. I found this tutorial:

Unfortunately, my first attempt at running the gpu code failed:

from numba import jit, cuda 
import numpy as np 
# to measure exec time 
from timeit import default_timer as timer 

# normal function to run on cpu 
def func(a):                                 
    for i in range(10000000): 
        a[i]+= 1    

# function optimized to run on gpu 
@jit(target ="cuda")                         
def func2(a): 
    for i in range(10000000): 
        a[i]+= 1
if __name__=="__main__": 
    n = 10000000                            
    a = np.ones(n, dtype = np.float64) 
    b = np.ones(n, dtype = np.float32) 

    start = timer() 
    func(a) 
    print("without GPU:", timer()-start)     

    start = timer() 
    func2(a) 
    print("with GPU:", timer()-start) 
Output:

/home/amu/anaconda3/bin/python /home/amu/PycharmProjects/gpu_processing_base/gpu_base_1.py
without GPU: 4.89985659904778
Traceback (most recent call last):
  File "/home/amu/PycharmProjects/gpu_processing_base/gpu_base_1.py", line 30, in <module>
    func2(a)
  File "/home/amu/anaconda3/lib/python3.7/site-packages/numba/cuda/dispatcher.py", line 40, in __call__
    return self.compiled(*args, **kws)
  File "/home/amu/anaconda3/lib/python3.7/site-packages/numba/cuda/compiler.py", line 758, in __call__
    kernel = self.specialize(*args)
  File "/home/amu/anaconda3/lib/python3.7/site-packages/numba/cuda/compiler.py", line 769, in specialize
    kernel = self.compile(argtypes)
  File "/home/amu/anaconda3/lib/python3.7/site-packages/numba/cuda/compiler.py", line 785, in compile
    **self.targetoptions)
  File "/home/amu/anaconda3/lib/python3.7/site-packages/numba/core/compiler_lock.py", line 32, in _acquire_compile_lock
    return func(*args, **kwargs)
TypeError: compile_kernel() got an unexpected keyword argument 'boundscheck'

Process finished with exit code 1

I have installed numba and cudatoolkit, as mentioned in the tutorial, in an anaconda environment in pycharm.

Adding an answer to get this off the unanswered queue.

The code in that example is broken. There is nothing wrong with your numba or CUDA installation. The code in your question (or in the blog you copied it from) cannot possibly produce the results the blog post claims.

There are a number of ways this could be modified to make it work. One would be like this:

from numba import vectorize, jit, cuda 
import numpy as np 
# to measure exec time 
from timeit import default_timer as timer 

# normal function to run on cpu 
def func(a):                                 
    for i in range(10000000): 
        a[i]+= 1    

# function optimized to run on gpu 
@vectorize(['float64(float64)'], target ="cuda")                         
def func2(x): 
    return x+1

if __name__=="__main__": 
    n = 10000000                            
    a = np.ones(n, dtype = np.float64) 

    start = timer() 
    func(a) 
    print("without GPU:", timer()-start)     

    start = timer() 
    func2(a) 
    print("with GPU:", timer()-start) 
Here func2 becomes a ufunc which is compiled for the device. It will then be run over the whole input array on the GPU. Doing so yields this:

$ python bogoexample.py 
without GPU: 4.314514834433794
with GPU: 0.21419800259172916
So it is faster, but keep in mind that the GPU time includes the time taken to compile the GPU ufunc.
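As a small illustrative sketch (not part of the original answer), one way to separate that compilation cost from the steady-state time is to make a throwaway warm-up call on a tiny array before the timed call, assuming func2 is the @vectorize ufunc defined above:

import numpy as np 
from timeit import default_timer as timer 

# warm-up call: triggers the one-time CUDA JIT compilation of func2
func2(np.ones(16, dtype=np.float64))

a = np.ones(10000000, dtype=np.float64)
start = timer()
func2(a)                                    # compilation has already happened here
print("with GPU (excluding compile):", timer() - start)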

Another alternative would be to actually write a GPU kernel. Like this:

from numba import vectorize, jit, cuda 
import numpy as np 
# to measure exec time 
from timeit import default_timer as timer 

# normal function to run on cpu 
def func(a):                                 
    for i in range(10000000): 
        a[i]+= 1    

# function optimized to run on gpu 
@vectorize(['float64(float64)'], target ="cuda")                         
def func2(x): 
    return x+1

# kernel to run on gpu
@cuda.jit
def func3(a, N):
    tid = cuda.grid(1)
    if tid < N:
        a[tid] += 1


if __name__=="__main__": 
    n = 10000000                            
    a = np.ones(n, dtype = np.float64) 

    for i in range(0,5):
         start = timer() 
         func(a) 
         print(i, " without GPU:", timer()-start)     

    for i in range(0,5):
         start = timer() 
         func2(a) 
         print(i, " with GPU ufunc:", timer()-start) 

    threadsperblock = 1024
    blockspergrid = (a.size + (threadsperblock - 1)) // threadsperblock
    for i in range(0,5):
         start = timer() 
         func3[blockspergrid, threadsperblock](a, n) 
         print(i, " with GPU kernel:", timer()-start) 

$ python bogoexample.py 
0  without GPU: 4.885275377891958
1  without GPU: 4.748716968111694
2  without GPU: 4.902181145735085
3  without GPU: 4.889955999329686
4  without GPU: 4.881594380363822
0  with GPU ufunc: 0.16726416163146496
1  with GPU ufunc: 0.03758022002875805
2  with GPU ufunc: 0.03580896370112896
3  with GPU ufunc: 0.03530424740165472
4  with GPU ufunc: 0.03579768259078264
0  with GPU kernel: 0.1421878095716238
1  with GPU kernel: 0.04386183246970177
2  with GPU kernel: 0.029975440353155136
3  with GPU kernel: 0.029602501541376114
4  with GPU kernel: 0.029780613258481026

Here you can see that the kernel runs slightly faster than the ufunc, and that caching (this is caching of the JIT-compiled functions, not memoization of the calls) significantly speeds up subsequent calls on the GPU.
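A further possible refinement, offered here as a sketch rather than as part of the original answer: keep the array in device memory with cuda.to_device so that the timed launches of the func3 kernel above do not include host-to-device and device-to-host copies, and call cuda.synchronize() before reading the timer, since kernel launches return asynchronously.

from numba import cuda 
import numpy as np 
from timeit import default_timer as timer 

n = 10000000
a = np.ones(n, dtype=np.float64)

d_a = cuda.to_device(a)                     # one explicit host-to-device copy
threadsperblock = 1024
blockspergrid = (n + threadsperblock - 1) // threadsperblock

for i in range(0, 5):
    start = timer()
    func3[blockspergrid, threadsperblock](d_a, n)
    cuda.synchronize()                      # wait for the asynchronous launch to finish
    print(i, " with GPU kernel (device array):", timer() - start)

a = d_a.copy_to_host()                      # copy the result back once at the end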

The code copied from that tutorial is wrong and does not work. My advice would be to find a better tutorial.

You might consider using C/C++ instead; the official tutorial is here:

To summarize, the "function optimized to run on gpu" should be decorated with the @vectorize decorator, rather than @jit. The latter means you are writing a CUDA kernel, in which case both the code inside the function and the function call itself need to be changed significantly.

@Hack06: Given that this is basically a Python acceleration exercise, that does not seem like a particularly useful or constructive suggestion. The question is tagged python, the code is python, and there is a link to a tutorial about accelerating python with Numba. How much more obvious does it need to be?

This works fine, but as soon as the python execution reaches any function defined under @cuda or @vectorize, there is a roughly 60-second compile-time delay, apparently while cuda and the GPU compile. After the 60 seconds everything completes smoothly, as you showed. Can this 60-second compile-time step be eliminated, or is it a necessary drawback?
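On that compile-time question: as far as I know the JIT cost cannot be removed entirely, but it can be moved out of the first call by compiling eagerly with an explicit signature. A sketch, reusing the func3 kernel from the answer; note that part of the initial delay may also be one-time CUDA context creation, which this does not avoid:

from numba import cuda 

# An explicit signature makes numba compile the kernel when the module
# is imported, rather than lazily on the first call.
@cuda.jit("void(float64[:], int64)")
def func3(a, N):
    tid = cuda.grid(1)
    if tid < N:
        a[tid] += 1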