Parallel processing: Is restrict(amp) more restrictive than CUDA kernel code?

In C++ AMP, kernel functions or lambdas are marked with restrict(amp), which places strict restrictions on the allowed subset of C++. Does CUDA allow any more freedom in the subset of C or C++ permitted in kernel functions?

As of Visual Studio 11 and CUDA 4.1, restrict(amp) functions are more restrictive than the comparable CUDA constructs. Most notably, AMP is more restrictive about how pointers may be used. This is a natural consequence of AMP's DirectX 11 compute foundation, which forbids pointers in (graphics shader) code. In contrast, CUDA's low-level IR is PTX, which is more general than HLSL.
Here is a line-by-line comparison:
| VS 11 AMP restrict(amp) functions | CUDA 4.1 sm_2x __device__ functions |
|------------------------------------------------------------------------------|
|* can only call functions that have |* can only call functions that have |
| the restrict(amp) clause | the __device__ decoration |
|* The function must be inlinable |* need not be inlined |
|* The function can declare only |* Class types are allowed |
| POD variables | |
|* Lambda functions cannot |* Lambdas are not supported, but |
| capture by reference and | user functors can hold pointers |
| cannot capture pointers | |
|* References and single-indirection |* References and multiple-indirection |
| pointers are supported only as | pointers are supported |
| local variables and function         |                                      |
| parameters/returns                   |                                      |
|* No recursion |* Recursion OK |
|* No volatile variables |* Volatile variables OK |
|* No virtual functions |* Virtual functions OK |
|* No pointers to functions |* Pointers to functions OK |
|* No pointers to member functions |* Pointers to member functions OK |
|* No pointers in structures |* Pointers in structures OK |
|* No pointers to pointers |* Pointers to pointers OK |
|* No goto statements |* goto statements OK |
|* No labeled statements |* Labeled statements OK |
|* No try, catch, or throw statements |* No try, catch, or throw statements |
|* No global variables |* Global __device__ variables OK |
|* Static variables through tile_static |* Static variables through __shared__ |
|* No dynamic_cast |* No dynamic_cast |
|* No typeid operator |* No typeid operator |
|* No asm declarations |* asm declarations (inline PTX) OK |
|* No varargs |* No varargs |
You can read more about the restrict(amp) restrictions in Microsoft's documentation. For CUDA, the supported C++ subset for __device__ functions is described in Appendix D of the CUDA C Programming Guide.

Good question, though I'm afraid it isn't really comparable (perhaps migrate to Programmers.SE?): nvcc doesn't support C++11 at all, so when it comes to lambdas you obviously won't get very far! On the other hand, AMP has entirely different limitations, the first being that it is Microsoft-only; this (or, more precisely, the current lack of non-DirectX implementations) makes it completely unusable for many applications, e.g. scientific ones. But I suppose you mean only the language restrictions?

@leftaroundabout: Yes, I mean only the language restrictions, and I'm fine staying within C++03. I mentioned lambdas only because they are the prescribed mechanism for launching kernel code in C++ AMP. IIRC, the discussion here concerns features that C++ AMP could have enabled but did not, sometimes as a deliberate choice to encourage good practices in parallel computing: