Using half2 in CUDA




I am trying to use half2, but I get the following error:

error: class "__half2" has no member "y"

The part of the code where the error occurs is:

uint8_t V_ [128];       // some elements (uint8), to save space
float   V_C[128];       // storing the diff to use later
half2 *C_ = C.elements; // D halfs stored as half2, to be read
Cvalue = 0.0;
for (d = 0; d < D; d+=2)
{
  V_C [d  ] = V_[d]   - __half2float(C_[d/2].x)    ;
  V_C [d+1] = V_[d+1] - __half2float(C_[d/2].y)    ;
  Cvalue   += V_C [d]   * V_C [d]  ;
  Cvalue   += V_C [d+1] * V_C [d+1];
}



Any help?
Update:
Thanks for the help! In the end I used the following:
uint8_t V_ [128] ;
float   V_C[128] ;
const half2 *C_ = C.elements;
Cvalue = 0.0;
float2 temp_;
for (d = 0; d < D; d+=2)
  {
    temp_     = __half22float2(C_[d/2]);
    V_C [d  ] = V_[d]   - temp_.x      ;
    V_C [d+1] = V_[d+1] - temp_.y      ;
    Cvalue   += V_C [d]   * V_C [d]  ;
    Cvalue   += V_C [d+1] * V_C [d+1];
  }
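To check the kernel's Cvalue against something independent, a host-side reference of the same loop can help. The sketch below is plain C++ without cuda_fp16.h, so it is a stand-in rather than CUDA's own API: it assumes the half2 data has been copied back to the host as raw 32-bit words, with the first half of each pair in the low 16 bits, and it decodes each IEEE 754 binary16 value by hand in place of __half22float2. The names half_bits_to_float and squared_distance are hypothetical.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical host-side stand-in for __half2float: decode one IEEE 754
// binary16 bit pattern (1 sign bit, 5 exponent bits, 10 mantissa bits).
float half_bits_to_float(uint16_t h) {
    const uint32_t sign = (h >> 15) & 0x1u;
    const uint32_t exp  = (h >> 10) & 0x1Fu;
    const uint32_t mant = h & 0x3FFu;
    float value;
    if (exp == 0) {
        // Zero or subnormal: mant * 2^-24
        value = std::ldexp(static_cast<float>(mant), -24);
    } else if (exp == 31) {
        // All-ones exponent: infinity or NaN
        value = mant ? NAN : INFINITY;
    } else {
        // Normal: (1024 + mant) * 2^(exp - 15 - 10)
        value = std::ldexp(static_cast<float>(mant | 0x400u),
                           static_cast<int>(exp) - 25);
    }
    return sign ? -value : value;
}

// Hypothetical CPU reference for the loop above. Assumption: each uint32_t
// holds one half2, first element in the low 16 bits, second in the high 16 bits.
float squared_distance(const uint8_t* v, const uint32_t* packed, int D) {
    float acc = 0.0f;
    for (int d = 0; d < D; d += 2) {
        const uint32_t w = packed[d / 2];
        const float lo = half_bits_to_float(static_cast<uint16_t>(w & 0xFFFFu));
        const float hi = half_bits_to_float(static_cast<uint16_t>(w >> 16));
        const float diff0 = v[d] - lo;
        const float diff1 = v[d + 1] - hi;
        acc += diff0 * diff0 + diff1 * diff1;
    }
    return acc;
}
```

For example, with v = {3, 5} and a single packed word 0x42004000u (2.0 in the low half, 3.0 in the high half), squared_distance(v, packed, 2) gives (3 - 2)² + (5 - 3)² = 5.0f.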


In my particular application I got a slight speedup, since loading from global memory is the bottleneck…
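You cannot access the parts of a half2 with the dot operator; you should use the intrinsic functions declared in cuda_fp16.h (for example __low2float and __high2float, or __half22float2).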
More importantly, depending on the type of C.elements, this line
half2 *C_ = C.elements; // D halfs stored as half2, to be read

may be wrong (if C.elements is a half*; the comment there is unclear).
half2 is not a pair of halfs. In fact, in the current implementation, half2 is just an unsigned int wrapped in a struct:
// cuda_fp16.h

typedef struct __align__(2) {
   unsigned short x;
} __half;

typedef struct __align__(4) {
   unsigned int x;
} __half2;

#ifndef CUDA_NO_HALF
typedef __half half;
typedef __half2 half2;
#endif /*CUDA_NO_HALF*/

Nobody says that an array of halfs can be accessed as an array of half2s.
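There are also alignment considerations.

In the documentation, you can extract the "low 16 bits" or the "high 16 bits" to get the corresponding half of a half2. So even though it is not stated explicitly, the documentation makes it fairly clear that half2 is a 32-bit-aligned pair of halfs. I would even risk the parallel with __m128d being a pair of FP64 values (where alignment can sometimes be chosen through aliasing).

@FlorentDUGUET We could guess, experiment, even play bit games, yes. I enjoy that as much as anyone. But in no case should you rely on these assumptions in production code. Feel free to post your bit-hacking results; I would be very interested to see whether your assumptions are confirmed.

In my view, the low-half documentation is fairly explicit about the bit layout of half2. There is indeed no explicit contract, but the documentation is explicit enough for me. Your point still stands, since this is more a matter of opinion.

They actually cast from half* to half2* themselves.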
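To see what the "low 16 bits / high 16 bits" reading implies for a half* to half2* reinterpretation, here is a host-side C++ sketch (not CUDA code). It stores raw binary16 bit patterns in a uint16_t array, copies two consecutive elements into one 32-bit word, and extracts each half. On a little-endian machine such as x86-64, the element at the lower address lands in the low 16 bits, which is the layout the comments above infer for half2. The helper names load_pair, low_bits and high_bits are hypothetical. memcpy is used instead of a pointer cast to avoid strict-aliasing and alignment problems on the host; on the device, a half* to half2* cast additionally needs each pair to start at an even index on a 4-byte boundary, given the __align__(4) in the definition above.

```cpp
#include <cstdint>
#include <cstring>

// Read elements 2*pair_index and 2*pair_index + 1 of a 16-bit array as one
// 32-bit word, mimicking what a half* -> half2* reinterpretation would load.
uint32_t load_pair(const uint16_t* halves, int pair_index) {
    uint32_t word;
    std::memcpy(&word, halves + 2 * pair_index, sizeof(word));
    return word;
}

// Low 16 bits: the lower-addressed element on a little-endian machine.
uint16_t low_bits(uint32_t word) { return static_cast<uint16_t>(word & 0xFFFFu); }

// High 16 bits: the higher-addressed element on a little-endian machine.
uint16_t high_bits(uint32_t word) { return static_cast<uint16_t>(word >> 16); }
```

With halves = {0x3C00, 0x4000, 0x4200, 0xC000} (1.0, 2.0, 3.0 and -2.0 in binary16), load_pair(halves, 0) yields low_bits == 0x3C00 and high_bits == 0x4000 on a little-endian host.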