Detecting the FPU rounding mode on a GPU

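I am delving into multiple-precision arithmetic, and there is a nice class of fast algorithms for it, described in Jonathan Richard Shewchuk, "Adaptive Precision Floating-Point Arithmetic and Fast Robust Geometric Predicates", 1997, Discrete & Computational Geometry, pages 305–363. However, these algorithms rely on the FPU using round-to-even tie-breaking.

On a CPU this would be easy: one would just check or set the FPU state word to be sure. For GPU programming, however, there is no such instruction yet.

That is why I was wondering whether there is a dependable way of detecting (not setting) the rounding mode on an unknown FPU, perhaps by computing a few test cases and looking at the bit patterns of the resulting floating-point numbers.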

Edit:

To sum up, the code from the accepted answer does seem to work. You can try it with:

#include <stdio.h>
#include <stdlib.h>
#include <float.h> // _controlfp()
#include <stdint.h>

int is_round_to_nearest()
{
    union {
        double f;
        uint64_t n;
    } special;

    // sign 0, biased exponent -0x100 + 1023, fraction 0 (the leading 1 is implicit),
    // so special.f == 2^-256 (0x100 is 256), i.e. the C99 constant 0x1.0p-256
    special.n = (((uint64_t)(-0x100 + 1023) & 0x7ff) << 52);

    if( 1.0 + special.f !=  1.0)
        return 0;
    if( 1.0 - special.f !=  1.0)
        return 0;
    if(-1.0 + special.f != -1.0)
        return 0;
    if(-1.0 - special.f != -1.0)
        return 0;
    return 1;
}

int main(void)
{
    printf("default : %d\n", is_round_to_nearest());
    _controlfp(_RC_CHOP, _MCW_RC);
    printf("_RC_CHOP : %d\n", is_round_to_nearest());
    _controlfp(_RC_UP, _MCW_RC);
    printf("_RC_UP : %d\n", is_round_to_nearest());
    _controlfp(_RC_DOWN, _MCW_RC);
    printf("_RC_DOWN : %d\n", is_round_to_nearest());
    _controlfp(_RC_NEAR, _MCW_RC);
    printf("_RC_NEAR : %d\n", is_round_to_nearest());
}

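Note that on my machine I was unable to set the "round to nearest, ties away from zero" mode. In Visual Studio, the floating-point model needs to be set to strict (/fp:strict); otherwise the test does not work in release builds (every mode is identified as nearest).

The following code seems to work even in release builds, and even with the default or fast floating-point model (/fp:precise, /fp:fast), but there is still no guarantee about how the compiler will optimize it: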

int is_round_to_nearest()
{
    union {
        double f;
        uint64_t n;
    } special;

    // sign 0, biased exponent -0x100 + 1023, fraction 0 (the leading 1 is implicit),
    // so special.f == 2^-256 (0x100 is 256), i.e. the C99 constant 0x1.0p-256
    special.n = (((uint64_t)(-0x100 + 1023) & 0x7ff) << 52);

    volatile double v;
    v = 1.0; v += special.f;
    if(v !=  1.0)
        return 0;
    v = 1.0; v -= special.f;
    if(v !=  1.0)
        return 0;
    v = -1.0; v += special.f;
    if(v != -1.0)
        return 0;
    v = -1.0; v -= special.f;
    if(v != -1.0)
        return 0;
    return 1;
}

This C code tells you that you are either in round-to-nearest-even or on a strange floating-point architecture indeed:
int is_round_to_nearest(void)
{
  if ( 1.0 + 0x1.0p-100 !=  1.0) return 0;
  if ( 1.0 - 0x1.0p-100 !=  1.0) return 0;
  if (-1.0 + 0x1.0p-100 != -1.0) return 0;
  if (-1.0 - 0x1.0p-100 != -1.0) return 0;
  return 1;
}

You can add an f suffix to all 12 floating-point constants above to obtain a single-precision function.
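A single-precision version along those lines might look as follows. It is only a sketch: the function name and the `volatile` variable are additions that are not part of the answer. The `volatile` is used on the assumption that it forces each sum to be rounded to `float` and stored, even where the compiler would otherwise evaluate `float` expressions in a wider format or fold them at compile time.

```c
/* Single-precision variant: every constant carries the f suffix.
   2^-100 is still a normal float (the smallest normal float is 2^-126). */
int is_round_to_nearest_f(void)
{
    const float tiny = 0x1.0p-100f;
    volatile float v;   /* assumption: forces rounding to float on each store */

    v =  1.0f; v += tiny; if (v !=  1.0f) return 0;
    v =  1.0f; v -= tiny; if (v !=  1.0f) return 0;
    v = -1.0f; v += tiny; if (v != -1.0f) return 0;
    v = -1.0f; v -= tiny; if (v != -1.0f) return 0;
    return 1;
}
```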
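I ended up developing a slightly modified routine that tests how ties are handled rather than which rounding mode is in use. Even when the mode is round-to-nearest (which Pascal Cuoq's code detects correctly), ties could still be broken away from zero, although that is usually not the case, at least not on x86 machines.

The code for detecting ties-to-even is: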
int b_TieBreak_ToEven()
{
    //                                      <- 16B double ->
    //                                         <- fraction->
    const double special = f_ParseXDouble("0x0.00000000000008p+0"); // one, at the position one past the LSB
    const double oddone =  f_ParseXDouble("0x1.0000000000001p+0"); // one, ending with a single one at LSB
    const double evenone = f_ParseXDouble("0x1.0000000000002p+0"); // one, ending with a single one to the left of LSB

    volatile double v;
    v = 1.0; v += special;
    if(v != 1.0)
        return 0;
    v = oddone; v += special;
    if(v != evenone) // odd + half rounds to even
        return 0;
    v = evenone; v += special;
    if(v != evenone) // even + half rounds to the same even
        return 0;
    v = -1.0; v -= special;
    if(v != -1.0)
        return 0;
    v = -oddone; v -= special;
    if(v != -evenone) // -odd - half rounds to -even
        return 0;
    v = -evenone; v -= special;
    if(v != -evenone) // -even - half rounds to the same -even
        return 0;

    return 1;
}
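The function above depends on the helper `f_ParseXDouble`, which is not shown here. As a hedged, self-contained sketch of the same idea, the version below writes the three constants as C99 hexadecimal literals and adds a small classifier for the four rounding directions. The names `rounding_guess`, `moved`, `guess_rounding_mode` and `ties_to_even` are invented for this example, and the code assumes IEEE 754 binary64 `double` and that the `volatile` accesses prevent compile-time folding.

```c
/* Possible outcomes of the probe; the names are illustrative only. */
enum rounding_guess {
    GUESS_NEAREST, GUESS_UPWARD, GUESS_DOWNWARD, GUESS_TOWARD_ZERO, GUESS_UNKNOWN
};

/* Returns 1 if a + b, rounded to double, differs from a. */
static int moved(double a, double b)
{
    volatile double v = a;
    v += b;
    return v != a;
}

/* Classify the rounding direction from which of four tiny nudges move +-1.0. */
enum rounding_guess guess_rounding_mode(void)
{
    const double tiny = 0x1.0p-100;   /* far below half an ulp of 1.0 */
    const int pos_add = moved( 1.0,  tiny);
    const int pos_sub = moved( 1.0, -tiny);
    const int neg_add = moved(-1.0,  tiny);
    const int neg_sub = moved(-1.0, -tiny);

    if (!pos_add && !pos_sub && !neg_add && !neg_sub) return GUESS_NEAREST;
    if ( pos_add && !pos_sub &&  neg_add && !neg_sub) return GUESS_UPWARD;
    if (!pos_add &&  pos_sub && !neg_add &&  neg_sub) return GUESS_DOWNWARD;
    if (!pos_add &&  pos_sub &&  neg_add && !neg_sub) return GUESS_TOWARD_ZERO;
    return GUESS_UNKNOWN;
}

/* Returns 1 only if exact halfway cases are rounded to the even neighbour. */
int ties_to_even(void)
{
    const double half_ulp = 0x1.0p-53;             /* half an ulp of 1.0 */
    const double odd_one  = 0x1.0000000000001p+0;  /* 1 + 2^-52, last bit odd */
    const double even_one = 0x1.0000000000002p+0;  /* 1 + 2^-51, last bit even */
    volatile double v;

    v =  1.0;      v += half_ulp; if (v !=  1.0)      return 0;
    v =  odd_one;  v += half_ulp; if (v !=  even_one) return 0;
    v =  even_one; v += half_ulp; if (v !=  even_one) return 0;
    v = -1.0;      v -= half_ulp; if (v != -1.0)      return 0;
    v = -odd_one;  v -= half_ulp; if (v != -even_one) return 0;
    v = -even_one; v -= half_ulp; if (v != -even_one) return 0;
    return 1;
}
```

A round-to-nearest mode that breaks ties away from zero would be reported as `GUESS_NEAREST` by the classifier but would fail `ties_to_even()`, which is the distinction the routine above is after. The results reported below come from the original program, not from this sketch.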

Output with the rounding mode switched through _controlfp():

all unit tests passed
default : mode : to nearest, ties to even (ties to even: 1)
_RC_CHOP : mode : towards zero (ties to even: 0)
_RC_UP : mode : towards positive infinity (ties to even: 0)
_RC_DOWN : mode : towards negative infinity (ties to even: 0)
_RC_NEAR : mode : to nearest, ties to even (ties to even: 1)

On Raspberry Pi:

all unit tests passed
default : mode : to nearest, ties to even (ties to even: 1)

On NVIDIA GPUs (480, 680 and 780):

OpenCL platform 'NVIDIA CUDA' by NVIDIA Corporation, version OpenCL 1.1 CUDA 6.0.1, FULL_PROFILE
device: NVIDIA Corporation 'GeForce GTX 680' (driver version: 331.65)
        OpenCL version: OpenCL 1.1 CUDA
        OpenCL "C" version: OpenCL C 1.1

GPU mode: round to nearest, ties to even

Comments on the answer that uses the hexadecimal constants:

If the FPU is set to roundTiesToAway, won't it still pass this test?

@MarkDickinson Such a rounding mode would be called "strange" in the first sentence of this answer. I was expecting the four IEEE 754 rounding modes (nearest-even, up, down, toward zero) and added the third and fourth tests for good measure. I have no idea which exotic rounding modes should be expected or what should be tested for. Do you have a list of them?

Well, roundTiesToAway is one of the IEEE 754 rounding modes (IEEE 754-2008 specifies five). But as far as I know, it is not supported by any popular architecture... Reading the standard more carefully, roundTiesToAway seems to be intended mainly for decimal formats. Sorry for the noise.

@MarkDickinson Oh yes, that makes sense now. This kind of rounding could be called "primary-school rounding".