concurrency::fast_math::tanh() returns NaN when run via parallel_for_each (C++ AMP)

Environment: VS2015, Win8. The value is NaN when the call runs inside parallel_for_each. The cause is the concurrency::fast_math::tanh function.

c++ visual-studio-2015

The concurrency::fast_math::tanh function returns NaN when its argument is greater than 1000 and it is run via parallel_for_each (case 1):

float arr[2];
concurrency::array_view<float> arr_view(2, arr);
concurrency::extent<1> ex;
ex[0] = 1;
parallel_for_each(ex, [=](Concurrency::index<1> idx) restrict(amp){
    float t = 10000000;
    arr_view[0] = concurrency::fast_math::fabs(t);
    arr_view[1] = concurrency::fast_math::tanh(t);
});

arr_view.synchronize();
std::cout << arr[0] << "," << arr[1] << std::endl;

Case 2, without parallel_for_each:

float arr[2];
concurrency::array_view<float> arr_view(2, arr);
concurrency::extent<1> ex;
ex[0] = 1;
float t = 10000000;
arr_view[0] = concurrency::fast_math::fabs(t);
arr_view[1] = concurrency::fast_math::tanh(t);

arr_view.synchronize();
std::cout << arr[0] << "," << arr[1] << std::endl;

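This is the result I expected. Changing tanh to tanhf gives the same result.

Why does the tanh function return NaN? And why does it return NaN only when run via parallel_for_each?

Please tell me the cause of the problem and how to work around it.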

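The functions defined in fast_math prioritize speed over accuracy. Their implementation and precision depend on the hardware. If you do not use the parallel_for_each construct, the code runs on the CPU, which only implements a "precise" tanh function and therefore gives the correct answer.

To fix the problem, you can call the function under precise_math: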
concurrency::precise_math::tanh(t);

If that is too slow, and the precision of fast_math::tanh is otherwise sufficient, you can try the following:
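// restrict(amp) is required so the helper can be called from a parallel_for_each kernel.
// float is used because fast_math only provides single-precision functions.
float myTanh(float t) restrict(amp) {
  return (concurrency::fast_math::fabs(t) > 100.0f) ? concurrency::precise_math::copysign(1.0f, t) : concurrency::fast_math::tanh(t);
}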

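It may or may not run faster than the precise version, depending on the hardware, so you will need to run some tests.

Most of the functions in concurrency::fast_math are not guaranteed to return correct values. Some of them (like tanh) can even return NaN. On my HD 6870, the fast tanh returns NaN for every number above 90.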

Here are a few tricks to work around this problem.

You can "clamp" the Tanh argument to 10:
float Tanh(float val) restrict(amp)
{
    if (val > 10)
        return 1;
    else if (val < -10)
        return -1;
    return Concurrency::fast_math::tanh(val);
}

Another option is a tanh approximation I found somewhere a long time ago. It is fairly fast and fairly accurate.


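However, if you need a very precise tanh, you can replace concurrency::fast_math with concurrency::precise_math. This option has one major drawback: precise_math does not run on many GPUs (for example, my 6870).

From the documentation:

These functions, including the single-precision functions, require extended double-precision support on the accelerator. You can use the accelerator::supports_double_precision data member to determine if you can run these functions on a specific accelerator.

In addition, precise_math can be more than 10 times slower than fast_math, especially on non-professional video cards.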



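If the concurrency code is not run inside a parallel_for_each block, it looks like you are not actually using the GPU. So tanh is evaluated on the CPU, without the GPU-specific errors. In fact, if you run this code, the two results may differ, depending on the GPU: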
float t = 0.65;
arr_view[1] = concurrency::fast_math::tanh(t);  
parallel_for_each(e, [=](index<1> idx)      restrict(amp)
{
    arr_view[0] = concurrency::fast_math::tanh(t);
}); 
std::cout << arr[0] << "," << arr[1] << std::endl;
arr_view.synchronize();
std::cout << arr[0] << "," << arr[1] << std::endl;
std::cout << arr[0] - arr[1] << std::endl;//may return non-zero value, depending on gpu

Sorry, yes: if t is over 100, tanh also returns NaN.

With tanh(x) = (e^x - e^-x)/(e^x + e^-x) and a large-ish x, I can see how you would get bad results.
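
To make the overflow argument concrete, here is a minimal CPU-only sketch in standard C++ (not C++ AMP). It assumes, purely for illustration, a tanh evaluated directly from tanh(x) = (e^x - e^-x)/(e^x + e^-x); nothing in this thread confirms that any GPU's fast_math::tanh is actually implemented this way. The names naive_tanh, stable_tanh and check_tanh_overflow are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical naive tanh: evaluates both exponentials directly.
// For large |x| one exponential overflows to +inf, so the quotient is inf / inf = NaN.
inline float naive_tanh(float x) {
    const float ep = std::exp(x);
    const float en = std::exp(-x);
    return (ep - en) / (ep + en);
}

// Algebraically equivalent form: tanh(x) = 1 - 2 / (exp(2x) + 1).
// If exp(2x) overflows to +inf the result is 1; if it underflows to 0 the result is -1.
inline float stable_tanh(float x) {
    return 1.0f - 2.0f / (std::exp(2.0f * x) + 1.0f);
}

// Small self-check using the argument from the question.
inline void check_tanh_overflow() {
    const float t = 10000000.0f;
    assert(std::isnan(naive_tanh(t)));
    assert(stable_tanh(t) == 1.0f);
    assert(stable_tanh(-t) == -1.0f);
}
```

If the hardware implementation does behave like naive_tanh, that would explain why clamping the argument or switching to a different formulation avoids the NaN.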

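The tanh approximation mentioned above:
float Tanh(float val) restrict(amp)
{
    // fabs is qualified so the amp-restricted overload is used inside the kernel
    float ax = Concurrency::fast_math::fabs(val);
    float x2 = val * val;
    float z = val * (1.0f + ax + (1.05622909486427f + 0.215166815390934f * x2 * ax) * x2);
    return (z / (1.02718982441289f + Concurrency::fast_math::fabs(z)));
}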
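
As a rough way to check the "fairly accurate" claim, the following CPU-only sketch in standard C++ compares the approximation with std::tanh. Here std::fabs stands in for the fast_math version and restrict(amp) is dropped so the code compiles outside C++ AMP. The names approx_tanh, clamped_approx_tanh and max_abs_error are hypothetical, and the clamp at 10 is borrowed from the clamping trick earlier in the answer rather than being part of the original approximation. The clamp is included because z grows roughly like |val|^6 and can overflow single precision for inputs in the millions.

```cpp
#include <cassert>
#include <cmath>

// The rational approximation from the answer, ported to plain C++.
inline float approx_tanh(float val) {
    const float ax = std::fabs(val);
    const float x2 = val * val;
    const float z = val * (1.0f + ax + (1.05622909486427f + 0.215166815390934f * x2 * ax) * x2);
    return z / (1.02718982441289f + std::fabs(z));
}

// Assumption: saturate outside [-10, 10] so z cannot overflow.
inline float clamped_approx_tanh(float val) {
    if (val > 10.0f) return 1.0f;
    if (val < -10.0f) return -1.0f;
    return approx_tanh(val);
}

// Largest absolute difference from std::tanh on a uniform grid over [lo, hi].
inline float max_abs_error(float lo, float hi, int steps) {
    float worst = 0.0f;
    for (int i = 0; i <= steps; ++i) {
        const float x = lo + (hi - lo) * static_cast<float>(i) / static_cast<float>(steps);
        const float err = std::fabs(clamped_approx_tanh(x) - std::tanh(x));
        if (err > worst) worst = err;
    }
    return worst;
}
```

A call such as max_abs_error(-10.0f, 10.0f, 2000) gives a single number to compare against the accuracy you need; it is worth measuring on your own input range before relying on the approximation.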
