C++: tbb::parallel_reduce and std::accumulate give different results


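I am learning TBB. When summing all the values in a std::vector, the result of tbb::parallel_reduce differs from that of std::accumulate once the vector holds more than 16.777.220 elements (I see the error at 16.777.320 elements). Here is my minimal working example: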

#include <iostream>
#include <vector>
#include <numeric>
#include <limits>
#include "tbb/tbb.h"

int main(int argc, const char * argv[]) {

    int count = std::numeric_limits<int>::max() * 0.0079 - 187800; // - 187900 works

    std::vector<float> heights(count);
    std::fill(heights.begin(), heights.end(), 1.0f);

    float ssum = std::accumulate(heights.begin(), heights.end(), 0);
    float psum = tbb::parallel_reduce(tbb::blocked_range<std::vector<float>::iterator>(heights.begin(), heights.end()), 0,
                                      [](tbb::blocked_range<std::vector<float>::iterator> const& range, float init) {
                                          return std::accumulate(range.begin(), range.end(), init);
                                      }, std::plus<float>()
                                      );

    std::cout << std::endl << " Heights serial sum: " << ssum << "   parallel sum: " << psum;
    return 0;
}
Why? Should I report a bug to the TBB developers?


Additional tests, applying your answers:

 correct value is: 1949700403
 cause we add 1.0f to zero 1949700403 times

 using (int) init values:
 Runtime: 17.407 sec. Heights serial   sum: 16777216.000, wrong
 Runtime:  8.482 sec. Heights parallel sum: 131127368.000, wrong

 using (float) init values:
 Runtime: 12.594 sec. Heights serial   sum: 16777216.000, wrong
 Runtime:  5.044 sec. Heights parallel sum: 303073632.000, wrong

 using (double) initial values:
 Runtime: 13.671 sec. Heights serial   sum: 1949700352.000, wrong
 Runtime:  5.343 sec. Heights parallel sum: 263690016.000, wrong

 using (double) initial values and tbb::parallel_deterministic_reduce:
 Runtime: 13.463 sec. Heights serial   sum: 1949700352.000, wrong
 Runtime: 99.031 sec. Heights parallel sum: 1949700352.000, wrong >>> almost 10x slower !
Why do all the reduce calls produce a wrong sum? Is (double) not enough?
Here is my test code:

    #include <iostream>
    #include <vector>
    #include <numeric>
    #include <limits>
    #include <sys/time.h>
    #include <iomanip>
    #include "tbb/tbb.h"
    #include <cmath>

    class StopWatch {
    private:
        double elapsedTime;
        timeval startTime, endTime;
    public:
        StopWatch () : elapsedTime(0) {}
        void startTimer() {
            elapsedTime = 0;
            gettimeofday(&startTime, 0);
        }
        void stopNprintTimer() {
            gettimeofday(&endTime, 0);
            elapsedTime = (endTime.tv_sec - startTime.tv_sec) * 1000.0;             // compute sec to ms
            elapsedTime += (endTime.tv_usec - startTime.tv_usec) / 1000.0;          // compute us to ms and add
            std::cout << " Runtime: " << std::right << std::setw(6) << elapsedTime / 1000 << " sec.";             // show in sec
        }
    };

    int main(int argc, const char * argv[]) {

        StopWatch watch;
        std::cout << std::fixed << std::setprecision(3) << "" << std::endl;
        size_t count = std::numeric_limits<int>::max() * 0.9079;

        std::vector<float> heights(count);
        std::cout << " Vector size: " << count << std::endl;
        std::fill(heights.begin(), heights.end(), 1.0f);

        watch.startTimer();
        float ssum = std::accumulate(heights.begin(), heights.end(), 0.0); // change type of initial value here
        watch.stopNprintTimer();
        std::cout << " Heights serial   sum: " << std::right << std::setw(8) << ssum << std::endl;

        watch.startTimer();
        float psum = tbb::parallel_reduce(tbb::blocked_range<std::vector<float>::iterator>(heights.begin(), heights.end()), 0.0, // change type of initial value here
                                          [](tbb::blocked_range<std::vector<float>::iterator> const& range, float init) {
                                              return std::accumulate(range.begin(), range.end(), init);
                                          }, std::plus<float>()
                                          );
        watch.stopNprintTimer();
        std::cout << " Heights parallel sum: " << std::right << std::setw(8) << psum << std::endl;

        return 0;
    }

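Your call to std::accumulate is doing integer addition and converting the result to float only at the end of the computation. To accumulate over floats, the accumulator should be a float.*

* Or any other type that can properly accumulate floats.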

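This might solve this particular problem for you:

The call to std::accumulate is doing integer addition, then converting the result to float at the end of the computation.

But floating-point addition is not an associative operation:

  • with accumulate: (…((s+a1)+a2)+…)+an
  • with parallel_reduce: any parenthesization is possible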

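In addition to the other correct answers on the "why" part, I would add that TBB provides tbb::parallel_deterministic_reduce, which guarantees reproducible results between two or more runs on the same data (although the result may still differ from std::accumulate). See the linked article describing the problem and the deterministic algorithm.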


So, as for the "Should I report a bug to the TBB developers?" part, the answer is clearly no (unless you find something lacking on the TBB side).

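Floating-point arithmetic is not real-number arithmetic. If you change the order of operations, you may get different rounding errors.

Thank you. As the std::accumulate template syntax suggests, I used an int here, and luckily I had filled the vector with 1.0f, which is 1 when converted to int. With float values the result is still incorrect, but this time because of the float type's imprecision at larger magnitudes.

Thank you for pointing out the floating-point precision issue here and for the link to the great documentation (upvoted).

Thanks for the hint. Unfortunately, on my 4-thread Intel i7 it takes more time than the serial std::accumulate() with an initial value of type double.

Having read the link carefully, I now understand that tbb::parallel_deterministic_reduce does not produce the correct result, but at least it reproduces the wrong result, meaning every run produces the same error. Allow me to quote: "It should be noted that the repeatable results obtained with parallel_deterministic_reduce may still differ from those obtained through serial execution. […] Also, the purpose of this algorithm is not to improve the accuracy of the computation."

It is not an error. Or else the same could be said of std::accumulate.

For smaller counts, parallelism may not be worth it. But it is also hard to benchmark TBB properly, because thread creation is usually still under way during the measurement, which is not the case in a real application.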
[...]
std::vector<int> heights(count);
std::cout << " Vector size: " << count << std::endl;
std::fill(heights.begin(), heights.end(), 1);

watch.startTimer();
int ssum = std::accumulate(heights.begin(), heights.end(), (int)0);
watch.stopNprintTimer();
std::cout << " Heights serial   sum: " << std::right << std::setw(8) << ssum << std::endl;

watch.startTimer();
int psum = tbb::parallel_reduce(tbb::blocked_range<std::vector<int>::iterator>(heights.begin(), heights.end()), (int)0,
                                  [](tbb::blocked_range<std::vector<int>::iterator> const& range, int init) {
                                      return std::accumulate(range.begin(), range.end(), init);
                                  }, std::plus<int>()
                                  );
watch.stopNprintTimer();
std::cout << " Heights parallel sum: " << std::right << std::setw(8) << psum << std::endl;
[...]
Vector size: 1949700403
Runtime: 13.041 sec. Heights serial   sum: 1949700403, correct
Runtime:  4.728 sec. Heights parallel sum: 1949700403, correct and almost 4x faster
float ssum = std::accumulate(heights.begin(), heights.end(), 0.0f);
                                                             ^^^^