C++: Printing a vector of doubles to a file in parallel (C++, OpenMP)

Tags: c++, file, openmp

I am trying to write a large vector of doubles to a file in parallel using OpenMP. This is what I have come up with so far:

 #include <iomanip>
 #include <iostream>
 #include <sstream>
 #include <string>
 #include <omp.h>
 #include <fstream>
 #include <cstdio>   // printf, getchar

 int main()
 {
    int n = 10000000;
    double* v = new double[n];
    for (int i = 0; i < n; i++)
        v[i] = (double)i;

    int chunk;

    std::ofstream filestream;
    filestream.open("file", std::ios::binary | std::ios::out);

    omp_set_num_threads(4);

    double tic = omp_get_wtime(), toc;
    #pragma omp parallel
    {
        int id = omp_get_thread_num();

        // "single" ends with an implicit barrier, so every thread sees chunk before using it
        #pragma omp single
        chunk = n / omp_get_num_threads(); //todo: assert that 4 threads have been set

        double* vsub;
        int start = id * chunk;
        int end = (id + 1) * chunk - 1;

        #pragma omp critical
        {
            std::cout << "Id = " << id << ", Chunk = " << chunk << ", start = " << start << ", end = " << end << '\n';
            vsub = v + start;
        }

        std::stringstream ss;
        ss << std::setprecision(6) << std::fixed;
        for (int i = 0; i < end - start + 1; i++)
        {
            ss << vsub[i] << '\n';
        }
        std::string s = ss.str();

        #pragma omp for ordered schedule(static, 1)
        for (int t = 0; t < omp_get_num_threads(); ++t)
        {
        #pragma omp ordered
            {
                filestream.write(s.c_str(), s.size());
            }
        }
    }

    toc = omp_get_wtime();
    printf("It took %f seconds to run\n", toc - tic);
    getchar();

    delete[] v;
    filestream.close();
 }

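The output itself, whether to std::cout or to a file, has to be synchronized between threads so that only one thread writes at a time. That bottleneck gets worse as the number of threads grows.

But 95% of the time is spent building the strings, and that does not need any synchronization between threads.

Memory allocation while building the strings also needs synchronization.

I tried using char buffers (one per thread), but the behavior did not change, so memory allocation for the strings is probably not the problem.

When (n % 4) != 0, you have a bug...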
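To make the last two points concrete, here is a minimal sketch of how the chunk bounds could be computed so that no elements are dropped when n is not a multiple of the thread count, combined with snprintf into a per-thread char buffer. The names chunk_bounds, format_chunk and write_parallel are hypothetical helpers invented for this illustration, not part of the original code, and nothing here is claimed to run faster than the stringstream version.

```cpp
#include <cassert>
#include <cstdio>
#include <ostream>
#include <sstream>   // only needed if you write to an in-memory stream
#include <string>
#include <utility>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: half-open range [begin, end) handled by thread `id`.
// The first n % nthreads threads each take one extra element, so every
// element is covered even when n is not divisible by nthreads.
std::pair<long, long> chunk_bounds(long n, int nthreads, int id)
{
    assert(nthreads > 0 && id >= 0 && id < nthreads);
    long base = n / nthreads;
    long rem = n % nthreads;
    long begin = id * base + (id < rem ? id : rem);
    long end = begin + base + (id < rem ? 1 : 0);
    return {begin, end};
}

// Hypothetical helper: formats v[begin..end) with six fixed decimals, one
// value per line, using snprintf on a stack buffer private to the caller.
std::string format_chunk(const double* v, long begin, long end)
{
    std::string s;
    s.reserve(static_cast<std::size_t>(end - begin) * 16);
    char buf[512];   // large enough for any double printed with "%.6f"
    for (long i = begin; i < end; ++i) {
        int len = std::snprintf(buf, sizeof buf, "%.6f\n", v[i]);
        if (len > 0 && static_cast<std::size_t>(len) < sizeof buf)
            s.append(buf, static_cast<std::size_t>(len));
    }
    return s;
}

// Each thread formats its own chunk, then the chunks are written in thread
// order. Without OpenMP enabled this degrades to a single-threaded loop.
void write_parallel(const double* v, long n, std::ostream& out)
{
    #pragma omp parallel
    {
        int id = 0, nthreads = 1;
#ifdef _OPENMP
        id = omp_get_thread_num();
        nthreads = omp_get_num_threads();
#endif
        std::pair<long, long> r = chunk_bounds(n, nthreads, id);
        std::string s = format_chunk(v, r.first, r.second);

        // schedule(static, 1) assigns iteration t to thread t, and the
        // ordered region makes the writes happen in increasing t.
        #pragma omp for ordered schedule(static, 1)
        for (int t = 0; t < nthreads; ++t) {
            #pragma omp ordered
            out.write(s.data(), static_cast<std::streamsize>(s.size()));
        }
    }
}
```

With this, the body of main could shrink to opening the std::ofstream and calling write_parallel(v, n, filestream). The formatting still runs without any locking, and only the final write is serialized.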