C++: calling Armadillo functions inside an OpenMP parallel for loop corrupts data


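I'm trying to parallelize the solution of a large linear system with Armadillo and OpenMP, using arma::solve. Instead of calling the solver directly on the whole problem, I want to split the problem into smaller blocks of RHS vectors and call the solver on them in parallel inside an OpenMP loop, as shown in the listing below.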

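This should give the same answer, since the multiple RHS are independent problems, but when I run the code this way some of the columns are often garbled. I even tried wrapping the write-back in an omp critical section, but it still fails.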
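The expectation that chunking cannot change the result follows from the least-squares objective separating by column. Writing x_j and b_j for the j-th columns of X and B, and C_1, ..., C_p for the column chunks of A^T B (notation introduced here, not in the original code), and assuming A has full column rank so that A^T A is symmetric positive definite (the case that solve_opts::likely_sympd hints at):

$$\|AX - B\|_F^2 = \sum_{j=1}^{k} \|A x_j - b_j\|_2^2, \qquad (A^\top A)\,X = A^\top B$$

$$A^\top B = [\,C_1 \;\; C_2 \;\; \cdots \;\; C_p\,] \;\Longrightarrow\; X = [\,(A^\top A)^{-1} C_1 \;\; (A^\top A)^{-1} C_2 \;\; \cdots \;\; (A^\top A)^{-1} C_p\,]$$

So each chunk is an independent solve against the same matrix A^T A, and the chunked result should match the single solve up to rounding.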

Is it safe to run Armadillo this way, or am I missing something?

// to compile the code run as
// g++ parals.cpp -I$ARMADILLO_INCLUDE_DIR  -L$ARMADILLO_INCLUDE_DIR/../lib64
//  -larmadillo -fopenmp -lopenblas -o parals.out
#define ARMA_DONT_USE_WRAPPER
#define ARMA_USE_BLAS
#define ARMA_USE_LAPACK

#include <armadillo>
#include <omp.h>
#include <cstdlib>   // atoi
#include <iostream>

using namespace arma;

/*
 * Solves LS for a single problem,
 *
 * ||AX - B ||_F^2
 */

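int main(int argc, char *argv[]) {
  // All five arguments are required; reading argv[1..5] without them is undefined behavior.
  if (argc < 6) {
    std::cerr << "usage: " << argv[0] << " m n k seed chunk" << std::endl;
    return 1;
  }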
  int m = atoi(argv[1]);     // A is of size m \times n
  int n = atoi(argv[2]);     // A is of size m \times n
  int k = atoi(argv[3]);     // B is of size m \times k
  int seed = atoi(argv[4]);  // seed for random inits
  int chunk = atoi(argv[5]); // chunk size to group RHS

  arma::arma_rng::set_seed(seed);

  std::cout << "m::" << m << "::n::" << n << "::k::" << k
            << "::seed::" << seed << "::chunk::" << chunk
            << std::endl;

  mat A(m,n,arma::fill::randu);
  //std::cout << "A::" << std::endl << A << std::endl;

  mat B(m,k,arma::fill::randu);
  //std::cout << "B::" << std::endl << B << std::endl;

  mat AtA = A.t() * A;
  mat AtB = A.t() * B;
 
  // solve sequentially 
  mat Xseq = arma::solve(AtA, AtB, arma::solve_opts::likely_sympd);

 
  int num_chunks = AtB.n_cols / chunk;
  if (num_chunks * chunk < AtB.n_cols) num_chunks++; 

  // Solve chunk by chunk, serially; this should reproduce Xseq.
  mat Xchunk(n,k,arma::fill::zeros);

  for (int nt = 0; nt < num_chunks; nt++) {
    int spanStart = nt * chunk;
    int spanEnd = (nt + 1) * chunk - 1;
    if (spanEnd > AtB.n_cols - 1) {
      spanEnd = AtB.n_cols - 1;
    }

    mat rhs = AtB.cols(spanStart, spanEnd);
    mat Y = arma::solve(AtA, rhs, arma::solve_opts::likely_sympd);

    Xchunk.cols(spanStart, spanEnd) = Y;
  }
  
  bool chkchunk = arma::approx_equal(Xseq, Xchunk, "absdiff", 0.0001);
  std::cout << "(Xseq == Xchunk) ? = " << chkchunk << std::endl;
  
  if (!chkchunk) {
    std::cout << "Xseq::" << std::endl << Xseq << std::endl;
    std::cout << "Xchunk::" << std::endl << Xchunk << std::endl;
  }

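  mat Xpar(n,k,arma::fill::zeros);

  // Same chunked solve, but the chunks are distributed across OpenMP threads.
  // Each iteration writes a disjoint set of columns of Xpar.
  #pragma omp parallel for schedule(static,1)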
  for (int nt = 0; nt < num_chunks; nt++) {
    int spanStart = nt * chunk;
    int spanEnd = (nt + 1) * chunk - 1;
    if (spanEnd > AtB.n_cols - 1) {
      spanEnd = AtB.n_cols - 1;
    }

    mat rhs = AtB.cols(spanStart, spanEnd);
    mat Y = arma::solve(AtA, rhs, arma::solve_opts::likely_sympd);

    Xpar.cols(spanStart, spanEnd) = Y;
  }
  

  bool chkpar = arma::approx_equal(Xseq, Xpar, "absdiff", 0.0001);
  std::cout << "(Xseq == Xpar) ? = " << chkpar << std::endl;
  
  if (!chkpar) {
    std::cout << "Xseq::" << std::endl << Xseq << std::endl;
    std::cout << "Xpar::" << std::endl << Xpar << std::endl;
  }
  
  return 0;
}
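The chunk bookkeeping in both loops (num_chunks and the clamped spanStart/spanEnd) is a ceiling division followed by clamping the last span. A minimal standalone sketch, using a hypothetical helper chunk_spans that is not part of the question's code, makes it easy to check that the spans cover every column exactly once. With the script's parameters k = 20 and chunk = 3 it yields 7 spans, the last one covering columns 18..19. Because the spans are disjoint, each iteration of the parallel loop writes different columns of Xpar, which suggests the write-back itself is not where the corruption comes from.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: split columns [0, n_cols) into inclusive spans
// [start, end] of at most `chunk` columns, mirroring spanStart/spanEnd above.
std::vector<std::pair<std::size_t, std::size_t>>
chunk_spans(std::size_t n_cols, std::size_t chunk) {
    std::vector<std::pair<std::size_t, std::size_t>> spans;
    if (chunk == 0 || n_cols == 0) return spans;
    std::size_t num_chunks = (n_cols + chunk - 1) / chunk;  // ceiling division
    for (std::size_t nt = 0; nt < num_chunks; ++nt) {
        std::size_t start = nt * chunk;
        // Clamp the last span so it ends at the last column.
        std::size_t end = std::min(start + chunk, n_cols) - 1;
        spans.emplace_back(start, end);
    }
    return spans;
}
```

The same helper could replace the duplicated boundary code in the two loops, so the serial and parallel versions are guaranteed to use identical spans.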

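You have to be careful, because the library that supplies BLAS to Armadillo may or may not be thread-safe. Search Stack Overflow for "armadillo threads".

You're right. OpenBLAS is not thread-safe unless it is compiled with the appropriate build option. If you are interested, check the code that tests whether the BLAS library linked with Armadillo has multithreading problems. Basically, it defines a function that performs some matrix multiplications and an SVD, and checks whether running this function in parallel gives any different results than running it serially. You may need to change the CMakeLists.txt file.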
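The serial-versus-parallel check described above can be sketched without Armadillo. This is not the referenced code: workload is a hypothetical stand-in (a naive matrix product on deterministic, seed-dependent inputs) for the Armadillo/BLAS calls under test, and std::thread is used instead of OpenMP so the sketch compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in workload: C = A * B for n x n matrices whose entries
// depend only on `seed`. In a real check this would call the linked BLAS.
std::vector<double> workload(int seed, std::size_t n = 32) {
    std::vector<double> a(n * n), b(n * n), c(n * n, 0.0);
    for (std::size_t i = 0; i < n * n; ++i) {
        a[i] = static_cast<double>((seed + i) % 7);
        b[i] = static_cast<double>((seed * 3 + i) % 5);
    }
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k)
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
    return c;
}

// Runs the workload once per seed serially, then once per seed concurrently,
// and reports whether every parallel result matches its serial counterpart.
bool parallel_matches_serial(int n_tasks) {
    std::vector<std::vector<double>> serial(n_tasks), parallel(n_tasks);
    for (int t = 0; t < n_tasks; ++t) serial[t] = workload(t);

    std::vector<std::thread> threads;
    for (int t = 0; t < n_tasks; ++t)
        threads.emplace_back([t, &parallel] { parallel[t] = workload(t); });
    for (auto& th : threads) th.join();

    // Exact comparison is valid here: each task performs the same
    // operations in the same order in both runs.
    return serial == parallel;
}
```

With a real multithreaded BLAS, the comparison should use a tolerance (as the question does with arma::approx_equal), because a different summation order can change the last bits of the result.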
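#!/bin/bash

# Run the reproduction 20 times with m=10, n=5, k=20, seed=17, chunk=3.
# OMP_NUM_THREADS is unset so OpenMP uses its default thread count.
for i in {1..20}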
do
  echo $i
  unset OMP_NUM_THREADS;
  ./parals.out 10 5 20 17 3
done
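If, as the answer suggests, the race is inside a BLAS/LAPACK build that is not thread-safe, a critical section only helps when it encloses the library call itself, not the write-back of its result. The sketch below illustrates that pattern with std::thread and std::mutex. scale_unsafe is a hypothetical stand-in that uses a shared static workspace to mimic a non-thread-safe routine; it is not an Armadillo or OpenBLAS function. In the question's code the analogue would be placing the omp critical section around the arma::solve call, which serializes the solves and therefore gives up the parallel speedup.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical non-thread-safe routine: all callers share one static workspace,
// so two concurrent calls can overwrite each other's data.
static std::vector<double> g_workspace;

std::vector<double> scale_unsafe(const std::vector<double>& x, double a) {
    g_workspace.assign(x.begin(), x.end());  // writes shared state
    for (double& v : g_workspace) v *= a;
    return g_workspace;                      // copies shared state out
}

std::mutex g_lib_mutex;

// Serialize the call itself: the lock is held until the returned copy is made.
std::vector<double> scale_guarded(const std::vector<double>& x, double a) {
    std::lock_guard<std::mutex> lock(g_lib_mutex);
    return scale_unsafe(x, a);
}

// Each task scales its own input and writes only its own output slot,
// so no lock is needed around the write-back.
std::vector<std::vector<double>> run_parallel(int n_tasks) {
    std::vector<std::vector<double>> out(n_tasks);
    std::vector<std::thread> threads;
    for (int t = 0; t < n_tasks; ++t) {
        threads.emplace_back([t, &out] {
            std::vector<double> x(1000, static_cast<double>(t));
            out[t] = scale_guarded(x, 2.0);
        });
    }
    for (auto& th : threads) th.join();
    return out;
}
```

Rebuilding the BLAS library in a thread-safe configuration, as the answer mentions for OpenBLAS, avoids having to serialize the calls at all.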