C++: calling Armadillo functions inside an OpenMP parallel for loop corrupts data
I am trying to parallelize the solution of a large linear system with Armadillo and OpenMP, using arma::solve. Instead of calling the solver once on the whole problem, I want to split the right-hand side (RHS) into smaller blocks of columns and solve those blocks in parallel inside an OpenMP loop, as in the listing below.

This should give the same answer, because the RHS columns are independent problems. However, when I run the code this way, some columns of the result often come out wrong. I even tried wrapping the write-back in an omp critical section, but it still fails.

Is it safe to run Armadillo this way, or am I missing something?
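Written out, this is why the split should be valid. The Frobenius-norm objective separates into one least-squares problem per column, so the normal equations with A^T A decouple column by column, and any grouping of the columns into blocks B_1, ..., B_p gives independent block systems whose solutions are simply placed side by side:

```latex
\min_X \|AX - B\|_F^2 = \sum_{j=1}^{k} \|A x_j - b_j\|_2^2
\;\Longrightarrow\;
(A^\top A)\, x_j = A^\top b_j, \quad j = 1, \dots, k,
\qquad
(A^\top A)\, X_i = A^\top B_i, \quad X = [X_1, \dots, X_p].
```

So Xpar should agree with Xseq up to floating-point rounding, which is what the approx_equal tolerance in the listing allows for.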
// to compile the code run as
// g++ parals.cpp -I$ARMADILLO_INCLUDE_DIR -L$ARMADILLO_INCLUDE_DIR/../lib64
//     -larmadillo -fopenmp -lopenblas -o parals.out
#define ARMA_DONT_USE_WRAPPER
#define ARMA_USE_BLAS
#define ARMA_USE_LAPACK
#include <armadillo>
#include <omp.h>
#include <cstdlib>
#include <iostream>
using namespace arma;
/*
 * Solves LS for a single problem,
 *
 * ||AX - B ||_F^2
 */
int main(int argc, char *argv[]) {
  if (argc < 6) {
    std::cerr << "usage: " << argv[0] << " m n k seed chunk" << std::endl;
    return 1;
  }
  int m = std::atoi(argv[1]);      // A is of size m \times n
  int n = std::atoi(argv[2]);      // A is of size m \times n
  int k = std::atoi(argv[3]);      // B is of size m \times k
  int seed = std::atoi(argv[4]);   // seed for random inits
  int chunk = std::atoi(argv[5]);  // chunk size to group RHS
  arma::arma_rng::set_seed(seed);
  std::cout << "m::" << m << "::n::" << n << "::k::" << k
            << "::seed::" << seed << "::chunk::" << chunk
            << std::endl;
  mat A(m, n, arma::fill::randu);
  //std::cout << "A::" << std::endl << A << std::endl;
  mat B(m, k, arma::fill::randu);
  //std::cout << "B::" << std::endl << B << std::endl;
  // normal equations: (A^T A) X = A^T B
  mat AtA = A.t() * A;
  mat AtB = A.t() * B;
  // solve sequentially, all RHS columns at once
  mat Xseq = arma::solve(AtA, AtB, arma::solve_opts::likely_sympd);
  // number of RHS blocks of width `chunk` (the last block may be narrower)
  int num_chunks = AtB.n_cols / chunk;
  if (num_chunks * chunk < AtB.n_cols) num_chunks++;
  // solve the blocks one after another
  mat Xchunk(n, k, arma::fill::zeros);
  for (int nt = 0; nt < num_chunks; nt++) {
    int spanStart = nt * chunk;
    int spanEnd = (nt + 1) * chunk - 1;
    if (spanEnd > AtB.n_cols - 1) {
      spanEnd = AtB.n_cols - 1;
    }
    mat rhs = AtB.cols(spanStart, spanEnd);
    mat Y = arma::solve(AtA, rhs, arma::solve_opts::likely_sympd);
    Xchunk.cols(spanStart, spanEnd) = Y;
  }
  bool chkchunk = arma::approx_equal(Xseq, Xchunk, "absdiff", 0.0001);
  std::cout << "(Xseq == Xchunk) ? = " << chkchunk << std::endl;
  if (!chkchunk) {
    std::cout << "Xseq::" << std::endl << Xseq << std::endl;
    std::cout << "Xchunk::" << std::endl << Xchunk << std::endl;
  }
  // solve the blocks in parallel; each iteration writes a disjoint set of columns of Xpar
  mat Xpar(n, k, arma::fill::zeros);
  #pragma omp parallel for schedule(static,1)
  for (int nt = 0; nt < num_chunks; nt++) {
    int spanStart = nt * chunk;
    int spanEnd = (nt + 1) * chunk - 1;
    if (spanEnd > AtB.n_cols - 1) {
      spanEnd = AtB.n_cols - 1;
    }
    mat rhs = AtB.cols(spanStart, spanEnd);
    mat Y = arma::solve(AtA, rhs, arma::solve_opts::likely_sympd);
    Xpar.cols(spanStart, spanEnd) = Y;
  }
  bool chkpar = arma::approx_equal(Xseq, Xpar, "absdiff", 0.0001);
  std::cout << "(Xseq == Xpar) ? = " << chkpar << std::endl;
  if (!chkpar) {
    std::cout << "Xseq::" << std::endl << Xseq << std::endl;
    std::cout << "Xpar::" << std::endl << Xpar << std::endl;
  }
  return 0;
}
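Both loops in the listing rely on the same block-bound arithmetic: a ceiling division for num_chunks, then clamping spanEnd to the last column. One way to rule that arithmetic out as the cause of the bad columns is to pull it into a standalone function and check that the blocks partition the columns exactly once. The sketch below uses only the standard library so it runs without Armadillo; the helper names chunk_bounds and is_partition are hypothetical.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Same bound computation as the listing: block nt covers columns
// [nt*chunk, min((nt+1)*chunk - 1, ncols - 1)], and the number of blocks
// is ncols / chunk rounded up.
std::vector<std::pair<int, int>> chunk_bounds(int ncols, int chunk) {
  int num_chunks = ncols / chunk;
  if (num_chunks * chunk < ncols) num_chunks++;
  std::vector<std::pair<int, int>> spans;
  for (int nt = 0; nt < num_chunks; nt++) {
    int spanStart = nt * chunk;
    int spanEnd = (nt + 1) * chunk - 1;
    if (spanEnd > ncols - 1) spanEnd = ncols - 1;
    spans.emplace_back(spanStart, spanEnd);
  }
  return spans;
}

// True if the spans cover columns 0..ncols-1 in order, each exactly once.
bool is_partition(const std::vector<std::pair<int, int>>& spans, int ncols) {
  int next = 0;
  for (const auto& s : spans) {
    if (s.first != next || s.second < s.first) return false;
    next = s.second + 1;
  }
  return next == ncols;
}
```

With the arguments from the test script (k = 20, chunk = 3), this yields 7 blocks, the last one covering columns 18 and 19. If the bounds always partition the columns, and the serial chunked comparison prints 1, the indexing is probably not the problem, which shifts suspicion to the concurrent solver calls.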
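You have to be careful, because the library that provides BLAS to Armadillo may or may not be thread-safe. Search Stack Overflow for "armadillo thread".

You are right. OpenBLAS is not thread-safe unless you compile it with …

If you are interested, check this code, which tests whether the BLAS library linked into Armadillo has multithreading problems. Basically, it defines a function that performs some matrix multiplications and an SVD, and checks whether running this function in parallel gives any different results from running it serially. You may need to change the CMakeLists.txt file.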
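This also explains why putting omp critical around the write-back did not help. The columns written back are disjoint, so the write-back is not a race. If the backend is not thread-safe, the damage happens inside the library call itself. A stopgap, short of rebuilding the library, is to put the critical section around the call instead, which in the listing would mean the arma::solve line. The sketch below shows that pattern with a hypothetical stand-in, legacy_double_column, for a non-reentrant routine that keeps hidden shared state. Note that this serializes the expensive part of the loop and removes most of the parallel speedup, so a thread-safe build of the library is the better fix.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-reentrant routine: it stages its input in a static scratch
// buffer shared by every caller, the kind of hidden state that makes
// concurrent calls unsafe.
void legacy_double_column(const double* in, double* out, std::size_t n) {
  static std::vector<double> scratch;  // shared across all calls and threads
  scratch.assign(in, in + n);
  for (std::size_t i = 0; i < n; ++i) out[i] = 2.0 * scratch[i];
}

// Processes each column of a column-major n x k matrix (in.size() == n * k)
// in parallel. Each iteration writes a distinct column, so the loop itself is
// race-free; the named critical section guards the library call.
std::vector<double> process_columns(const std::vector<double>& in,
                                    std::size_t n, int k) {
  std::vector<double> out(in.size(), 0.0);
  #pragma omp parallel for schedule(static, 1)
  for (int j = 0; j < k; ++j) {
    const double* src = in.data() + static_cast<std::size_t>(j) * n;
    double* dst = out.data() + static_cast<std::size_t>(j) * n;
    #pragma omp critical(legacy_library)
    legacy_double_column(src, dst, n);
  }
  return out;
}
```

Compiled without -fopenmp, the pragmas are ignored and the loop simply runs serially, so the result is the same either way.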
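#!/bin/bash
# run the test 20 times, since the wrong columns do not show up on every run
for i in {1..20}
do
  echo $i
  unset OMP_NUM_THREADS;   # fall back to the OpenMP default thread count
  ./parals.out 10 5 20 17 3
done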
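Rather than rerunning the whole program from a shell loop, the serial-versus-parallel check that the answer describes can be done in-process. The linked test code is not reproduced here. What follows is only a minimal stdlib sketch of the same idea, with a hypothetical deterministic workload (a small matrix product) in place of the BLAS-backed multiplications and SVD. Because this workload is plain C++ with no shared state, the check should always pass. To probe a real backend, replace the body of workload with calls into the library under test.

```cpp
#include <cassert>
#include <vector>

// Hypothetical deterministic workload standing in for the BLAS-backed
// function under test: multiplies two small matrices generated from `seed`
// and returns the sum of the entries of the product.
double workload(int seed) {
  const int n = 8;
  std::vector<double> a(n * n), b(n * n), c(n * n, 0.0);
  for (int i = 0; i < n * n; ++i) {
    a[i] = ((seed + i) % 7) * 0.5;
    b[i] = ((seed * 3 + i) % 5) * 0.25;
  }
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < n; ++j)
      for (int p = 0; p < n; ++p)
        c[i * n + j] += a[i * n + p] * b[p * n + j];
  double sum = 0.0;
  for (double v : c) sum += v;
  return sum;
}

// Runs `workload` for seeds 0..runs-1 (runs must be positive) serially, then
// in an OpenMP parallel loop, and reports whether every parallel result is
// identical to its serial counterpart.
bool serial_matches_parallel(int runs) {
  std::vector<double> serial(runs), parallel(runs);
  for (int r = 0; r < runs; ++r) serial[r] = workload(r);
  #pragma omp parallel for
  for (int r = 0; r < runs; ++r) parallel[r] = workload(r);
  for (int r = 0; r < runs; ++r)
    if (serial[r] != parallel[r]) return false;
  return true;
}
```

Exact equality is a deliberate choice here: the same function on the same input should produce bit-identical results. A thread-safety bug in the backend, by contrast, tends to show up as occasional large differences rather than rounding noise.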