C++: Eigen performance for single-precision matrix operations shows no difference between AVX and SSE?


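In my project I use the Eigen 3.3 library to do calculations with 6x6 matrices. I decided to investigate whether AVX instructions really give any speedup over SSE. My CPU supports both instruction sets: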

model name      : Intel(R) Xeon(R) CPU E5-1607 v2 @ 3.00GHz
flags           : ...  sse sse2 ... ssse3 ... sse4_1 sse4_2 ... avx ...

So I compiled a small test with gcc 4.8 using two different sets of flags:

$ g++ test-eigen.cxx -o test-eigen -march=native -O2 -mavx
$ g++ test-eigen.cxx -o test-eigen -march=native -O2 -mno-avx

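I confirmed that the second case, with -mno-avx, did not produce any instructions using ymm registers. Nevertheless, both builds give me very similar timings of about 520 ms, as measured with perf.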

Here is the program test-eigen.cxx (it inverts the sum of two matrices, to stay close to the actual task I am working on):

// Disable assertions (including Eigen's internal checks) for the benchmark
#define NDEBUG

#include <iostream>
#include "Eigen/Dense"

using namespace Eigen;

int main()
{
   // Fixed-size 6x6 single-precision matrix
   typedef Matrix<float, 6, 6> MyMatrix_t;

   MyMatrix_t A = MyMatrix_t::Random();
   MyMatrix_t B = MyMatrix_t::Random();
   MyMatrix_t C = MyMatrix_t::Zero();
   MyMatrix_t D = MyMatrix_t::Zero();
   MyMatrix_t E = MyMatrix_t::Constant(0.001f);

   // Make A and B symmetric positive definite matrices (L * L^T with a positive diagonal)
   A.diagonal() = A.diagonal().cwiseAbs();
   A.noalias() = MyMatrix_t(A.triangularView<Lower>()) * MyMatrix_t(A.triangularView<Lower>()).transpose();

   B.diagonal() = B.diagonal().cwiseAbs();
   B.noalias() = MyMatrix_t(B.triangularView<Lower>()) * MyMatrix_t(B.triangularView<Lower>()).transpose();

   for (int i = 0; i < 1000000; i++)
   {
      // Calculate C = (A + B)^-1 via a Cholesky (LLT) solve against the identity
      C = (A + B).llt().solve(MyMatrix_t::Identity());

      // Accumulate the result so the loop cannot be optimized away
      D += C;

      // Somehow modify A and B so they remain symmetric
      A += B;
      B += E;
   }

   std::cout << D << "\n";

   return 0;
}

The matrices you are using are too small to take advantage of AVX: in single precision, AVX works on packets of 8 scalars at once. With 6x6 matrices, AVX can only be used for purely coefficient-wise operations such as A = B + C, because those can be viewed as operations on a 1D vector of size 36, which is larger than 8. In your case, such operations are negligible compared to the cost of the Cholesky decomposition and solve.

To see a difference, move to matrices of size 100x100 or larger.

I did not know that Eigen supported AVX yet. In fact, it now even supports AVX512. Good to know.