C++ 用GCC实现自动矢量化

C++ 用GCC实现自动矢量化,c++,gcc,auto-vectorization,C++,Gcc,Auto Vectorization,我想在我的代码中矢量化矩阵向量积。我尝试在GCC中使用自动矢量化,但它根本不起作用,我也不知道如何使它起作用。现在我尝试一个非常简单的示例代码: #define N 200000 double a[N] __attribute__(( aligned(16) )); double b[N] __attribute__(( aligned(16) )) ; double c[N] __attribute__(( aligned(16) )) ; void f() { int i;

我想在我的代码中矢量化矩阵向量积。我尝试在GCC中使用自动矢量化,但它根本不起作用,我也不知道如何使它起作用。现在我尝试一个非常简单的示例代码:

#define N 200000
double a[N] __attribute__(( aligned(16) ));
double b[N] __attribute__(( aligned(16) )) ;
double c[N] __attribute__(( aligned(16) )) ;

void f()
{ 
    int i;

    for( i = 0; i < N; i++ )
    {
        a[i] = b[i] + c[i];
    }
}

int main( int argc,char** argv )
{
    f();

    return( 0 );
}
我得到了以下结果我真的不明白这些东西是什么意思,但底线显然是矢量化不起作用:

optimization2.cpp:10:2: note: misalign = 0 bytes of ref b[i_11]
optimization2.cpp:10:2: note: misalign = 0 bytes of ref c[i_11]
optimization2.cpp:10:2: note: misalign = 0 bytes of ref a[i_11]
optimization2.cpp:10:2: note: virtual phi. skip.
optimization2.cpp:10:2: note: num. args = 4 (not unary/binary/ternary op).
optimization2.cpp:10:2: note: not ssa-name.
optimization2.cpp:10:2: note: use not simple.
optimization2.cpp:10:2: note: num. args = 4 (not unary/binary/ternary op).
optimization2.cpp:10:2: note: not ssa-name.
optimization2.cpp:10:2: note: use not simple.
optimization2.cpp:6:6: note: not vectorized: not enough data-refs in basic block.
optimization2.cpp:12:13: note: not vectorized: no vectype for stmt: vect__4.6_1 = MEM[(int *)vectp_b.4_9];
 scalar_type: vector(4) int
optimization2.cpp:12:13: note: not vectorized: not enough data-refs in basic block.
optimization2.cpp:6:6: note: not vectorized: not enough data-refs in basic block.
optimization2.cpp:14:1: note: not vectorized: not enough data-refs in basic block.
optimization2.cpp:10:2: note: misalign = 0 bytes of ref b[i_12]
optimization2.cpp:10:2: note: misalign = 0 bytes of ref c[i_12]
optimization2.cpp:10:2: note: misalign = 0 bytes of ref a[i_12]
optimization2.cpp:10:2: note: virtual phi. skip.
optimization2.cpp:10:2: note: num. args = 4 (not unary/binary/ternary op).
optimization2.cpp:10:2: note: not ssa-name.
optimization2.cpp:10:2: note: use not simple.
optimization2.cpp:10:2: note: num. args = 4 (not unary/binary/ternary op).
optimization2.cpp:10:2: note: not ssa-name.
optimization2.cpp:10:2: note: use not simple.
optimization2.cpp:16:5: note: not vectorized: not enough data-refs in basic block.
optimization2.cpp:12:13: note: not vectorized: no vectype for stmt: vect__4.28_2 = MEM[(int *)vectp_b.26_9];
 scalar_type: vector(4) int
optimization2.cpp:12:13: note: not vectorized: not enough data-refs in basic block.
optimization2.cpp:16:5: note: not vectorized: not enough data-refs in basic block.
optimization2.cpp:16:5: note: not vectorized: not enough data-refs in basic block.

我尝试过很多其他的东西和代码,但从来没有得到任何矢量化。诀窍是什么?

当我使用gcc 4.9.2和相同的命令行编译它时,虽然-march=native可能意味着不同的东西,添加-save temp,但我从优化器和包含vmodapd和vaddpd的程序集获得相同的输出,即向量化。我认为优化器注释是关于代码中无法矢量化的其他部分。您确定生成的asm中添加的内容没有矢量化吗?@Wintermute:从其他示例中,我的印象是GCC显式声明循环是否已矢量化详细级别不会改变这一点。代码由我发布的内容组成,没有其他循环会导致矢量化失败。我还没有看过汇编代码。但是,在假设-O2的情况下,无论是否激活矢量化,似乎都不会有速度增益。没有/几乎没有性能增益是因为gcc 4.9.2生成SSE指令,尽管即使使用-O0也不会生成SSE2。SSE2指令仅使用-O3生成,但在这种情况下它们比SSE1的优势是有限的。为什么优化器输出没有提到任何这一点是任何人的猜测,但我正在查看生成的asm,在那里它是清晰可见的。
optimization2.cpp:10:2: note: misalign = 0 bytes of ref b[i_11]
optimization2.cpp:10:2: note: misalign = 0 bytes of ref c[i_11]
optimization2.cpp:10:2: note: misalign = 0 bytes of ref a[i_11]
optimization2.cpp:10:2: note: virtual phi. skip.
optimization2.cpp:10:2: note: num. args = 4 (not unary/binary/ternary op).
optimization2.cpp:10:2: note: not ssa-name.
optimization2.cpp:10:2: note: use not simple.
optimization2.cpp:10:2: note: num. args = 4 (not unary/binary/ternary op).
optimization2.cpp:10:2: note: not ssa-name.
optimization2.cpp:10:2: note: use not simple.
optimization2.cpp:6:6: note: not vectorized: not enough data-refs in basic block.
optimization2.cpp:12:13: note: not vectorized: no vectype for stmt: vect__4.6_1 = MEM[(int *)vectp_b.4_9];
 scalar_type: vector(4) int
optimization2.cpp:12:13: note: not vectorized: not enough data-refs in basic block.
optimization2.cpp:6:6: note: not vectorized: not enough data-refs in basic block.
optimization2.cpp:14:1: note: not vectorized: not enough data-refs in basic block.
optimization2.cpp:10:2: note: misalign = 0 bytes of ref b[i_12]
optimization2.cpp:10:2: note: misalign = 0 bytes of ref c[i_12]
optimization2.cpp:10:2: note: misalign = 0 bytes of ref a[i_12]
optimization2.cpp:10:2: note: virtual phi. skip.
optimization2.cpp:10:2: note: num. args = 4 (not unary/binary/ternary op).
optimization2.cpp:10:2: note: not ssa-name.
optimization2.cpp:10:2: note: use not simple.
optimization2.cpp:10:2: note: num. args = 4 (not unary/binary/ternary op).
optimization2.cpp:10:2: note: not ssa-name.
optimization2.cpp:10:2: note: use not simple.
optimization2.cpp:16:5: note: not vectorized: not enough data-refs in basic block.
optimization2.cpp:12:13: note: not vectorized: no vectype for stmt: vect__4.28_2 = MEM[(int *)vectp_b.26_9];
 scalar_type: vector(4) int
optimization2.cpp:12:13: note: not vectorized: not enough data-refs in basic block.
optimization2.cpp:16:5: note: not vectorized: not enough data-refs in basic block.
optimization2.cpp:16:5: note: not vectorized: not enough data-refs in basic block.