GCC -msse2 does not generate SIMD code

c++ gcc x86

I am trying to figure out why g++ does not generate SIMD code.

GCC/OS/CPU information:

$ gcc -v
gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1)

$ cat /proc/cpuinfo
...
model name  : Intel(R) Core(TM)2 Duo CPU     P8600  @ 2.40GHz
...

Here is my C++ code:

#include <iostream>
#include <cstdlib>

//function that fills an array with random numbers
template<class T>
void fillArray(T *array, int n){
    srand(1);
    for (int i = 0; i < n; i++) {
        array[i] = (float) (rand() % 10);
    }
}
// function that computes the dotprod of two vectors (loop unrolled)
float dotCPP(float *src1, float *src2, int n){
    float dest = 0;
    for (int i = 0; i < n; i+=2) {
        dest += (src1[i] * src2[i]) + (src1[i+1] * src2[i+1]);                
    }
    return dest;
}

int main(int argc, char *argv[])
{

    const int n = 1200000;           
    float *a = new float[n];   //allocate data on the heap
    float something_else;      //store result
    fillArray<float>(a,n);     //function that fills the array with random numbers
    something_else = dotCPP(a, a, n);  //call function and store return value

    return 0;
}

I then inspected the generated code with gdb:

$gdb dot
... 
(gdb) b dotCPP
(gdb) r
...
(gdb) disass
Dump of assembler code for function dotCPP(float*, float*, int):
=> 0x08048950 <+0>:     push   %ebx
   0x08048951 <+1>:     mov    0x10(%esp),%ebx
   0x08048955 <+5>:     mov    0x8(%esp),%edx
   0x08048959 <+9>:     mov    0xc(%esp),%ecx
   0x0804895d <+13>:    test   %ebx,%ebx
   0x0804895f <+15>:    jle    0x8048983 <dotCPP(float*, float*, int)+51>
   0x08048961 <+17>:    xor    %eax,%eax
   0x08048963 <+19>:    fldz   
   0x08048965 <+21>:    lea    0x0(%esi),%esi
   0x08048968 <+24>:    flds   (%edx,%eax,4)
   0x0804896b <+27>:    fmuls  (%ecx,%eax,4)
   0x0804896e <+30>:    flds   0x4(%edx,%eax,4)
   0x08048972 <+34>:    fmuls  0x4(%ecx,%eax,4)
   0x08048976 <+38>:    add    $0x2,%eax
   0x08048979 <+41>:    cmp    %eax,%ebx
   0x0804897b <+43>:    faddp  %st,%st(1)
   0x0804897d <+45>:    faddp  %st,%st(1)
   0x0804897f <+47>:    jg     0x8048968 <dotCPP(float*, float*, int)+24>
   0x08048981 <+49>:    pop    %ebx
   0x08048982 <+50>:    ret    
   0x08048983 <+51>:    fldz   
   0x08048985 <+53>:    pop    %ebx
   0x08048986 <+54>:    ret    
End of assembler dump.

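Am I missing something, or should gcc be using the xmm registers here?

I would appreciate any suggestion that helps me understand why gcc does not generate code that uses the xmm registers.

Let me know if you need any further information.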

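-march=core2 means gcc can assume (along with the 64-bit ISA) that everything up to SSSE3 (e.g. MMX, SSE, SSE2, SSE3) is available.

-mfpmath=sse can then force the use of SSE for floating-point arithmetic (the default in 64-bit mode) rather than the 387 (the default in 32-bit -m32 mode).

See the "Intel 386 and AMD x86-64 Options" section of the man page.

Unfortunately, you are still limited by 32-bit mode and the 32-bit ABI. For example, only the registers XMM0 .. XMM7 are available; XMM8 .. XMM15 are available only in 64-bit mode.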

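Try getting rid of the manual loop unrolling in dotCPP. Keeping the scalar code as simple as possible may help the compiler spot the potential for SIMD optimisation. Compilers are still not very good at SIMD optimisation, though, so if this is really performance-critical you may have to resort to SSE intrinsics.

I had the same problem. If I remember correctly, I had to use -march=native (or a similar -march option) together with -msse2 to solve it.

FWIW, I just tried the above code with clang and it does generate SSE instructions for dotCPP, even when compiled with -m32 -msse2, so you might want to consider using clang rather than gcc.

@Paul: I did try it without the unrolled loop. Same result. Hm, I haven't worked with clang for a while. But thanks, I'll give it a try.

Try -march=core2 and add -mfpmath=sse.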
