Maintaining x*x in C++

c++ c++11 math optimization

I have the following while-loop:

uint32_t x = 0;
while(x*x < STOP_CONDITION) {
    if(CHECK_CONDITION) x++;
    // Do other stuff that modifies CHECK_CONDITION
}

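STOP_CONDITION is constant at run time, but not at compile time. Is there a more efficient way to maintain x*x, or do I really need to recompute it every time?

How about this: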
uint32_t x = 0;
double bound = sqrt(STOP_CONDITION);  // computed once, before the loop
while(x < bound) {
    if(CHECK_CONDITION) x++;
    // Do other stuff that modifies CHECK_CONDITION
}

This way you avoid recomputing x*x on every iteration; the square root is computed only once.
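A minimal sketch to check that hoisting the square root out of the loop does not change where the loop stops. It assumes CHECK_CONDITION is always true (in the question it changes inside the loop). The function names stopWithSquare, stopWithSqrtBound and checkLoopsAgree are hypothetical. It is only meant for STOP_CONDITION values small enough that x*x cannot overflow uint32_t.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Original loop, with CHECK_CONDITION assumed always true so x advances every
// pass. Recomputes x*x on each iteration and returns the x at which it stops.
std::uint32_t stopWithSquare(std::uint32_t stopCondition) {
    std::uint32_t x = 0;
    while (x * x < stopCondition) {
        x++;
    }
    return x;
}

// Rewritten loop: the square root is computed once, before the loop starts.
std::uint32_t stopWithSqrtBound(std::uint32_t stopCondition) {
    std::uint32_t x = 0;
    const double bound = std::sqrt(static_cast<double>(stopCondition));
    while (x < bound) {
        x++;
    }
    return x;
}

// Asserts that both loops stop at the same x for every STOP_CONDITION in
// [0, maxStop]. maxStop must be well below UINT32_MAX (no overflow of x*x).
void checkLoopsAgree(std::uint32_t maxStop) {
    for (std::uint32_t s = 0; s <= maxStop; ++s) {
        assert(stopWithSquare(s) == stopWithSqrtBound(s));
    }
}
```

For example, checkLoopsAgree(100000) walks every STOP_CONDITION up to 100000. For STOP_CONDITION = 1000000, both functions stop at x = 1000.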
Note: As the benchmarks below show, this code runs 1-2% slower than the alternative. Please read the disclaimer at the bottom.

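In addition to Tamas Ionut's answer: if you want to keep STOP_CONDITION as the actual stop condition and avoid computing the square root, you can update the square with the mathematical identity
(x + 1)² = x² + 2x + 1

whenever you change x: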
uint32_t x = 0;
uint32_t xSquare = 0;
while(xSquare < STOP_CONDITION) {
    if(CHECK_CONDITION) {
      xSquare += 2 * x + 1;
      x++;
    }
    // Do other stuff that modifies CHECK_CONDITION
}

Since 2*x+1 is just a bit shift and an increment, the compiler should be able to optimize this well.
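As a sketch of the identity in action, the function below runs the loop with the square maintained incrementally. After every step it asserts that xSquare still equals x * x. CHECK_CONDITION is again assumed always true, and stopWithIncrementalSquare is a hypothetical name. The expression 2 * x + 1 is spelled (x << 1) + 1 only to make the shift explicit; for unsigned values a compiler will typically emit the same code for either form. The sketch assumes STOP_CONDITION is at most 65535 * 65535, because beyond that xSquare wraps around.

```cpp
#include <cassert>
#include <cstdint>

// Runs the loop with xSquare kept equal to x * x through
// (x + 1)^2 = x^2 + 2x + 1, so no multiplication happens inside the loop.
// CHECK_CONDITION is assumed always true, so x advances on every pass.
std::uint32_t stopWithIncrementalSquare(std::uint32_t stopCondition) {
    std::uint32_t x = 0;
    std::uint32_t xSquare = 0;
    while (xSquare < stopCondition) {
        xSquare += (x << 1) + 1;   // 2 * x + 1, written as a shift and an add
        x++;
        assert(xSquare == x * x);  // invariant: xSquare always equals x squared
    }
    return x;
}
```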
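Disclaimer: Since you asked "how do I optimize this code", I answered with one way that might make it faster. Whether a doubling plus an increment is actually faster than a single integer multiplication should be tested in practice. Whether you should optimize the code at all is another question. I assume you have already benchmarked the loop and found it to be a bottleneck, or that you have a theoretical interest in the question. If you are writing production code that you want to optimize, measure performance first and then optimize where needed (which is probably not the x*x in this loop).

In your case, optimizing for readability is better than optimizing for performance, since we are talking about a very small performance gain. The compiler can do a lot of performance optimization for you, but readability is the programmer's responsibility.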
I ran a small benchmark of the Tamas Ionut and CompuChip answers. Here are the results:
Tamas Ionut: 19.7068
Code for this method:
uint32_t x = 0;
double bound = sqrt(STOP_CONDITION);
while(x < bound) {
    if(CHECK_CONDITION) x++;
    // Do other stuff that modifies CHECK_CONDITION
}

CompuChip: 20.2056
Code for this method:
uint32_t x = 0;
uint32_t xSquare = 0;
while(xSquare < STOP_CONDITION) {
    if(CHECK_CONDITION) {
      xSquare += 2 * x + 1;
      x++;
    }
    // Do other stuff that modifies CHECK_CONDITION
}

These results use STOP_CONDITION = 1000000, with the process repeated 1000000 times.

Environment:

Compiler: MSVC 2013
OS: Windows 8.1 x64
Processor: Core i7-4510U @ 2.00GHz
Release mode, Maximize Speed (/O2)
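The answer does not show its timing harness. A minimal harness in the same spirit is sketched below, using std::chrono::steady_clock. The names timeMethod, runBenchmark, sqrtBoundLoop and incrementalSquareLoop are hypothetical, and CHECK_CONDITION is assumed always true. The final x values are summed so the loops produce an observable result. An optimizer may still simplify or hoist such pure loops, though, so check the generated code before trusting the printed timings.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstdint>
#include <iostream>

// Square-root bound computed once (Tamas Ionut's answer).
std::uint32_t sqrtBoundLoop(std::uint32_t stop) {
    std::uint32_t x = 0;
    const double bound = std::sqrt(static_cast<double>(stop));
    while (x < bound) x++;
    return x;
}

// Square maintained incrementally (CompuChip's answer).
std::uint32_t incrementalSquareLoop(std::uint32_t stop) {
    std::uint32_t x = 0, xSquare = 0;
    while (xSquare < stop) {
        xSquare += 2 * x + 1;
        x++;
    }
    return x;
}

// Runs loop(stop) `repeats` times, prints the elapsed seconds, and returns the
// sum of the final x values so the work has an observable result.
template <typename Loop>
std::uint64_t timeMethod(const char* name, Loop loop,
                         std::uint32_t stop, std::uint32_t repeats) {
    const auto start = std::chrono::steady_clock::now();
    std::uint64_t checksum = 0;
    for (std::uint32_t i = 0; i < repeats; ++i) {
        checksum += loop(stop);
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    std::cout << name << ": " << elapsed.count() << " s\n";
    return checksum;
}

// Times both methods with the same parameters and checks that they agree.
void runBenchmark(std::uint32_t stop, std::uint32_t repeats) {
    const std::uint64_t a = timeMethod("sqrt bound", sqrtBoundLoop, stop, repeats);
    const std::uint64_t b = timeMethod("incremental square", incrementalSquareLoop, stop, repeats);
    assert(a == b);
}
```

Calling runBenchmark(1000000, 1000000) would mirror the parameters above.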
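I believe Tamas Ionut's solution is better than CompuChip's, because the loop body contains only x++. However, comparing a uint32_t with a double kills the deal. It would be more efficient to use a uint32_t for the bound instead of a double. This approach also has fewer problems with numeric overflow, because x must stay below 2^16 = 65536 if we want x^2 to come out correctly in a uint32_t.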
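One way to follow the suggestion of an integer bound is sketched below. Before the loop, it computes the smallest b with b * b >= STOP_CONDITION, so that x < b holds exactly when x * x < STOP_CONDITION. The binary-search helper ceilSqrt and the other names are hypothetical, and CHECK_CONDITION is assumed always true. The search uses 64-bit products, so it stays correct for every uint32_t STOP_CONDITION, including values above 65535 * 65535 where x * x itself would overflow.

```cpp
#include <cassert>
#include <cstdint>

// Smallest b with b * b >= s, found by binary search. For any x >= 0,
// x < b holds exactly when x * x < s (with x * x computed without overflow).
// 64-bit products keep the search itself from overflowing.
std::uint32_t ceilSqrt(std::uint32_t s) {
    std::uint64_t lo = 0;
    std::uint64_t hi = 65536;  // 65536 * 65536 = 2^32, larger than any uint32_t
    while (lo < hi) {
        const std::uint64_t mid = lo + (hi - lo) / 2;
        if (mid * mid >= s) {
            hi = mid;      // mid is large enough; the answer is mid or smaller
        } else {
            lo = mid + 1;  // mid is too small
        }
    }
    return static_cast<std::uint32_t>(lo);
}

// Loop with a uint32_t bound computed once; CHECK_CONDITION assumed always true.
std::uint32_t stopWithIntegerBound(std::uint32_t stopCondition) {
    const std::uint32_t bound = ceilSqrt(stopCondition);
    std::uint32_t x = 0;
    while (x < bound) {
        x++;  // only an increment and an integer compare per iteration
    }
    return x;
}

// Asserts the defining property of ceilSqrt for one value of s.
void checkBound(std::uint32_t s) {
    const std::uint64_t b = ceilSqrt(s);
    assert(b * b >= s);
    assert(b == 0 || (b - 1) * (b - 1) < s);
}
```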
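If the loop also does a lot of other work, the two approaches should give very similar results; however, Tamas Ionut's approach is simpler and easier to read.
Below is my code and the corresponding assembly generated by clang 3.8.0 with the -O3 flag. The assembly makes it very clear that the first approach is more efficient.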
#include <cstddef>

using T = std::size_t;

void test1(const T stopCondition, bool checkCondition) {
    T x = 0;
    while (x < stopCondition) {
        if (checkCondition) {
            x++;
        }
        // Do something heavy here
    }
}

void test2(const T stopCondition, bool checkCondition) {
    T x = 0;
    T xSquare = 0;
    const T threshold = stopCondition * stopCondition;
    while (xSquare < threshold) {
        if (checkCondition) {
            xSquare += 2 * x + 1;
            x++;
        }
        // Do something heavy here
    }
}

(gdb) disassemble test1
Dump of assembler code for function _Z5test1mb:
   0x0000000000400be0 <+0>: movzbl %sil,%eax
   0x0000000000400be4 <+4>: mov    %rax,%rcx
   0x0000000000400be7 <+7>: neg    %rcx
   0x0000000000400bea <+10>:    nopw   0x0(%rax,%rax,1)
   0x0000000000400bf0 <+16>:    add    %rax,%rcx
   0x0000000000400bf3 <+19>:    cmp    %rdi,%rcx
   0x0000000000400bf6 <+22>:    jb     0x400bf0 <_Z5test1mb+16>
   0x0000000000400bf8 <+24>:    retq   
End of assembler dump.
(gdb) disassemble test2
Dump of assembler code for function _Z5test2mb:
   0x0000000000400c00 <+0>: imul   %rdi,%rdi
   0x0000000000400c04 <+4>: test   %sil,%sil
   0x0000000000400c07 <+7>: je     0x400c2e <_Z5test2mb+46>
   0x0000000000400c09 <+9>: xor    %eax,%eax
   0x0000000000400c0b <+11>:    mov    $0x1,%ecx
   0x0000000000400c10 <+16>:    test   %rdi,%rdi
   0x0000000000400c13 <+19>:    je     0x400c42 <_Z5test2mb+66>
   0x0000000000400c15 <+21>:    data32 nopw %cs:0x0(%rax,%rax,1)
   0x0000000000400c20 <+32>:    add    %rcx,%rax
   0x0000000000400c23 <+35>:    add    $0x2,%rcx
   0x0000000000400c27 <+39>:    cmp    %rdi,%rax
   0x0000000000400c2a <+42>:    jb     0x400c20 <_Z5test2mb+32>
   0x0000000000400c2c <+44>:    jmp    0x400c42 <_Z5test2mb+66>
   0x0000000000400c2e <+46>:    test   %rdi,%rdi
   0x0000000000400c31 <+49>:    je     0x400c42 <_Z5test2mb+66>
   0x0000000000400c33 <+51>:    data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
   0x0000000000400c40 <+64>:    jmp    0x400c40 <_Z5test2mb+64>
   0x0000000000400c42 <+66>:    retq   
End of assembler dump.
