Unexpected performance results with C++ recursion


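I would like to know why these two pairs of seemingly obvious recursion examples give unexpected performance results:

the same recursive function is faster inside a struct (rec2 vs. rec1), and the same recursive template function is faster when it takes a dummy argument (rec4 vs. rec3).

Are C++ functions really faster with more arguments?! Here is the code I tried: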

#include <cstddef>        // std::size_t
#include <QDebug>
#include <QElapsedTimer>


constexpr std::size_t N = 28;
std::size_t counter = 0;


// non-template function that takes 1 argument
void rec1(std::size_t depth)
{
    ++counter;
    if ( depth < N )
    {
        rec1(depth + 1);
        rec1(depth + 1);
    }
}

// non-template member function that takes 2 arguments (including the implicit this)
struct A
{
    void rec2(std::size_t depth)
    {
        ++counter;
        if ( depth < N )
        {
            rec2(depth + 1);
            rec2(depth + 1);
        }
    }
};

// template function that takes 0 arguments
template <std::size_t D>
void rec3()
{
    ++counter;
    rec3<D - 1>();
    rec3<D - 1>();
}

template <>
void rec3<0>()
{
    ++counter;
}

// template function that takes 1 (dummy) argument
struct Foo
{
    int x;
};

template <std::size_t D>
void rec4(Foo x)
{
    ++counter;
    rec4<D - 1>(x);
    rec4<D - 1>(x);
}

template <>
void rec4<0>(Foo x)
{
    ++counter;
}


int main()
{
    QElapsedTimer t;
    t.start();
    rec1(0);
    qDebug() << "time1" << t.elapsed();
    qDebug() << "counter" << counter;
    counter = 0;
    A a;
    t.start();
    a.rec2(0);
    qDebug()<< "time2"  << t.elapsed();
    qDebug()<< "counter"  << counter;
    counter = 0;
    t.start();
    rec3<N>();
    qDebug()<< "time3"  << t.elapsed();
    qDebug()<< "counter"  << counter;
    counter = 0;
    t.start();
    rec4<N>(Foo());
    qDebug()<< "time4"  << t.elapsed();
    qDebug()<< "counter"  << counter;

    qDebug() << "fin";

    return 0;
}

My setup: Windows 8.1 / i7 3630QM / latest Qt toolchain / C++14

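I was finally able to look at this in Visual Studio 2015 Community. Examining the disassembly of the compiled code, rec1 and rec2 are genuinely recursive. Their generated code is very similar, and although rec2 has more instructions, it runs slightly faster. rec3 and rec4 each generate a series of functions, one for every distinct value of D in the template parameter. In that case the compiler eliminates many of the function calls and replaces them with a single addition of a larger value to the counter. (For example, one rec4 instantiation just adds 2047 to the counter and returns; 2047 = 2^11 - 1 is the number of calls in the subtree for D = 10.)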

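So the performance differences you are seeing come mostly from how the compiler optimizes each version, with slight differences in how the code flows through the CPU also playing a part.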
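The folding described for rec3 and rec4 is possible because the number of increments in each subtree is a compile-time constant: a binary recursion over levels 0..d makes 2^(d+1) - 1 calls. The sketch below checks that arithmetic only; it is not the compiler's actual transformation. The names `closed_form`, `count_calls` and `check_small_depths` are illustrative and do not appear in the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Closed form: a binary recursion over levels 0..d performs 2^(d+1) - 1 calls.
constexpr std::uint64_t closed_form(std::size_t d)
{
    return (std::uint64_t{1} << (d + 1)) - 1;
}

// Same call shape as rec1, but returning the number of calls
// instead of incrementing a global counter.
std::uint64_t count_calls(std::size_t depth, std::size_t limit)
{
    std::uint64_t calls = 1;  // this call itself
    if (depth < limit)
    {
        calls += count_calls(depth + 1, limit);
        calls += count_calls(depth + 1, limit);
    }
    return calls;
}

// The two constants that appear in the question and the answer.
static_assert(closed_form(10) == 2047, "subtree for D = 10");
static_assert(closed_form(28) == 536870911, "whole tree for N = 28");

// Runtime cross-check of the closed form against the actual recursion
// for small depths (kept small so it finishes instantly).
void check_small_depths()
{
    for (std::size_t d = 0; d <= 12; ++d)
    {
        assert(count_calls(0, d) == closed_form(d));
    }
}
```

The second `static_assert` matches the `counter` value of 536870911 reported in the results for N = 28.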

My results (times in seconds), compiled with /Ox /O2:

time1 1.03411
counter 536870911
time2 0.970455
counter 536870911
time3 0.000866
counter 536870911
time4 0.000804
counter 536870911
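For anyone who wants to reproduce timings like these without Qt, here is a minimal sketch using `std::chrono::steady_clock` from the standard library. The answer does not say which timer produced its numbers, so this is an assumption about one reasonable way to measure, and `time_seconds` is a hypothetical helper name.

```cpp
#include <chrono>

// Runs the callable once and returns the elapsed wall-clock time in seconds.
// steady_clock is monotonic, so the difference is never negative.
template <typename F>
double time_seconds(F f)
{
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

It could be used as `double t1 = time_seconds([] { rec1(0); });`. A single run like this is noisy, so repeating the measurement a few times and comparing the runs is advisable before drawing conclusions from small differences such as rec1 vs. rec2.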

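What optimization level are you using?

Member functions (thiscall) and free functions (cdecl) use different calling conventions, which could account for the timing difference between them.

@templatetypedef: I don't know what "optimization level" means, but I'm using release mode in Qt Creator with all the default settings.

@engf-010: Maybe... I also tried putting the first function in a namespace, but the result is the same. Can we also have cdecl functions in a namespace?

@cevik: Yes. A calling convention only specifies how arguments are passed, and a namespace is just a fancy naming mechanism, so the two are unrelated.

OK, thanks. In fact, I need to call a specific function for each depth. So if I add a call through an array of function pointers, I expect the optimization to matter much less.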
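The last comment mentions calling a specific function for each depth through an array of function pointers. A minimal sketch of that idea follows, assuming a tiny depth and two made-up handlers; `kLevels`, `LevelFn`, `add_one`, `add_ten`, `handlers` and `rec_dispatch` are all hypothetical names, and the real per-depth functions are not shown in the thread.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLevels = 4;         // hypothetical depth, small for demonstration

using LevelFn = void (*)(std::uint64_t&);  // hypothetical signature of a per-depth handler

void add_one(std::uint64_t& acc) { acc += 1; }
void add_ten(std::uint64_t& acc) { acc += 10; }

// One handler per depth 0..kLevels (illustrative choice of handlers).
const std::array<LevelFn, kLevels + 1> handlers = {
    {add_one, add_ten, add_one, add_ten, add_one}
};

// Same binary recursion as rec1, but the work done at each level
// is selected through the function pointer table.
void rec_dispatch(std::size_t depth, std::uint64_t& acc)
{
    handlers[depth](acc);
    if (depth < kLevels)
    {
        rec_dispatch(depth + 1, acc);
        rec_dispatch(depth + 1, acc);
    }
}
```

Starting from `acc = 0`, `rec_dispatch(0, acc)` visits 1, 2, 4, 8 and 16 nodes at depths 0 to 4, so `acc` ends at 1 + 20 + 4 + 80 + 16 = 121. Whether the compiler can still fold or inline such indirect calls depends on the compiler and on how the table is defined, so it would need to be measured rather than assumed.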
