C++ std::bind vs. lambda performance
Tags: c++, caching, c++11, lambda, bind
I wanted to time the execution of a few functions, so I wrote myself a helper:
#include <chrono>
#include <iostream>
#include <string>
using namespace std;

template<int N = 1, class Fun, class... Args>
void timeExec(string name, Fun fun, Args... args) {
    auto start = chrono::steady_clock::now();
    for(int i = 0; i < N; ++i) {
        fun(args...);   // call the function N times
    }
    auto end = chrono::steady_clock::now();
    auto diff = end - start;
    // print the total elapsed time in milliseconds
    cout << name << ": " << chrono::duration<double, milli>(diff).count() << " ms." << endl;
}
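I don't know the internals, but I assumed a lambda could not be that much better than bind. The only plausible explanation I can think of is that the compiler optimizes away the subsequent function evaluations in the lambda's loop.
How would you explain it?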
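"I assumed a lambda could not be that much better than bind."
That is quite a preconception.
Lambdas are tied into the compiler internals, so extra optimization opportunities may be found. Moreover, they are designed to avoid inefficiency.
However, there are probably no compiler optimization tricks happening here. The likely culprit is the argument to bind, bind(&decltype(result)::eval, &result). You are passing a pointer to member function (PTMF) and an object. Unlike the lambda's type, the PTMF's type does not capture which function actually gets called; it only contains the function signature (parameter and return types). The slow loop uses an indirect branch for the function call, because the compiler failed to resolve the function pointer through constant propagation.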
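If you rename the member eval() to operator()() and get rid of bind, the explicit object will essentially behave like the lambda, and the performance difference should disappear.

I have tested it. My results show that the lambda is actually faster than bind.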
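This is the code (please don't look at the style):
I compiled it under Visual Studio Enterprise 2015, in release mode with full optimization (/Ox) and in debug mode with optimization disabled. The results confirm that the lambda is faster than bind on my laptop (Dell Inspiron 7537, Intel Core i7-4510U 2.00GHz, 8GB RAM).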
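Can anyone verify this on their own machine?
The lambda body can be inlined. The bind expression might not be inlinable. Check the machine code to be sure. Also try Fun&& fun.
One more thing: I came across this while learning about expression templates. I have a computation graph that can be evaluated either at run time or at compile time. The behavior described occurs with compile-time evaluation, not at run time.
You really need to inspect the assembly to see what is going on.
Isn't this simply timeExec<TIMES> vs. timeExec<1>? That is definitely not an apples-to-apples comparison. I'm curious what happens if you do timeExec<TIMES>("Lambda evaluation", [&]{ result.eval(); }).
On an Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz with 32 GB RAM, clang++ 4.0 and libstdc++, the lambda takes 1/3 of the time bind takes. Compiled with -O3, bind and lambda are nearly equal, but bind is still 5% slower. Compiling without loop unrolling confirms the result.
Does this apply to all bind variants, such as std::mem_fn? Sometimes using std::mem_fn in std::all_of or std::any_of is much shorter.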
The timing calls from the question, followed by their output:
const int TIMES = 10000;
timeExec<TIMES>("Bind evaluation", bind(&decltype(result)::eval, &result));
timeExec<1>("Lambda evaluation", [&]() {
    for(int i = 0; i < TIMES; ++i) {
        result.eval();
    }
});
Bind evaluation: 0.355158 ms.
Lambda evaluation: 0.014414 ms.
// Benchmark referred to above as "the code": bind vs. lambda, both stored in std::function.
#include <iostream>
#include <functional>
#include <chrono>
#include <cstdint>   // uint8_t, uint16_t, uint32_t
#include <cstdlib>   // system
using namespace std;
using namespace chrono;
using namespace placeholders;

typedef void SumDataBlockEventHandler(uint8_t data[], uint16_t len);

class SpeedTest {
    uint32_t sum = 0;
    uint8_t i = 0;   // loop counter kept as a member; being uint8_t, it requires len < 256
    void SumDataBlock(uint8_t data[], uint16_t len) {
        for (i = 0; i < len; i++) {
            sum += data[i];
        }
    }
public:
    // Wraps the member function with std::bind.
    function<SumDataBlockEventHandler> Bind() {
        return bind(&SpeedTest::SumDataBlock, this, _1, _2);
    }
    // Wraps the member function with a generic lambda (C++14).
    function<SumDataBlockEventHandler> Lambda() {
        return [this](auto data, auto len)
        {
            SumDataBlock(data, len);
        };
    }
};

int main()
{
    SpeedTest test;
    function<SumDataBlockEventHandler> testF;
    uint8_t data[] = { 0,1,2,3,4,5,6,7 };
#if _DEBUG
    const uint32_t testFcallCount = 1000000;
#else
    const uint32_t testFcallCount = 100000000;
#endif
    uint32_t callsCount, whileCount = 0;
    auto begin = high_resolution_clock::now();
    auto end = begin;
    // Repeat the measurement 10 times, alternating bind and lambda.
    while (whileCount++ < 10) {
        testF = test.Bind();
        begin = high_resolution_clock::now();
        callsCount = 0;
        while (callsCount++ < testFcallCount)
            testF(data, 8);
        end = high_resolution_clock::now();
        cout << testFcallCount << " calls of binded function: " << duration_cast<nanoseconds>(end - begin).count() << "ns" << endl;

        testF = test.Lambda();
        begin = high_resolution_clock::now();
        callsCount = 0;
        while (callsCount++ < testFcallCount)
            testF(data, 8);
        end = high_resolution_clock::now();
        cout << testFcallCount << " calls of lambda function: " << duration_cast<nanoseconds>(end - begin).count() << "ns" << endl << endl;
    }
    system("pause");   // Windows-only: keeps the console window open
}
100000000 calls of binded function: 1846298524ns
100000000 calls of lambda function: 1048086461ns
100000000 calls of binded function: 1259759880ns
100000000 calls of lambda function: 1032256243ns
100000000 calls of binded function: 1264817832ns
100000000 calls of lambda function: 1039052353ns
100000000 calls of binded function: 1263404007ns
100000000 calls of lambda function: 1031216018ns
100000000 calls of binded function: 1275305794ns
100000000 calls of lambda function: 1041313446ns
100000000 calls of binded function: 1256565304ns
100000000 calls of lambda function: 1031961675ns
100000000 calls of binded function: 1248132135ns
100000000 calls of lambda function: 1033890224ns
100000000 calls of binded function: 1252277130ns
100000000 calls of lambda function: 1042336736ns
100000000 calls of binded function: 1250320869ns
100000000 calls of lambda function: 1046529458ns