xtensor and xsimd: improving the performance of reductions
I am trying to get the same performance as NumPy on a reduction operation (for example, the sum of all elements). I enabled parallel computation, but it had no effect. Here is the benchmark code:
#include <cmath>
#include <ctime>
#include <iostream>
#include <utility>
#include "xtensor/xtensor.hpp"
#include "xtensor/xreducer.hpp"
#include "xtensor/xrandom.hpp"
using namespace std;

// Returns (average seconds per xt::sum call, total of all sums).
pair<double, double> timeit(int size, int n=30){
    double total_clocks = 0;
    double total_sum = 0;
    for (int i=0;i<n;i++){
        xt::xtensor<double, 1> a = xt::random::rand({size}, 0., 1.);
        // clock() returns clock_t (storing it in an int can truncate) and
        // measures process CPU time summed over all threads, not wall time.
        clock_t start = clock();
        double s = xt::sum(a, xt::evaluation_strategy::immediate)();
        clock_t end = clock();
        total_sum += s; total_clocks += end-start;
    }
    return pair<double, double>(total_clocks/CLOCKS_PER_SEC/n, total_sum);
}
int main(int argc, char *argv[])
{
    for (int i=5;i<8;i++){
        int size = static_cast<int>(pow(10, i));
        pair<double, double> ret = timeit(size);
        cout<<"size: "<<size<< " \t " <<ret.first<<" sec\t"<<ret.second<<endl;
    }
    return 0;
}
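A note on the timer: `std::clock()` reports the CPU time consumed by the whole process, summed over all of its threads. A reduction spread across several threads would therefore not look faster with it, even if the elapsed time dropped. That may matter for the parallel build mentioned above.

Below is a minimal sketch of wall-clock timing with `std::chrono::steady_clock`. It assumes that a plain `std::accumulate` over a `std::vector<double>` can stand in for `xt::sum`. The helper name `time_sum_wall` is hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper: returns the average wall-clock seconds of one sum over
// `repeats` runs and stores the total of all sums in `checksum`.
double time_sum_wall(std::size_t size, int repeats, double& checksum) {
    assert(repeats > 0);
    std::mt19937_64 rng(42);  // fixed seed so runs are reproducible
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    std::vector<double> a(size);
    double total_seconds = 0.0;
    checksum = 0.0;
    for (int r = 0; r < repeats; ++r) {
        for (double& x : a) x = dist(rng);  // refill outside the timed region
        // steady_clock measures elapsed time, so a multithreaded speedup would show up here.
        const auto start = std::chrono::steady_clock::now();
        const double s = std::accumulate(a.begin(), a.end(), 0.0);  // the timed reduction
        const auto end = std::chrono::steady_clock::now();
        total_seconds += std::chrono::duration<double>(end - start).count();
        checksum += s;  // use the result so the sum is not optimized away
    }
    return total_seconds / repeats;
}
```

For example, `double cs; double t = time_sum_wall(1000000, 30, cs);` gives the average seconds per sum. To time the xtensor reduction the same way, replace the `std::accumulate` line with `xt::sum(a, xt::evaluation_strategy::immediate)()` on an `xt::xtensor<double, 1>` filled the same way.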
By the way, here is the same operation with NumPy:
$ python bench.py
size: 100000 0.000030 sec
size: 1000000 0.000430 sec
size: 10000000 0.005144 sec
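As a rough sanity check, these timings can be converted into effective read bandwidth. 10^7 doubles is 80 MB, and reading that in 0.005144 s is about 15.6 GB/s. That is plausibly close to what a single core can stream from main memory. At the largest size the sum may therefore be limited by memory rather than by arithmetic, which could be part of why parallelism or SIMD changed little there.

The smallest array (800 KB) likely fits in cache, which is consistent with its higher figure of about 26.7 GB/s. Below is a minimal helper for the conversion. The name `effective_gbps` is hypothetical, and it assumes that every element is read exactly once.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: effective read bandwidth in GB/s (1 GB = 1e9 bytes) of a
// reduction over n doubles that took `seconds`, assuming each element is read once.
double effective_gbps(std::size_t n, double seconds) {
    assert(seconds > 0.0);
    const double bytes = static_cast<double>(n) * sizeof(double);
    return bytes / seconds / 1e9;
}
```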
NumPy is about 4 times faster.
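One plausible reason NumPy's `sum` is fast is the order in which it adds the elements. NumPy's documentation says that floating-point `sum` uses partial pairwise summation, and the implementation adds small blocks with several independent partial sums. Independent partial sums break the single serial dependency chain of a naive loop, so a compiler can vectorize them without `-ffast-math`.

The sketch below illustrates the idea. The name `pairwise_sum`, the block size of 128 and the 8 accumulators are illustrative assumptions, not a copy of NumPy's source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of partial pairwise summation: small ranges use 8 independent
// accumulators, larger ranges are split in half recursively.
double pairwise_sum(const double* x, std::size_t n) {
    assert(n == 0 || x != nullptr);
    const std::size_t kBlock = 128;  // illustrative block size
    if (n < 8) {
        double s = 0.0;
        for (std::size_t i = 0; i < n; ++i) s += x[i];
        return s;
    }
    if (n <= kBlock) {
        // 8 independent partial sums: no single serial dependency chain.
        double r[8];
        for (int k = 0; k < 8; ++k) r[k] = x[k];
        std::size_t i = 8;
        for (; i + 8 <= n; i += 8)
            for (int k = 0; k < 8; ++k) r[k] += x[i + k];
        double s = ((r[0] + r[1]) + (r[2] + r[3])) + ((r[4] + r[5]) + (r[6] + r[7]));
        for (; i < n; ++i) s += x[i];  // leftover elements
        return s;
    }
    std::size_t half = n / 2;
    half -= half % 8;  // keep the split point a multiple of 8
    return pairwise_sum(x, half) + pairwise_sum(x + half, n - half);
}

double pairwise_sum(const std::vector<double>& v) {
    return pairwise_sum(v.data(), v.size());
}
```

Besides helping vectorization, pairwise summation also tends to accumulate less rounding error than a single left-to-right loop.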
Setup
- Ubuntu 18.04
- Core i7 processor
- Latest versions of the packages
The -mavx2 and -ffast-math flags need to be enabled. With them, the xtensor timings come within about 20% of NumPy's:
$ g++ -mavx2 -ffast-math -DXTENSOR_USE_XSIMD -O3 -I/home/--user--/install_path/include ./bench.cpp -o a && ./a
size: 100000 3.489e-05 sec 4.99932e+06
size: 1000000 0.00050792 sec 4.99989e+07
size: 10000000 0.00544542 sec 4.99997e+08
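Here is why the two flags matter to the compiler. Floating-point addition is not associative, so a plain `s += x` loop has one fixed order that GCC must preserve. GCC will not split such a loop into SIMD lanes unless reassociation is allowed, and `-ffast-math` allows it by enabling `-fassociative-math`. `-mavx2` makes 256-bit registers (four doubles each) available. It also defines `__AVX2__`, which xsimd typically uses to choose its instruction set.

The sketch below writes the two orders out by hand and shows that they can give different answers. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Strict left-to-right summation: the order the C++ source specifies, which
// the compiler must keep unless reassociation (-fassociative-math) is allowed.
double sum_sequential(const std::vector<double>& v) {
    double s = 0.0;
    for (double x : v) s += x;
    return s;
}

// Hand-written version of the reordering a vectorizer performs: four partial
// sums, one per double in a 256-bit AVX2 register, combined at the end.
double sum_4lanes(const std::vector<double>& v) {
    double r[4] = {0.0, 0.0, 0.0, 0.0};
    std::size_t i = 0;
    for (; i + 4 <= v.size(); i += 4)
        for (int k = 0; k < 4; ++k) r[k] += v[i + k];
    double s = (r[0] + r[1]) + (r[2] + r[3]);
    for (; i < v.size(); ++i) s += v[i];  // tail elements
    return s;
}
```

With the input {1e16, 1.0, -1e16, 1.0}, the sequential order returns 1.0 and the four-lane order returns 0.0. The results differ because, in the four-lane order, each 1.0 is lost to rounding when it is added to ±1e16 first.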
Thanks!