C++: element-wise vector multiplication
For multiplying two n-element integer vectors, which method is faster, in terms of the time taken by the code, than the following one:
Edit: I received several good ideas. I will have to check each one to see which is best. Every answer certainly taught me something new.
Quick answer: this is probably already the fastest solution.
Detailed answer: "fast" has two definitions. If you are looking for a solution that is quick and simple to write, the code above should suit you well. If you are looking for code that runs fast, I do not know whether using a while loop, or reimplementing the for loop with a library/package, would help. Good luck!
#include <valarray>
int main() {
    std::valarray<int> a {1, 2, 3, 4, 5};
    std::valarray<int> b {3, 4, 5, 6, 7};
    // element-wise product, then the sum of all products
    auto c = (a * b).sum();
}
valarray can use SIMD and other vector instructions, although typical implementations often seem to neglect this.
Just use the C++ standard library:
#include <iostream>
#include <numeric>   // std::inner_product
#include <vector>
int main() {
    std::vector<double> x = {1, 2, 3};
    std::vector<double> y = {4, 5, 6};
    // The initial value is 0.0 rather than 0, so the sum is accumulated as a double
    double xy = std::inner_product(x.begin(), x.end(), y.begin(), 0.0);
    std::cout<<"inner product: "<<xy<<std::endl;
    return 0;
}
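One detail worth watching with std::inner_product is that the accumulator has the type of the initial value, so an integer 0 silently truncates each partial sum even when the vectors hold doubles. The sketch below contrasts the two cases; the function names dot_int_init and dot_double_init are made up for this illustration.

```cpp
#include <numeric>
#include <vector>

// Hypothetical helper: the initial value is the int literal 0, so the
// accumulator is an int and every partial sum is truncated toward zero.
double dot_int_init(const std::vector<double>& x, const std::vector<double>& y) {
    return std::inner_product(x.begin(), x.end(), y.begin(), 0);
}

// Hypothetical helper: the initial value is 0.0, so the accumulator is a
// double and fractional parts are kept.
double dot_double_init(const std::vector<double>& x, const std::vector<double>& y) {
    return std::inner_product(x.begin(), x.end(), y.begin(), 0.0);
}
```

With x = {0.5, 0.5} and y = {1.0, 1.0}, dot_int_init returns 0 while dot_double_init returns 1.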
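Judging by the benchmark program and timings shown further below, the speed advantage of the hand-coded approach seems to matter only for small arrays. By the 10000-element mark I would consider the running times identical, but I prefer the algorithmic approach because it is easier to write and maintain, and it can benefit from library updates.
As usual, this timing information should be taken with a grain of salt.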
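Single-shot timings of this kind are noisy, especially for small inputs. One common way to get steadier numbers is to repeat the measurement and keep the best run. This is a minimal sketch of that idea using std::chrono, not the method used for the timings in this thread; best_of and dot are hypothetical names introduced here.

```cpp
#include <algorithm>
#include <chrono>
#include <limits>
#include <numeric>
#include <vector>

// Hypothetical helper: run `work` `repeats` times and return the smallest
// wall-clock duration in seconds, measured with std::chrono::steady_clock.
template <typename Work>
double best_of(int repeats, Work work) {
    double best = std::numeric_limits<double>::infinity();
    for (int r = 0; r < repeats; ++r) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        const std::chrono::duration<double> elapsed = stop - start;
        best = std::min(best, elapsed.count());
    }
    return best;
}

// Dot product via the standard library; 0.0 keeps the accumulation in double.
double dot(const std::vector<double>& x, const std::vector<double>& y) {
    return std::inner_product(x.begin(), x.end(), y.begin(), 0.0);
}
```

A call such as best_of(10, [&] { result = dot(x, y); }) stores the product in a variable declared outside the lambda, which makes it harder for the optimizer to discard the work being timed.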
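The only way I can see to make this faster in C++ is to break the loop-carried dependency: each iteration has to wait until the running sum from the previous iteration is available. This can be done by unrolling the loop and using several accumulator variables:
int t0=0, t1=0, t2=0, t3=0;
for (int ii = 0; ii < n; ii += 4) {
    t0 += a[ii]*b[ii];
    t1 += a[ii+1]*b[ii+1];
    t2 += a[ii+2]*b[ii+2];
    t3 += a[ii+3]*b[ii+3];
}
int temp = t0 + t1 + t2 + t3;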
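Modern processors can execute several operations per cycle, but only if there are no dependencies between them. On my system this gave an improvement of about 20%. Note: n must be a multiple of 4, or you need to add a loop epilogue to handle the remaining elements. Test and measure! I do not know whether 4 is the right unroll factor.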
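For sizes that are not a multiple of 4, the loop epilogue mentioned above could look like the sketch below. It wraps the same four-accumulator loop in a function and finishes the leftover elements one at a time. The name dot_unrolled4 and the use of std::vector<int> are assumptions made for this illustration, and both vectors are assumed to have the same length.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: dot product of two equally sized int vectors,
// unrolled by 4 with a scalar epilogue for the remaining elements.
int dot_unrolled4(const std::vector<int>& a, const std::vector<int>& b) {
    const std::size_t n = a.size();      // assumes b.size() == a.size()
    const std::size_t n4 = n - (n % 4);  // largest multiple of 4 that is <= n
    int t0 = 0, t1 = 0, t2 = 0, t3 = 0;  // four independent accumulators
    for (std::size_t ii = 0; ii < n4; ii += 4) {
        t0 += a[ii]     * b[ii];
        t1 += a[ii + 1] * b[ii + 1];
        t2 += a[ii + 2] * b[ii + 2];
        t3 += a[ii + 3] * b[ii + 3];
    }
    int temp = t0 + t1 + t2 + t3;
    // Epilogue: handle the last n % 4 elements one by one.
    for (std::size_t ii = n4; ii < n; ++ii) {
        temp += a[ii] * b[ii];
    }
    return temp;
}
```

For a = {1, 2, 3, 4, 5, 6, 7} and b holding seven ones, the unrolled loop covers the first four elements and the epilogue the last three, giving 28. Whether this beats the plain loop depends on the compiler and flags, so it is worth measuring both.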
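Greater improvements are possible by calling processor-specific SIMD intrinsics, but that is not standard C++.
It cannot get any faster than O(n).
Is this faster than the for loop, apart from being neat and manageable? Thanks.
inner_product is a very useful suggestion, not only for the particular problem I asked about but also for other neat operations that can be expressed with inner_product.
Are while loops generally faster? Regarding libraries/packages: I will eventually look into computational/linear algebra libraries, including the ones others here have suggested. Part of the reason for asking this question was to find out which libraries have tools like the one bames53 suggested.
That's a nice find. Good idea. I can use this and other features of valarray, for example in my computations.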
// Benchmark: std::inner_product versus a hand-coded loop
#include<algorithm>
#include<cstdlib>    // EXIT_SUCCESS, EXIT_FAILURE
#include<iostream>
#include<iterator>   // std::back_inserter
#include<numeric>    // std::inner_product
#include<random>
#include<string>     // std::stoi
#include<vector>
#include<boost/timer/timer.hpp>
int main(int argc, char* argv[]) {
    // get the desired number of elements
    if(argc!=2) {
        std::cerr<<"usage: "<<argv[0]<<" N"<<std::endl;
        return EXIT_FAILURE;
    }
    int N = std::stoi(argv[1]);
    // set-up the random number generator
    std::random_device rd;
    std::mt19937 gen(rd());
    std::uniform_real_distribution<> dis(-100, 100);
    // prepare the vectors
    std::vector<double> x, y;
    // fill the vectors with random numbers
    auto rgen = [&dis, &gen]() { return dis(gen); };
    std::generate_n(std::back_inserter(x), N, rgen);
    std::generate_n(std::back_inserter(y), N, rgen);
    // Heat-up the cache (try commenting-out this line and you'll see
    // that the time increases for whatever algorithm you put first)
    double xy = std::inner_product(x.begin(), x.end(), y.begin(), 0.0);
    std::cout<<"heated-up value: "<<xy<<std::endl;
    { // start of new timing scope; the timer prints when it goes out of scope
        boost::timer::auto_cpu_timer t;
        // write a message to the assembly source
        asm("##### START OF ALGORITHMIC APPROACH #####");
        double xy = std::inner_product(x.begin(), x.end(), y.begin(), 0.0);
        asm("##### END OF ALGORITHMIC APPROACH #####");
        std::cout<<"algorithmic value: "<<xy<<std::endl<<"timing info: ";
    } // end of timing scope
    { // start of new timing scope; the timer prints when it goes out of scope
        boost::timer::auto_cpu_timer t;
        // write a message to the assembly source
        asm("##### START OF HAND-CODED APPROACH #####");
        double tmp = 0.0;
        for(int k=0; k<N; k++) {
            tmp += x[k] * y[k];
        }
        asm("##### END OF HAND-CODED APPROACH #####");
        std::cout<<"hand-coded value: "<<tmp<<std::endl<<"timing info: ";
    } // end of timing scope
    return EXIT_SUCCESS;
}
[11:01:58 ~/research/c++] ./a.out 10
heated-up value: 8568.75
algorithmic value: 8568.75
timing info: 0.000006s wall, 0.000000s user + 0.000000s system = 0.000000s CPU (n/a%)
hand-coded value: 8568.75
timing info: 0.000004s wall, 0.000000s user + 0.000000s system = 0.000000s CPU (n/a%)
[11:01:59 ~/research/c++] ./a.out 100
heated-up value: -13072.2
algorithmic value: -13072.2
timing info: 0.000006s wall, 0.000000s user + 0.000000s system = 0.000000s CPU (n/a%)
hand-coded value: -13072.2
timing info: 0.000004s wall, 0.000000s user + 0.000000s system = 0.000000s CPU (n/a%)
[11:02:03 ~/research/c++] ./a.out 1000
heated-up value: 80389.1
algorithmic value: 80389.1
timing info: 0.000010s wall, 0.000000s user + 0.000000s system = 0.000000s CPU (n/a%)
hand-coded value: 80389.1
timing info: 0.000007s wall, 0.000000s user + 0.000000s system = 0.000000s CPU (n/a%)
[11:02:04 ~/research/c++] ./a.out 10000
heated-up value: 89753.7
algorithmic value: 89753.7
timing info: 0.000041s wall, 0.000000s user + 0.000000s system = 0.000000s CPU (n/a%)
hand-coded value: 89753.7
timing info: 0.000039s wall, 0.000000s user + 0.000000s system = 0.000000s CPU (n/a%)
[11:02:05 ~/research/c++] ./a.out 100000
heated-up value: -461750
algorithmic value: -461750
timing info: 0.000292s wall, 0.000000s user + 0.000000s system = 0.000000s CPU (n/a%)
hand-coded value: -461750
timing info: 0.000282s wall, 0.000000s user + 0.000000s system = 0.000000s CPU (n/a%)
[11:02:07 ~/research/c++] ./a.out 1000000
heated-up value: 2.52643e+06
algorithmic value: 2.52643e+06
timing info: 0.002702s wall, 0.000000s user + 0.000000s system = 0.000000s CPU (n/a%)
hand-coded value: 2.52643e+06
timing info: 0.002660s wall, 0.000000s user + 0.000000s system = 0.000000s CPU (n/a%)
[11:02:09 ~/research/c++] ./a.out 10000000
heated-up value: 6.04128e+06
algorithmic value: 6.04128e+06
timing info: 0.026557s wall, 0.030000s user + 0.000000s system = 0.030000s CPU (113.0%)
hand-coded value: 6.04128e+06
timing info: 0.026335s wall, 0.030000s user + 0.000000s system = 0.030000s CPU (113.9%)
[11:02:11 ~/research/c++] ./a.out 100000000
heated-up value: 2.27043e+07
algorithmic value: 2.27043e+07
timing info: 0.264547s wall, 0.270000s user + 0.000000s system = 0.270000s CPU (102.1%)
hand-coded value: 2.27043e+07
timing info: 0.264346s wall, 0.260000s user + 0.000000s system = 0.260000s CPU (98.4%)