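C: I did 8-way addition into the sum variable. That brought it down to 2.03 seconds.
Tags: c, performance, optimization

Doubling that to 16-way addition into the sum variable brought it down to 1.91 seconds.
Doubling it again to 32-way addition made the time go up, to 2.08 seconds.
As @kcraigie suggested, I switched to the pointer approach. With -O3, the time was 6.01 seconds. (I was surprised!)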
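The ntimes shortcut further down (the inner loop runs only on the first outer iteration) cut the running time to <0.01 seconds. ;-) If you don't get hit with the F-stick, it's a winner.

I experimented with grouping. On my machine, with my gcc, I found that the following worked best: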
for (j = 0; j < ARRAY_SIZE; j += 16) {
    sum = sum +
        (array[j   ] + array[j+ 1]) +
        (array[j+ 2] + array[j+ 3]) +
        (array[j+ 4] + array[j+ 5]) +
        (array[j+ 6] + array[j+ 7]) +
        (array[j+ 8] + array[j+ 9]) +
        (array[j+10] + array[j+11]) +
        (array[j+12] + array[j+13]) +
        (array[j+14] + array[j+15]);
}
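Every number in this thread is a wall-clock reading from one machine, so it helps to time the variants side by side in the same process. Here is a minimal sketch of such a harness. The names sum_flat16, sum_paired16 and time_passes are made up for illustration and are not part of the assignment, and it assumes clock() from <time.h> is precise enough for runs that last seconds.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* 16-way unroll with the additions chained left to right. */
double sum_flat16(const double *a, size_t n)
{
    assert(n % 16 == 0);            /* no cleanup loop in this sketch */
    double sum = 0;
    for (size_t j = 0; j < n; j += 16) {
        sum = sum +
            a[j   ] + a[j+ 1] + a[j+ 2] + a[j+ 3] +
            a[j+ 4] + a[j+ 5] + a[j+ 6] + a[j+ 7] +
            a[j+ 8] + a[j+ 9] + a[j+10] + a[j+11] +
            a[j+12] + a[j+13] + a[j+14] + a[j+15];
    }
    return sum;
}

/* Same 16 elements, but adjacent pairs are added first. */
double sum_paired16(const double *a, size_t n)
{
    assert(n % 16 == 0);
    double sum = 0;
    for (size_t j = 0; j < n; j += 16) {
        sum = sum +
            (a[j   ] + a[j+ 1]) + (a[j+ 2] + a[j+ 3]) +
            (a[j+ 4] + a[j+ 5]) + (a[j+ 6] + a[j+ 7]) +
            (a[j+ 8] + a[j+ 9]) + (a[j+10] + a[j+11]) +
            (a[j+12] + a[j+13]) + (a[j+14] + a[j+15]);
    }
    return sum;
}

typedef double (*sum_fn)(const double *, size_t);

/* Returns the CPU seconds spent on `passes` calls of fn.
   The accumulated total is stored in *result so the work is observable. */
double time_passes(sum_fn fn, const double *a, size_t n, int passes,
                   double *result)
{
    double total = 0;
    clock_t start = clock();
    for (int i = 0; i < passes; i++)
        total += fn(a, n);
    clock_t stop = clock();
    *result = total;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Calling time_passes(sum_paired16, array, ARRAY_SIZE, N_TIMES, &r) and the same for sum_flat16 gives two comparable readings. The function call per pass adds overhead that the assignment's inline loop does not have, so treat the absolute numbers as rough.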
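The do/while loop at the bottom uses a total unroll of 40, split into two groups of 20, with alternating pre-incremented induction variables to break the dependency, and a post-tested loop. Again, you can experiment with the parenthesized grouping to fine-tune it for your compiler and platform.

This is probably due to the way modern CPUs optimize vector operations. Try adding -O3 to your compiler options. That enables more compile-time optimization.

@pvg Even at the default optimization level, I'd still expect sum0, sum1, etc. to be kept in registers, so I don't buy your argument. The transformation being performed makes sense, and as long as those variables stay in registers I wouldn't expect it to make things worse. In fact, it should reduce the dependencies in the loop.

Wow, this assignment is really horrible. They're basically forcing you to do things you should never do. It would make more sense if they required -O3 and then said "now see whether you can change the code to improve it further".

Hint: force the array allocation onto a cache line boundary (IIRC 64 bytes on x86). Worst case, align it manually.

Too bad you aren't allowed to use SSE/AVX, since that involves some sort of assembly instructions (although they are available as intrinsics).

However, as the comments say, this is strictly an -O0 assignment. That hadn't been mentioned when I started writing this. ;-)

Testing on my side makes me think the 16-way addition is the best result so far. I just need to push it into the target zone.

I ran your code (and versions with different groupings), and every version I tested, including that one, averaged 5.5 seconds. My current version unrolls to 20, groups each element with its neighbor like this, then groups those results with their neighbors again and again until a single operation is left. That gets about 5.4 to 5.1.

Sounds like you've squeezed out nearly as much as you can. The unroll of 20 is interesting. Normally I'd unroll by a power of 2, but in this case there's no reason not to unroll by other amounts that are factors of the total trip count.

I'm starting to wonder what your teacher did to get reliably under 5 when you can't seem to, despite all the fiddling. We're pretty much guessing at the codegen of different gcc versions on different platforms.

I have a new loop unroll that looks quite different. I've updated my post to include it. Please take a look at the new loop at the bottom of my post. I suggest trying it, and then trying it with added parentheses to see whether that makes a difference.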
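The cache-line hint above can be sketched with C11's aligned_alloc. This is only an illustration under some assumptions: the 64-byte line size is the commenter's recollection for x86, alloc_aligned_doubles and is_cache_aligned are made-up helper names, and the assignment's own rules may not allow touching the calloc line at all, since it sits outside the regions marked as editable.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_LINE 64   /* assumed line size; typical for x86, not guaranteed */

/* Returns a zeroed array of n doubles that starts on a CACHE_LINE boundary,
   or NULL if the allocation fails. Release it with free(). */
double *alloc_aligned_doubles(size_t n)
{
    assert(n > 0);
    size_t bytes = n * sizeof(double);
    /* C11 aligned_alloc expects a size that is a multiple of the alignment,
       so round the byte count up to the next multiple of CACHE_LINE. */
    size_t padded = (bytes + CACHE_LINE - 1) / CACHE_LINE * CACHE_LINE;
    double *p = aligned_alloc(CACHE_LINE, padded);
    if (p != NULL)
        memset(p, 0, padded);   /* all-bits-zero, the same fill calloc uses */
    return p;
}

/* Nonzero if p starts on a CACHE_LINE boundary. */
int is_cache_aligned(const void *p)
{
    return (uintptr_t)p % CACHE_LINE == 0;
}
```

With ARRAY_SIZE at 10000 the array is 80000 bytes, which is already a multiple of 64, so no padding is added in that case. Whether alignment changes the timing at -O0 is something to measure rather than assume.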
int j;
for (j = 0; j < ARRAY_SIZE; j += 8) {
    sum0 += array[j] + array[j+1];
    sum1 += array[j+2] + array[j+3];
    sum2 += array[j+4] + array[j+5];
    sum3 += array[j+6] + array[j+7];
}
#include <stdio.h>
#include <stdlib.h>

// You are only allowed to make changes to this code as specified by the comments in it.

// The code you submit must have these two values.
#define N_TIMES 600000
#define ARRAY_SIZE 10000

int main(void)
{
    double *array = calloc(ARRAY_SIZE, sizeof(double));
    double sum = 0;
    int i;

    // You can add variables between this comment ...
    // double sum0 = 0;
    // double sum1 = 0;
    // double sum2 = 0;
    // double sum3 = 0;
    // ... and this one.

    // Please change 'your name' to your actual name.
    printf("CS201 - Asgmt 4 - ACTUAL NAME\n");

    for (i = 0; i < N_TIMES; i++) {
        // You can change anything between this comment ...
        int j;
        for (j = 0; j < ARRAY_SIZE; j += 8) {
            sum += array[j] + array[j+1] + array[j+2] + array[j+3] + array[j+4] + array[j+5] + array[j+6] + array[j+7];
        }
        // ... and this one. But your inner loop must do the same
        // number of additions as this one does.
    }

    // You can add some final code between this comment ...
    // sum = sum0 + sum1 + sum2 + sum3;
    // ... and this one.

    return 0;
}
#include <stdio.h>
#include <stdlib.h>

// You are only allowed to make changes to this code as specified by the comments in it.

// The code you submit must have these two values.
#define N_TIMES 600000
#define ARRAY_SIZE 10000

int main(void)
{
    double *array = calloc(ARRAY_SIZE, sizeof(double));
    double sum = 0;
    int i;

    // You can add variables between this comment ...
    double sum0 = 0;
    double sum1 = 0;
    double sum2 = 0;
    double sum3 = 0;
    // ... and this one.

    // Please change 'your name' to your actual name.
    printf("CS201 - Asgmt 4 - ACTUAL NAME\n");

    for (i = 0; i < N_TIMES; i++) {
        // You can change anything between this comment ...
        int j;
        for (j = 0; j < ARRAY_SIZE; j += 8) {
            sum0 += array[j] + array[j+1];
            sum1 += array[j+2] + array[j+3];
            sum2 += array[j+4] + array[j+5];
            sum3 += array[j+6] + array[j+7];
        }
        // ... and this one. But your inner loop must do the same
        // number of additions as this one does.
    }

    // You can add some final code between this comment ...
    sum = sum0 + sum1 + sum2 + sum3;
    // ... and this one.

    return 0;
}
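The sum0..sum3 version works by splitting one long chain of dependent additions into four chains that do not depend on each other. Pulled out into a function, the idea might look like the sketch below. The name sum_four_chains is made up, and the cleanup loop is an addition the assignment does not need (10000 is a multiple of 8); it is there so the function is safe for other lengths.

```c
#include <assert.h>
#include <stddef.h>

/* Sums n doubles using four independent accumulators. */
double sum_four_chains(const double *a, size_t n)
{
    assert(a != NULL || n == 0);
    double sum0 = 0, sum1 = 0, sum2 = 0, sum3 = 0;
    size_t j = 0;

    /* Main body: each accumulator depends only on its own previous value,
       so the four += operations in one iteration do not wait on each other. */
    for (; j + 8 <= n; j += 8) {
        sum0 += a[j]   + a[j+1];
        sum1 += a[j+2] + a[j+3];
        sum2 += a[j+4] + a[j+5];
        sum3 += a[j+6] + a[j+7];
    }

    /* Cleanup for the 0 to 7 elements left over when n is not a multiple of 8. */
    for (; j < n; j++)
        sum0 += a[j];

    return (sum0 + sum1) + (sum2 + sum3);
}
```

Because floating-point addition is not associative, this can round slightly differently from a single running sum on arbitrary data. With the assignment's all-zero array the result is the same either way.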
int j;
for (j = 0; j < ARRAY_SIZE; j += 50) {
    sum +=(((((((array[j] + array[j+1]) + (array[j+2] + array[j+3])) +
               ((array[j+4] + array[j+5]) + (array[j+6] + array[j+7]))) +
              (((array[j+8] + array[j+9]) + (array[j+10] + array[j+11])) +
               ((array[j+12] + array[j+13]) + (array[j+14] + array[j+15])))) +
             ((((array[j+16] + array[j+17]) + (array[j+18] + array[j+19]))))) +
            (((((array[j+20] + array[j+21]) + (array[j+22] + array[j+23])) +
               ((array[j+24] + array[j+25]) + (array[j+26] + array[j+27]))) +
              (((array[j+28] + array[j+29]) + (array[j+30] + array[j+31])) +
               ((array[j+32] + array[j+33]) + (array[j+34] + array[j+35])))) +
             ((((array[j+36] + array[j+37]) + (array[j+38] + array[j+39])))))) +
           ((((array[j+40] + array[j+41]) + (array[j+42] + array[j+43])) +
             ((array[j+44] + array[j+45]) + (array[j+46] + array[j+47]))) +
            (array[j+48] + array[j+49])));
}
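The nested parentheses in the 50-element loop build a tree of additions: neighbors are added first, then the pair sums, and so on, so the longest chain of dependent additions grows with the depth of the tree instead of with the number of elements. The sketch below shows the same shape with recursive halving. It is not what the loop above does literally, pairwise_sum is a made-up name, and with one function call per split it would be far too slow for the assignment; it only illustrates the grouping.

```c
#include <assert.h>
#include <stddef.h>

/* Adds n doubles as a balanced tree: split the range in half,
   sum each half the same way, then add the two partial sums. */
double pairwise_sum(const double *a, size_t n)
{
    assert(a != NULL || n == 0);
    if (n == 0)
        return 0.0;
    if (n == 1)
        return a[0];
    if (n == 2)
        return a[0] + a[1];     /* a leaf pair, like (array[j] + array[j+1]) */
    size_t half = n / 2;
    return pairwise_sum(a, half) + pairwise_sum(a + half, n - half);
}
```

For 50 elements the tree is about 6 levels deep, compared with 49 dependent additions in a plain left-to-right sum.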
register double *p;
for (p = array; p < array + ARRAY_SIZE; ++p) {
    sum += *p;
}
int ntimes = 0;
// ... and this one.
...
// You can change anything between this comment ...
int j;
// Run the inner loop on the first outer iteration only.
if (ntimes++ == 0) {
    for (j = 0; j < ARRAY_SIZE; j += 16) {
        sum = sum +
            (array[j   ] + array[j+ 1]) +
            (array[j+ 2] + array[j+ 3]) +
            (array[j+ 4] + array[j+ 5]) +
            (array[j+ 6] + array[j+ 7]) +
            (array[j+ 8] + array[j+ 9]) +
            (array[j+10] + array[j+11]) +
            (array[j+12] + array[j+13]) +
            (array[j+14] + array[j+15]);
    }
}
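The ntimes shortcut gets the same answer only because calloc hands back an array of zeros, so every skipped pass would have added 0 anyway. It also ignores the template's comment that the inner loop must do the same number of additions. A small sketch makes the difference visible on non-zero data; sum_all_passes and sum_first_pass_only are made-up names for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* What the assignment's loop nest computes: n_times full passes added up. */
double sum_all_passes(const double *a, size_t n, int n_times)
{
    assert(n_times >= 0);
    double sum = 0;
    for (int i = 0; i < n_times; i++)
        for (size_t j = 0; j < n; j++)
            sum += a[j];
    return sum;
}

/* The shortcut: the inner loop runs on the first outer iteration only. */
double sum_first_pass_only(const double *a, size_t n, int n_times)
{
    assert(n_times >= 0);
    double sum = 0;
    int ntimes = 0;
    for (int i = 0; i < n_times; i++) {
        if (ntimes++ == 0) {
            for (size_t j = 0; j < n; j++)
                sum += a[j];
        }
    }
    return sum;
}
```

On an all-zero array both functions return 0. On {1, 2, 3, 4} with three passes the first returns 30 and the second returns 10.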
int j1, j2;

j1 = 0;
do {
    j2 = j1 + 20;
    sum = sum +
        (array[j1   ] + array[j1+ 1]) +
        (array[j1+ 2] + array[j1+ 3]) +
        (array[j1+ 4] + array[j1+ 5]) +
        (array[j1+ 6] + array[j1+ 7]) +
        (array[j1+ 8] + array[j1+ 9]) +
        (array[j1+10] + array[j1+11]) +
        (array[j1+12] + array[j1+13]) +
        (array[j1+14] + array[j1+15]) +
        (array[j1+16] + array[j1+17]) +
        (array[j1+18] + array[j1+19]);

    j1 = j2 + 20;
    sum = sum +
        (array[j2   ] + array[j2+ 1]) +
        (array[j2+ 2] + array[j2+ 3]) +
        (array[j2+ 4] + array[j2+ 5]) +
        (array[j2+ 6] + array[j2+ 7]) +
        (array[j2+ 8] + array[j2+ 9]) +
        (array[j2+10] + array[j2+11]) +
        (array[j2+12] + array[j2+13]) +
        (array[j2+14] + array[j2+15]) +
        (array[j2+16] + array[j2+17]) +
        (array[j2+18] + array[j2+19]);
} while (j1 < ARRAY_SIZE);
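A post-tested loop with no cleanup code reads past the end of the array unless the unroll factor divides the array size exactly. 10000 is divisible by 8, 16, 20, 40 and 50, but not by 32, so a 32-way unroll would need a cleanup loop. One way to guard against this is a compile-time check. The sketch below assumes a C11 compiler; UNROLL and unroll_fits are made-up names for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE 10000
#define UNROLL 40       /* two groups of 20, as in the do/while loop */

/* C11: refuse to compile if the unroll factor does not divide the array size. */
static_assert(ARRAY_SIZE % UNROLL == 0, "UNROLL must divide ARRAY_SIZE");

/* Runtime form of the same check, for sizes not known at compile time. */
int unroll_fits(size_t n, size_t unroll)
{
    return unroll != 0 && n % unroll == 0;
}
```

Changing UNROLL to 32 makes the static_assert fail at compile time, which is cheaper to find than a silent out-of-bounds read.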