C: Made an 8-way addition into the sum variable, which brought it down to 2.03 seconds

Tags: c, performance, optimization

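  • I doubled that to a 16-way addition into the sum variable, which brought it down to 1.91 seconds.

  • I doubled it again to a 32-way addition into the sum variable. The time went up to 2.08 seconds.

  • As suggested by @kcraigie, I switched to a pointer approach. With -O3, the time was 6.01 seconds. (That surprised me!)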
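Timings like the ones listed above depend on how they are measured. Below is a minimal sketch of a measuring harness, assuming that CPU time from `clock()` is an acceptable yardstick. The names `sum_8way` and `time_sum` are hypothetical and do not appear in the original post.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical 8-way version: eight additions per loop iteration.
   Assumes n is a multiple of 8. */
double sum_8way(const double *array, size_t n)
{
    double sum = 0;
    for (size_t j = 0; j < n; j += 8) {
        sum += array[j]     + array[j + 1] + array[j + 2] + array[j + 3]
             + array[j + 4] + array[j + 5] + array[j + 6] + array[j + 7];
    }
    return sum;
}

/* Calls fn `repeats` times and returns the elapsed CPU time in seconds.
   The accumulated result is written through `out` so the work is used. */
double time_sum(double (*fn)(const double *, size_t),
                const double *array, size_t n, int repeats, double *out)
{
    double total = 0;
    clock_t start = clock();
    for (int i = 0; i < repeats; i++)
        total += fn(array, n);
    clock_t end = clock();
    *out = total;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Running each variant several times and comparing the averages is probably more informative than a single run, since run-to-run noise can be of the same order as the differences reported here.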


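    This cuts the run time to < 0.01 seconds. ;-) It's a winner, if you don't get hit with the F-stick.

    I experimented with the grouping. On my machine, with my gcc, I found that the following worked best: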

        for (j = 0; j < ARRAY_SIZE; j += 16) {
            sum = sum +
                  (array[j   ] + array[j+ 1]) +
                  (array[j+ 2] + array[j+ 3]) +
                  (array[j+ 4] + array[j+ 5]) +
                  (array[j+ 6] + array[j+ 7]) +
                  (array[j+ 8] + array[j+ 9]) +
                  (array[j+10] + array[j+11]) +
                  (array[j+12] + array[j+13]) +
                  (array[j+14] + array[j+15]);
        }
    

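    This uses a total unroll of 40, split into two groups of 20, alternating pre-incremented induction variables to break dependencies, and a post-test loop. Again, you can experiment with the parenthesis grouping to fine-tune it for your compiler and platform.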
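The pattern described above (two alternating induction variables plus a post-test loop) can be shown at a smaller scale. This is a sketch only: it unrolls by 4 in two groups of 2 rather than by 40, and the function name `sum_alternating` and the function wrapper are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Sums n doubles using two alternating index variables and a do/while
   (post-test) loop. Assumes n is a nonzero multiple of 4. */
double sum_alternating(const double *array, size_t n)
{
    assert(n > 0 && n % 4 == 0);

    double sum = 0;
    size_t j1 = 0, j2;
    do {
        j2 = j1 + 2;   /* index of the second group, computed early */
        sum = sum + (array[j1] + array[j1 + 1]);
        j1 = j2 + 2;   /* index of the next iteration's first group */
        sum = sum + (array[j2] + array[j2 + 1]);
    } while (j1 < n);  /* condition is tested after the body */
    return sum;
}
```

Each index is updated while the other one is still in use, so the address computation for one group does not have to wait for the additions of the previous group.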

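    This may be due to the way modern CPUs optimize vector operations.

    Try adding -O3 to your compiler options. This enables more compile-time optimizations.

    @pvg Even with the default optimization level, I would still expect sum0, sum1, etc. to be kept in registers, so I'm not convinced by your argument.

    The transformation performed makes sense, and as long as those variables are kept in registers I would not expect it to make things worse. In fact, it should reduce the dependencies in the loop.

    Wow, this assignment is really horrible. They are basically forcing you to do things you should never do. It would make more sense if they required -O3 and then said "now see if you can make some changes to the code to improve it further".

    Hint: force the array to be allocated on a cache-line boundary (IIRC 64 bytes on x86). Worst case, align it manually. Too bad you are not allowed to use SSE/AVX, as that involves some kind of assembler instructions (although they are available as intrinsics).

    As said in the comments, though, this is strictly a -O0 assignment. That had not been mentioned when I started writing this. ;-)

    Testing on my end makes me think the 16-way addition is the best result so far. I just need to push it into the target range.

    I ran your code (and versions using different groupings), and every version I tested, including that one, averaged 5.5 seconds. My current version unrolls by 20, then groups each element with its neighbor, then groups those results with their neighbors again and again until one operation is left; it gets about 5.4-5.1.

    It sounds like you have squeezed about as much out of it as you can. An unroll of 20 is interesting. Normally I would think of unrolling by a power of 2, but in this case there is no reason not to unroll by other amounts that are factors of the total trip count.

    I'm starting to wonder what your teacher did to reliably get under 5 when you can't seem to get there despite all the fiddling. We are pretty much guessing at the codegen of different versions of gcc on different platforms.

    I have a new loop unrolling that looks quite different. I've updated my post to include it. Please take a look at the new loop at the bottom of my post. I suggest trying it as is, then trying it with added parentheses to see if that makes a difference.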
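The cache-line hint in the comments above could look roughly like the following. It is a sketch under several assumptions: a C11 library that provides `aligned_alloc`, a 64-byte cache line, and a platform where all-zero bytes represent `0.0`. The helper name `alloc_aligned_doubles` is hypothetical, and the assignment's rules may not allow changing the allocation line at all.

```c
#include <stdlib.h>
#include <string.h>

#define CACHE_LINE 64  /* assumed cache-line size in bytes */

/* Returns a zero-filled array of `count` doubles whose address is a
   multiple of CACHE_LINE, or NULL on failure. Release it with free().
   Intended for count > 0. */
double *alloc_aligned_doubles(size_t count)
{
    size_t bytes = count * sizeof(double);

    /* aligned_alloc expects the size to be a multiple of the alignment,
       so round the byte count up to the next multiple of CACHE_LINE. */
    size_t rounded = (bytes + CACHE_LINE - 1) / CACHE_LINE * CACHE_LINE;

    double *p = aligned_alloc(CACHE_LINE, rounded);
    if (p != NULL)
        memset(p, 0, rounded);  /* calloc zeroes its memory; do the same */
    return p;
}
```

With `ARRAY_SIZE` at 10000 the array occupies 80000 bytes, which is already a multiple of 64, so the rounding step changes nothing in that case.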
    int     j;

    for (j = 0; j < ARRAY_SIZE; j += 8) {
        sum0 += array[j] + array[j+1];
        sum1 += array[j+2] + array[j+3];
        sum2 += array[j+4] + array[j+5];
        sum3 += array[j+6] + array[j+7];
    }

    #include <stdio.h>
    #include <stdlib.h>
    
    // You are only allowed to make changes to this code as specified by the comments in it.
    
    // The code you submit must have these two values.
    #define N_TIMES     600000
    #define ARRAY_SIZE   10000
    
    int main(void)
    {
        double  *array = calloc(ARRAY_SIZE, sizeof(double));
        double  sum = 0;
        int     i;
    
        // You can add variables between this comment ...
    
    //  double sum0 = 0;
    //  double sum1 = 0;
    //  double sum2 = 0;
    //  double sum3 = 0;
    
        // ... and this one.
    
        // Please change 'your name' to your actual name.
        printf("CS201 - Asgmt 4 - ACTUAL NAME\n");
    
        for (i = 0; i < N_TIMES; i++) {
    
            // You can change anything between this comment ...
    
            int     j;
    
            for (j = 0; j < ARRAY_SIZE; j += 8) {
                sum += array[j] + array[j+1] + array[j+2] + array[j+3] + array[j+4] + array[j+5] +  array[j+6] + array[j+7];
            }
    
            // ... and this one. But your inner loop must do the same
            // number of additions as this one does.
    
        }
    
        // You can add some final code between this comment ...
    //  sum = sum0 + sum1 + sum2 + sum3;
        // ... and this one.
    
        return 0;
    }
    
    #include <stdio.h>
    #include <stdlib.h>
    
    // You are only allowed to make changes to this code as specified by the comments in it.
    
    // The code you submit must have these two values.
    #define N_TIMES     600000
    #define ARRAY_SIZE   10000
    
    int main(void)
    {
        double  *array = calloc(ARRAY_SIZE, sizeof(double));
        double  sum = 0;
        int     i;
    
        // You can add variables between this comment ...
    
        double sum0 = 0;
        double sum1 = 0;
        double sum2 = 0;
        double sum3 = 0;
    
        // ... and this one.
    
        // Please change 'your name' to your actual name.
        printf("CS201 - Asgmt 4 - ACTUAL NAME\n");
    
        for (i = 0; i < N_TIMES; i++) {
    
            // You can change anything between this comment ...
    
            int     j;
    
            for (j = 0; j < ARRAY_SIZE; j += 8) {
                sum0 += array[j] + array[j+1]; 
                sum1 += array[j+2] + array[j+3];
                sum2 += array[j+4] + array[j+5]; 
                sum3 += array[j+6] + array[j+7];
            }
    
            // ... and this one. But your inner loop must do the same
            // number of additions as this one does.
    
        }
    
        // You can add some final code between this comment ...
        sum = sum0 + sum1 + sum2 + sum3;
        // ... and this one.
    
        return 0;
    }
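The difference between the two programs above is the number of independent addition chains. A small sketch that isolates the idea, using the hypothetical function names `sum_single` and `sum_four_acc`: with one accumulator every addition waits on the previous one, while four accumulators give the CPU four chains it can overlap. Floating-point addition is not associative, so for general data the two results may differ in the last bits; for the all-zero array from `calloc`, or for small integer values, they agree exactly.

```c
#include <stddef.h>

/* One accumulator: each addition depends on the one before it. */
double sum_single(const double *array, size_t n)
{
    double sum = 0;
    for (size_t j = 0; j < n; j++)
        sum += array[j];
    return sum;
}

/* Four accumulators: four independent dependency chains, combined
   once at the end. Assumes n is a multiple of 8. */
double sum_four_acc(const double *array, size_t n)
{
    double sum0 = 0, sum1 = 0, sum2 = 0, sum3 = 0;
    for (size_t j = 0; j < n; j += 8) {
        sum0 += array[j]     + array[j + 1];
        sum1 += array[j + 2] + array[j + 3];
        sum2 += array[j + 4] + array[j + 5];
        sum3 += array[j + 6] + array[j + 7];
    }
    return sum0 + sum1 + sum2 + sum3;
}
```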
    
    int     j;
            for (j = 0; j < ARRAY_SIZE; j += 50) {
                sum +=(((((((array[j] + array[j+1]) + (array[j+2] + array[j+3])) +
                        ((array[j+4] + array[j+5]) + (array[j+6] + array[j+7]))) + 
                        (((array[j+8] + array[j+9]) + (array[j+10] + array[j+11])) +
                        ((array[j+12] + array[j+13]) + (array[j+14] + array[j+15])))) +
                        ((((array[j+16] + array[j+17]) + (array[j+18] + array[j+19]))))) +
                        (((((array[j+20] + array[j+21]) + (array[j+22] + array[j+23])) +
                        ((array[j+24] + array[j+25]) + (array[j+26] + array[j+27]))) + 
                        (((array[j+28] + array[j+29]) + (array[j+30] + array[j+31])) +
                        ((array[j+32] + array[j+33]) + (array[j+34] + array[j+35])))) +
                        ((((array[j+36] + array[j+37]) + (array[j+38] + array[j+39])))))) + 
                        ((((array[j+40] + array[j+41]) + (array[j+42] + array[j+43])) +
                        ((array[j+44] + array[j+45]) + (array[j+46] + array[j+47]))) + 
                        (array[j+48] + array[j+49])));
            }
    
    register double * p;
    for (p = array; p < array + ARRAY_SIZE; ++p) {
        sum += *p;
    }
    
    int ntimes = 0;
    
    // ... and this one.
    ...
        // You can change anything between this comment ...
    
                if (ntimes++ == 0) {   // true only on the first pass of the outer loop
    
        int     j1, j2;
    
        j1 = 0;
        do {
            j2 = j1 + 20;
            sum = sum +
                  (array[j1   ] + array[j1+ 1]) +
                  (array[j1+ 2] + array[j1+ 3]) +
                  (array[j1+ 4] + array[j1+ 5]) +
                  (array[j1+ 6] + array[j1+ 7]) +
                  (array[j1+ 8] + array[j1+ 9]) +
                  (array[j1+10] + array[j1+11]) +
                  (array[j1+12] + array[j1+13]) +
                  (array[j1+14] + array[j1+15]) +
                  (array[j1+16] + array[j1+17]) +
                  (array[j1+18] + array[j1+19]);
            j1 = j2 + 20;
            sum = sum +
                  (array[j2   ] + array[j2+ 1]) +
                  (array[j2+ 2] + array[j2+ 3]) +
                  (array[j2+ 4] + array[j2+ 5]) +
                  (array[j2+ 6] + array[j2+ 7]) +
                  (array[j2+ 8] + array[j2+ 9]) +
                  (array[j2+10] + array[j2+11]) +
                  (array[j2+12] + array[j2+13]) +
                  (array[j2+14] + array[j2+15]) +
                  (array[j2+16] + array[j2+17]) +
                  (array[j2+18] + array[j2+19]);
        }
        while (j1 < ARRAY_SIZE);
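All of the unrolled loops above rely on `ARRAY_SIZE` (10000) being an exact multiple of the unroll amount, which holds for 8, 16, 20, 40 and 50. For a length that is not a multiple, a cleanup loop is a common addition. The following is a sketch with an unroll of 4 and a hypothetical function name, not code from the original answers.

```c
#include <stddef.h>

/* Sums n doubles with a 4-way unrolled main loop followed by a cleanup
   loop, so n does not have to be a multiple of the unroll amount. */
double sum_unrolled_with_tail(const double *array, size_t n)
{
    double sum = 0;
    size_t j = 0;

    /* Main loop: runs while a full group of four elements remains. */
    for (; j + 4 <= n; j += 4)
        sum = sum + (array[j] + array[j + 1]) + (array[j + 2] + array[j + 3]);

    /* Cleanup loop: handles the 0 to 3 leftover elements. */
    for (; j < n; j++)
        sum += array[j];

    return sum;
}
```

The assignment fixes the array length, so the cleanup loop would never execute there; it only matters if the same code is reused with other sizes.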