Reducing the number of instructions in ARM - Fatal编程技术网

arm

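I have a program I'm working on for school whose purpose is to add two matrices and store the result in a third matrix. Currently, when it is run with the driver (a .o file), the instruction count is 1,003,034,420, but it needs to be under 1 billion. However, I don't know how to get there: I've gone over every instruction I use, and all of them seem necessary for the program to work.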

Note that I can't use loop unrolling to reduce the instruction count at this point, because that comes later in the course.

The program is as follows:

/* This function has 5 parameters, and the declaration in the
   C-language would look like:

   void matadd (int **C, int **A, int **B, int height, int width)

   C, A, B, and height will be passed in r0-r3, respectively, and
   width will be passed on the stack. */

.arch armv7-a
.text
.align  2
.global matadd
.syntax unified
.arm
matadd:
   push  {r4, r5, r6, r7, r8, r9, r10, r11, lr}
   ldr   r4, [sp, #36]                 @ load width into r4
   mov   r5, #0                        @ r5 is current row index
row_loop: 
   mov   r6, #0                        @ r6 is the col, reset it for each new row
   cmp   r5, r3                        @ compare row with height
   beq   end_loops                     @ we have finished all of the rows
   ldr   r11, [r0, r5, lsl #2]         @ r11 is the current row array of C
   ldr   r7, [r1, r5, lsl #2]          @ r7 is the current row array of A
   ldr   r8, [r2, r5, lsl #2]          @ r8 is the current row array of B
                                       @ lsl #2 scales the index by 4, since
                                       @ each element (int or int *) is 4 bytes;
                                       @ offset addressing leaves r0-r2 unchanged
col_loop:   
   cmp   r6, r4                        @ compare col with width
   beq   end_col                       @ we have finished this row
   ldr   r9, [r7, r6, lsl #2]          @ r9 is cur_row[col] of A
   ldr   r10, [r8, r6, lsl #2]         @ r10 is cur_row[col] of B
   add   r9, r9, r10                   @ r9 = A[row][col] + B[row][col]
   str   r9, [r11, r6, lsl #2]         @ store result of addition in C[row][col]
   add   r6, r6, #1                    @ increment col
   b     col_loop                      @ get next entry
end_col:
   add   r5, r5, #1                    @ increment row
   b     row_loop                      @ get next row
end_loops:   
   pop   {r4, r5, r6, r7, r8, r9, r10, r11, pc}

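I think there must be some instruction that combines cmp and b, but I can't seem to find one. Any pointers on how to reduce the number of instructions?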

You want to remove the unconditional branch from the inner loop:

loop_start:
    cmp x, y
    beq loop_exit

    blah blah blah

    b loop_start
loop_exit:

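Note that every pass through the loop executes an unconditional branch (b loop_start). Avoid that branch by inlining the branch target, copying its instructions up to the next conditional branch: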

loop_start:
    cmp x, y
    beq loop_exit

loop_middle:
    blah blah blah

    ; was "b loop_start" but we just copy the instructions
    ; starting at "loop_start" up to the conditional branch

    cmp x, y
    beq loop_exit

    ; and then jump to the instruction after the inlined portion
    b loop_middle
loop_exit:

At this point, the beq is just a branch over a branch, so the pair can be replaced by a single branch with the inverted condition:

loop_start:
    cmp x, y
    beq loop_exit

loop_middle:
    blah blah blah

    cmp x, y

    ; "beq loop_exit" followed by "b loop_middle" is equivalent to this
    bne loop_middle

loop_exit:

There are two opportunities for this optimization in your code: the column loop and the row loop.

(When you submit your solution, don't forget to cite this web page to avoid accusations of academic dishonesty.)

The trick is to remove the unconditional branch from the inner loop. Unroll up to the next conditional branch.

@RaymondChen What do you mean? Even if I use bne instead of b at the end of the inner loop, it still has to do the comparison every time. I sort of see that I could somehow combine the two branches, but I don't see how, since they have to branch to different places.