Assembly code that should return the smallest integer in an array instead randomly returns the last or second-to-last number
I'm trying to write a function in nasm that, given an array of integers and the length of the array, returns the smallest integer. This is based on a Codewars problem. I'm doing this on 64-bit BlackArch Linux. My function looks like this:
SECTION .text
global find_smallest_int
find_smallest_int:
; [rdi] is the first value in the array.
; We'll store the smallest value so far found
; in rax. The first value in the array is the
; smallest so far found, therefore we store it
; in rax.
mov rax, [rdi]
; rsi is the second argument to int find_smallest_int(int *, int)
; which represents the length of the array.
; Store it in rbx to be explicit.
mov rbx, rsi
loop:
; Check to see if we've reached the end of the array.
; If we have, we jump to the end of the function and
; return the smallest value (which should be whatever
; is in rax at the moment).
cmp rbx, 0
je end
; Subtract one from our counter. This started as
; the number of elements in the array - when it
; gets to 0, we'll have looped through the entire thing.
sub rbx, 1
; If rax is smaller than [rdi], we'll jump down to the
; rest of the loop. Only if rax is bigger than [rdi] will
; we reassign rax to be the new smallest-yet value.
cmp rax, [rdi]
jl postassign
assign:
; If we execute this code, it means rax was not less
; than [rdi]. Therefore, we can safely reassign
; rax to [rdi].
mov rax, [rdi]
postassign:
; Set rdi to point to the next value in the array
add rdi, 4
; If we get here, then we aren't finished looping yet
; because rbx (the counter) hasn't reached 0 yet.
jmp loop
end:
ret
I then call this function from the following C code:
extern int find_smallest_int(int *array, int size);
int main(void)
{
int nums[4] = {800, 300, 100, 11};
int ret = find_smallest_int(nums, 4);
return ret;
}
Finally, I compile and run the whole thing with the following commands:
#!/bin/bash
# Make an object file from my assembly code with nasm
nasm -f elf64 -o sum.o call_sum.s
# make an object file from my C code
gcc -O0 -m64 -c -o call_sum.o call_sum.c -g
# compile my two object files into an executable
gcc -O0 -m64 -o run sum.o call_sum.o -g
# Run the executable and get the output in the
# form of the exit code.
./run
echo $?
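Instead of getting the smallest integer, I get either 100 or 11 (the second-to-last and last members of the integer array passed to my assembly function, respectively). Which result I get seems to be completely random. I can run the program a few times and get 11, then run it a few more times and start getting 100.
If anyone could help me understand this strange behaviour, I'd really appreciate it. Thanks!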
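Update: I implemented the changes from Jester's comment (using 32-bit registers to hold the ints) and it works, but I don't really understand why.
The start of this answer is based on Jester's comment. It merely expands on that comment and explains the changes in more detail. I also made some additional changes, two of which also address errors in your source. First, this part: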
"An int is 4 bytes but you use 8 throughout your code. Use eax instead of rax."
In your example, each of these instructions accesses 8 bytes from the array:
mov rax, [rdi]
cmp rax, [rdi]
mov rax, [rdi]
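This is because rax is a 64-bit register, so a load or comparison of the full rax with a memory operand accesses 8 bytes of memory. NASM syntax allows you to specify the size of the memory operand explicitly, for example by writing: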
mov rax, qword [rdi]
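Had you done this, you might have figured out earlier that you were accessing memory in 8-byte units (quadwords). Trying to request a doubleword access explicitly while using rax as the destination register fails. The following line causes the error "mismatch in operand sizes" at assembly time: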
mov rax, dword [rdi]
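The following two lines are fine, and both load from a doubleword memory operand into rax. The first uses zero extension (which is implicit in the AMD64 instruction set when writing to a 32-bit register part), the second uses (explicit) sign extension: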
mov eax, dword [rdi]
movsx rax, dword [rdi]
(A movzx instruction from a dword memory operand to rax does not exist, because it would be redundant with the mov to eax.)
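The difference between the two extensions can be mirrored in C with fixed-width casts. This is only a minimal sketch; the helper names load_zero_extended and load_sign_extended are made up for illustration and do not appear in the original post:

```c
#include <stdint.h>

/* Hypothetical helpers; the names are illustrative assumptions. */

/* Like `mov eax, dword [rdi]`: the upper 32 bits of rax become zero. */
uint64_t load_zero_extended(const int32_t *p)
{
    return (uint64_t)(uint32_t)*p;
}

/* Like `movsx rax, dword [rdi]`: the sign bit fills the upper 32 bits. */
int64_t load_sign_extended(const int32_t *p)
{
    return (int64_t)*p;
}
```

For a non-negative int both helpers give the same 64-bit value. For a negative int they differ: the zero-extended result is a large positive number, while the sign-extended one keeps the original value.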
Later in your example, you use rdi as the address of a 4-byte wide type, advancing the array entry pointer by adding 4 to it:
add rdi, 4
This is correct for the int type, but it conflicts with your use of quadword as the size of the memory operands.
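To see how this mismatch can produce either 100 or 11, the 8-byte accesses can be imitated in C. The following is a rough model rather than a faithful emulation: it assumes a little-endian machine such as x86-64, and both the name buggy_min and the extra slot after the array (standing in for whatever happens to follow the real array on the stack) are assumptions made for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the original loop: every access reads 8 bytes.
 * `mem` must hold count + 1 entries; the extra entry stands in for
 * whatever follows the real array in memory. Assumes little-endian. */
int buggy_min(const int32_t *mem, unsigned count)
{
    int64_t best, cur;
    memcpy(&best, mem, sizeof best);        /* mov rax, [rdi] */
    for (unsigned i = 0; i < count; i++) {
        memcpy(&cur, mem + i, sizeof cur);  /* qword at [rdi] */
        if (!(best < cur))                  /* cmp rax, [rdi] / jl postassign */
            best = cur;                     /* mov rax, [rdi] */
    }
    /* The C caller only looks at eax, the low 32 bits of rax. */
    return (int32_t)(uint32_t)(uint64_t)best;
}
```

In each 8-byte value the next array entry lands in the upper half, so the comparisons are effectively decided by the neighbouring element. For {800, 300, 100, 11} the last comparison is decided by the dword after the array: if it is a large positive value the function keeps 100, and if it is negative or small the function returns 11. Data following the array on the stack can differ from run to run, which would be consistent with the seemingly random results.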
Jester's comment also raises two more problems:
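"Also do not use rbx, because that is a callee-saved register, and it's pointless to copy from rsi anyway. As before, you should use esi, because that is another int."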
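The problem with rsi is that the high 32 bits of the 64-bit rsi may hold a nonzero value, depending on the ABI. If you are not sure whether a nonzero value is allowed there, you should assume that it is, and use only the 32-bit value in esi.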
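As a small illustration of why only esi should be trusted, the following sketch treats the register as a 64-bit value whose upper half may contain leftovers. The function name count_from_rsi and the sample bit pattern are assumptions chosen for illustration:

```c
#include <stdint.h>

/* Hypothetical model: `rsi` is the full 64-bit register as the callee
 * sees it. Only the low 32 bits (esi) carry the int argument. */
uint32_t count_from_rsi(uint64_t rsi)
{
    return (uint32_t)rsi;   /* use esi, ignore the upper half */
}
```

With leftovers in the upper half, for example 0xDEADBEEF00000004, a counter based on the full register would be astronomically large, while the low 32 bits still give the intended count of 4.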
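The problem with rbx (or ebx) is that rbx must be preserved across function calls under the AMD64 psABI that Linux uses; see the documentation for that ABI. In your simple test program, changing rbx may not cause any failure, but in a nontrivial context it easily could.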
The next problem I found is your initialisation of eax. You wrote it like this:
; [rdi] is the first value in the array.
; We'll store the smallest value so far found
; in rax. The first value in the array is the
; smallest so far found, therefore we store it
; in rax.
mov rax, [rdi]
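However, as your loop flow control logic shows, you allow the caller to pass in zero for the size argument. In that case you should not access the array at all, because "the first value in the array" may not exist or may not be initialised to anything. Logically, you should initialise the smallest-yet value with INT_MAX instead of the first array entry.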
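There is one more problem: you are using rsi or esi as an unsigned number, counting down to zero. However, in your function declaration you specify the type of the size argument as int, which is signed. I fixed this by changing the declaration to unsigned int.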
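Taken together, the INT_MAX initialisation and the unsigned size correspond to the following C reference. This is only a sketch for comparison; the name find_smallest_int_ref is made up so that it does not clash with the assembly symbol find_smallest_int:

```c
#include <limits.h>

/* Hypothetical C counterpart of the corrected routine. */
int find_smallest_int_ref(const int *array, unsigned int size)
{
    int smallest = INT_MAX;            /* result for an empty array */
    for (unsigned int i = 0; i < size; i++) {
        if (array[i] < smallest)       /* never reads array when size == 0 */
            smallest = array[i];
    }
    return smallest;
}
```

Called with the array from the question, {800, 300, 100, 11} and size 4, this returns 11. With size 0 it returns INT_MAX without touching the array.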
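I made a few more, optional changes to your program. I used NASM local labels for the "sub" labels of your function, which is useful because you can re-use names such as .loop or .end in other functions in the same source file, should any be added.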
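I also corrected one of the comments to note that we jump when eax is less than the array entry, and do not jump when eax is greater than or equal to the array entry. You could change this conditional jump to jle so that it also jumps when the two compare equal. Arguably one or the other may be preferable for clarity or performance, but I do not have much of an answer as to which.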
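I also used dec esi instead of sub esi, 1, which is not much better but sits better with me. In 32-bit mode, dec esi is a single-byte instruction. That is not the case in 64-bit mode, however: there dec esi is 2 bytes, compared with 3 bytes for sub esi, 1.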
Further, I changed the initial check for esi being zero from cmp to test, which is slightly better.
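Finally, I moved the actual loop condition to the end of the loop body, which means the loop uses one jump instruction fewer. The unconditional jump to the start of the loop body is replaced by the conditional jump that checks the loop condition (jnz .loop), which uses the Zero Flag as set by the preceding dec instruction.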
INT_MAX equ 7FFF_FFFFh
SECTION .text
global find_smallest_int
find_smallest_int:
; If the array is empty (size = 0) then we want to return
; without reading from the array at all. The value to return
; then logically should be the highest possible number for a
; 32-bit signed integer. This is called INT_MAX in the C
; header limits.h and for 32-bit int is equal to 7FFF_FFFFh.
;
; If the array is not empty, the first iteration will then
; always leave our result register equal to the value in
; the first array entry. This is either equal to INT_MAX
; again, or less than that.
mov eax, INT_MAX
; esi is the second argument to our function, which is
; declared as int find_smallest_int(int *, unsigned int).
; It represents the length of the array. We use this
; as a counter. rsi (and its part esi) need not be preserved
; across function calls for the AMD64 psABI that is used by
; Linux, see https://stackoverflow.com/a/40348010/738287
; Check for an initial zero value in esi. If this is found,
; skip the loop without any iteration (while x do y) and
; return eax as initialised to INT_MAX at the start.
test esi, esi
jz .end
.loop:
; If eax is smaller than dword [rdi], we'll jump down to the
; rest of the loop. Only if eax is bigger than or equal to
; the dword [rdi] will we reassign eax to that, to hold the
; new smallest-yet value.
cmp eax, dword [rdi]
jl .postassign
.assign:
; If we execute this code, it means eax was not less
; than dword [rdi]. Therefore, we can safely reassign
; eax to dword [rdi].
mov eax, dword [rdi]
.postassign:
; Set rdi to point to the next value in the array.
add rdi, 4
; Subtract one from our counter. This started as
; the number of elements in the array - when it
; gets to 0, we'll have looped through the entire thing.
dec esi
; Check to see if we've reached the end of the array.
; To do this, we use the Zero Flag as set by the prior
; dec instruction. If esi has reached zero yet (ZR) then
; we do not continue looping. In that case, we return the
; smallest value found yet (which is in eax at the moment).
;
; Else, we jump to the start of the loop to begin the next
; iteration.
jnz .loop
.end:
retn
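The control flow of this listing (one check up front, then a loop whose condition sits at the bottom) can be pictured in C roughly as follows. This is a sketch only, and the name smallest_do_while is a made-up label for illustration:

```c
#include <limits.h>

/* Hypothetical C rendering of the control flow: one check up front,
 * then a do-while whose condition sits at the bottom of the body. */
int smallest_do_while(const int *array, unsigned int size)
{
    int smallest = INT_MAX;            /* mov eax, INT_MAX */
    if (size == 0)                     /* test esi, esi / jz .end */
        return smallest;
    do {
        if (!(smallest < *array))      /* cmp eax, dword [rdi] / jl .postassign */
            smallest = *array;         /* mov eax, dword [rdi] */
        array++;                       /* add rdi, 4 */
    } while (--size != 0);             /* dec esi / jnz .loop */
    return smallest;                   /* retn */
}
```

Each iteration then executes a single loop-control jump, the one at the bottom, instead of a conditional exit at the top plus an unconditional jump back.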
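The conditional jump inside the loop body can also be replaced by a conditional move (cmovge). The changed part of the loop:
.loop: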
; If eax is greater than or equal to dword [rdi], we'll
; reassign eax to that dword, the new smallest-yet value.
cmp eax, dword [rdi]
cmovge eax, dword [rdi]
; Set rdi to point to the next value in the array.
add rdi, 4
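In C terms, the cmp and cmovge pair behaves like a select between the current minimum and the array entry. A minimal sketch, with select_smaller as a hypothetical helper name; whether a compiler actually emits a conditional move for such code depends on the compiler and its options:

```c
/* Hypothetical helper mirroring
 *   cmp    eax, dword [rdi]
 *   cmovge eax, dword [rdi]
 */
int select_smaller(int current, const int *entry)
{
    int candidate = *entry;   /* the memory operand is read either way */
    return (current >= candidate) ? candidate : current;
}
```

Unlike the version with the jump, the array entry is loaded in every iteration, including those in which the current minimum is kept.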