Ported a function from 32-bit to 64-bit by changing all register names from eXX to rXX, and the factorial returns 0?

function assembly x86

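For all those who are learning the art of computer programming, how fortunate it is to have access to a community such as Stack Overflow! I have decided to take up the task of learning how to program computers, and I am doing so with an e-book called "Programming from the Ground Up", which teaches the reader how to create programs in assembly language in the GNU/Linux environment.

My progress in the book has reached the point of creating a program that computes the factorial of the integer 4 with a function. The program assembles with GCC's assembler and runs without any error. However, the function in my program does not return the right answer! The factorial of 4 is 24, but the program returns a value of 0! I do not know why this is.

Here is the code for your reference: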

.section .data

.section .text

.globl _start

.globl factorial

_start:

push $4                    #this is the function argument
call factorial             #the function is called
add $4, %rsp               #the stack is restored to its original 
                           #state before the function was called
mov %rax, %rbx             #this instruction will move the result 
                           #computed by the function into the rbx 
                           #register and will serve as the return 
                           #value 
mov $1, %rax               #1 must be placed inside this register for 
                           #the exit system call
int $0x80                  #exit interrupt

.type factorial, @function #defines the code below as being a function

factorial:                 #function label
push %rbp                  #saves the base-pointer
mov %rsp, %rbp             #moves the stack-pointer into the base-
                           #pointer register so that data in the stack 
                           #can be referenced as indexes of the base-
                           #pointer
mov $1, %rax               #the rax register will contain the product 
                           #of the factorial
mov 8(%rbp), %rcx          #moves the function argument into %rcx
start_loop:                #the process loop begins
cmp $1, %rcx               #this is the exit condition for the loop
je loop_exit               #if the value in %rcx reaches 1, exit loop
imul %rcx, %rax            #multiply the current integer of the 
                           #factorial by the value stored in %rax
dec %rcx                   #reduce the factorial integer by 1
jmp start_loop             #unconditional jump to the start of loop
loop_exit:                 #the loop exit begins
mov %rbp, %rsp             #restore the stack-pointer
pop %rbp                   #remove the saved base-pointer from stack
ret                        #return

TL:DR: the factorial of the return address overflows %rax, leaving 0, because you ported wrong.

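Porting 32-bit code to 64-bit is not as simple as changing all the register names. That might get it to assemble, but as you found, even this simple program behaves differently. In x86-64, push %reg and call both push 64-bit values and modify rsp by 8. You would see this if you single-stepped your code with a debugger. (See the bottom of [link missing] for information on using gdb with asm.)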
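
As a rough illustration of the 8-byte stack slots described above, here is a toy Python model (not part of the original answer) that replays the pushes this program performs and then looks at what sits at 8(%rbp) and 16(%rbp). The `Stack` class, the initial rsp, the return address, and the saved rbp value are all hypothetical, chosen only to make the layout visible.

```python
# Toy model of the x86-64 stack: every push (including the one done by call)
# moves rsp down by 8 bytes and stores one 64-bit value there.
WORD = 8  # bytes per stack slot in 64-bit mode


class Stack:
    def __init__(self, rsp):
        self.rsp = rsp
        self.mem = {}  # address -> 64-bit value stored at that address

    def push(self, value):
        self.rsp -= WORD
        self.mem[self.rsp] = value

    def load(self, address):
        return self.mem[address]


RETURN_ADDRESS = 0x4000C7  # hypothetical address of the instruction after call
OLD_RBP = 0                # hypothetical value of the caller's rbp

stack = Stack(rsp=0x7FFFFFFFE000)  # hypothetical initial rsp
stack.push(4)                      # push $4
stack.push(RETURN_ADDRESS)         # call factorial (pushes the return address)
stack.push(OLD_RBP)                # push %rbp
rbp = stack.rsp                    # mov %rsp, %rbp

print(hex(stack.load(rbp + 8)))    # what 8(%rbp) reads: the return address
print(stack.load(rbp + 16))        # what 16(%rbp) reads: the pushed argument
```

In this model the pushed 4 ends up 16 bytes above rbp, not 8, because the saved rbp and the return address each occupy a full 8-byte slot.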

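You are reading a book that uses 32-bit examples, so you should probably just build them as 32-bit programs, rather than trying to port them to 64-bit before you know how.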

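Your sys_exit() using the 32-bit int 0x80 ABI still works, but you would run into trouble with system calls if you tried to pass 64-bit pointers. You would also run into trouble if you wanted to call any library functions, because the standard function-calling convention is different too. See the 64-bit ABI links and other calling-convention documentation in the tag wiki.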
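
To make the pointer problem concrete, here is a small Python sketch. It assumes the int 0x80 entry point takes its arguments from the 32-bit registers (ebx, ecx, and so on), so only the low 32 bits of each value are seen. The helper name and the sample stack address are made up for illustration.

```python
# Illustration only: model an argument as the 32-bit int 0x80 ABI would see it,
# i.e. keep just the low 32 bits of the 64-bit register value.
def as_seen_by_int80(value):
    return value & 0xFFFFFFFF


exit_code = 24
stack_pointer = 0x7FFDE3A41B20  # hypothetical 64-bit stack address

# A small integer such as an exit code is unchanged by the truncation.
print(as_seen_by_int80(exit_code) == exit_code)

# A pointer above 4 GiB loses its upper bits and no longer names the same memory.
print(hex(as_seen_by_int80(stack_pointer)))
```

This is why passing a small exit code through int 0x80 happens to work here, while passing the address of a buffer on the 64-bit stack would not.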

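But you are not doing any of that, so your program's problem just comes down to not accounting for the doubled "stack width" in x86-64. Your factorial function reads the return address as its argument.

Here is your code, with comments explaining what it actually does: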

push $4                    # rsp-=8.  (rsp) = qword 4
                           # non-standard calling convention with args on the stack.
call factorial             # rsp-=8.  (rsp) = return address.  RIP=factorial
add $4, %rsp               # misalign the stack, so it's pointing to the top half of the 4 you pushed earlier.
                           # if this was in a function that wanted to return, you'd be screwed.
mov %rax, %rbx             # copy return value to first arg of system call
mov $1, %rax               # eax = __NR_EXIT from asm/unistd_32.h, wasting 2 bytes vs. mov $1, %eax
int $0x80                  # 32-bit ABI system call, eax=call number, ebx=first arg.  sys_exit(factorial(4))

So the caller is sort of fine (for the non-standard 64-bit calling convention you have invented, which passes all args on the stack). You might as well omit the add to %rsp entirely, since you are about to exit without touching the stack any further.

.type factorial, @function #defines the code below as being a function

factorial:                 #function label
push %rbp                  #rsp-=8, (rsp) = rbp
mov %rsp, %rbp             # make a traditional stack frame
mov $1, %rax               #retval = 1.  (Wasting 2 bytes vs. the exactly equivalent mov $1, %eax)
mov 8(%rbp), %rcx          #load the return address into %rcx
... and calculate the factorial

For static executables (and dynamically linked executables that are not position-independent), _start is normally at 0x4000c0. Your program will still run nearly instantly on a modern CPU, because 0x4000c0 * 3c imul latency is still only about 12.5 million core clock cycles. On a 4GHz CPU, that is about 3 milliseconds of CPU time.

If you had made a position-independent executable by linking with gcc foo.o on a recent distro, _start would have an address like 0x5555555545a0, and your function would take about 70368 seconds to run on a 4GHz CPU with 3-cycle imul latency.

4194496! includes many even numbers, so its binary representation has many trailing zeros. By the time you have multiplied by every number from 0x4000c0 down to 1, the whole of %rax will be zero.

The exit status of a Linux process is only the low 8 bits of the integer you pass to sys_exit() (because wstatus is only a 32-bit int and also holds other things, such as which signal ended the process). So even with small args, it does not take much.
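
The figures quoted above can be sanity-checked with a short Python sketch. It is not part of the original answer. The helper names are made up, the 3-cycle imul latency and 4GHz clock are the values assumed in the text, and the position-independent address is an assumed example chosen to match the figure of about 70368 seconds.

```python
# Sanity checks for the arithmetic in the answer. All helper names are
# hypothetical; the constants are the ones the text assumes.
MASK64 = (1 << 64) - 1        # %rax keeps only the low 64 bits of a product
IMUL_LATENCY = 3              # cycles per dependent imul, as assumed in the text
CLOCK_HZ = 4_000_000_000      # 4 GHz


def loop_seconds(n):
    """Approximate run time of a loop doing n dependent imul instructions."""
    return n * IMUL_LATENCY / CLOCK_HZ


def twos_in_factorial(n):
    """Number of factors of 2 in n! (Legendre's formula)."""
    count = 0
    while n:
        n //= 2
        count += n
    return count


def factorial_mod_2_64(n):
    """n! truncated to 64 bits, the way repeated imul into %rax truncates it."""
    product = 1
    while n > 1 and product:  # once the low 64 bits are 0 they stay 0
        product = (product * n) & MASK64
        n -= 1
    return product


def exit_status(value):
    """Exit status the parent sees: only the low 8 bits of the value."""
    return value & 0xFF


print(factorial_mod_2_64(4))                # 24
print(factorial_mod_2_64(0x4000C0))         # 0
print(twos_in_factorial(0x4000C0))          # far more than the 64 needed
print(loop_seconds(0x4000C0))               # about 0.003 s
print(loop_seconds(0x5555555545A0))         # about 70368 s
print(exit_status(factorial_mod_2_64(10)))  # 0
```

The last line shows why small arguments already run into the 8-bit limit: 10! contains eight factors of 2, so its low 8 bits are all zero even though the full value still fits in %rax.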