Assembly x87 FPU上基于牛顿-拉斐逊方法的立方根_Assembly_X86_Masm_Newtons Method_X87

Assembly x87 FPU上基于牛顿-拉斐逊方法的立方根

assembly x86

Assembly x87 FPU上基于牛顿-拉斐逊方法的立方根,assembly,x86,masm,newtons-method,x87,Assembly,X86,Masm,Newtons Method,X87,我正在尝试使用8086处理器编写一个汇编程序，该程序将查找数字的立方根。显然，我使用的是浮点算法基于：我想我不明白堆栈中的什么在什么位置。这个产品适合生产线吗金缕梅根；根*根进入圣（1）街？编辑否，它进入st（0）st（0）中的内容被下推到st（1）堆栈中但是我还没有找到我问题的答案如何除法？现在我明白了我需要将st（1）除以st（0），但我不知道如何除法。我尝试过这个 finit ; initialize FPU fld root ; root in

我正在尝试使用8086处理器编写一个汇编程序，该程序将查找数字的立方根。显然，我使用的是浮点

算法基于：

我想我不明白堆栈中的什么在什么位置。这个产品适合生产线吗

金缕梅根；根*根

进入圣（1）街？编辑否，它进入st（0）st（0）中的内容被下推到st（1）堆栈中

但是我还没有找到我问题的答案如何除法？现在我明白了我需要将st（1）除以st（0），但我不知道如何除法。我尝试过这个

finit           ; initialize FPU
fld     root    ; root in ST
fmul    two     ; root*two
fadd    xx      ; root*two+27
; the answer to root*two+x is stored in ST(0) when we load root st(0) moves to ST1 and we will use ST0 for the next operation

fld     root    ; root in ST previous content is now in ST1
fimul   root    ; root*root
fidiv   st(1)

编辑：我把公式写错了。这就是我要找的

(2.0*root) + x / (root*root)) / 3.0 That's what I need. 
STEP 1) (2 * root) 
STEP 2) x / (root * root) 
STEP 3) ADD step one and step 2 
STEP 4) divide step 3 by 3.0

根=（2.0*1.0）+27/（1.0*1.0）/3.0；（2） +27/（1.0）/3.0=11=>root=11

编辑2：新代码

.586
.MODEL FLAT
.STACK 4096

.DATA
root    REAL4   1.0
oldRoot REAL4   2.0
Two     REAL4   2.0
three   REAL4   3.0
xx      REAL4   27.0


.CODE
main    PROC
        finit           ; initialize FPU
                fld     root    ; root in ST    ; Pass 1 ST(0) has 1.0  
repreatAgain:
        ;fld    st(2)

        fmul    two     ; root*two      ; Pass 1 ST(0) has 2                                                                            Pass 2 ST(0) = 19.333333 st(1) = 3.0 st(2) = 29.0 st(3) = 1.0

        ; the answer to roor*two is stored in ST0 when we load rootSTO moves to ST1 and we will use ST0 for the next operation
        fld     root    ; root in ST(0) previous content is now in ST(1)      Pass 1 ST(0) has 1.0 ST(1) has 2.0                        Pass 2 st(
        fmul    st(0), st(0)    ; root*root                                 ; Pass 1 st(0) has 1.0 st(1) has 2.0
        fld     xx                                                          ; Pass 1 st(0) has 27.0 st(1) has 1.0 st(2) has 2.0
        fdiv    st(0), st(1) ; x / (root*root)  ; Pass 1: 27 / 1              Pass 1 st(0) has 27.0 st(1) has 2.0 st(2) has 2.0
        fadd    st(0), st(2) ; (2.0*root) + x / (root*root))                  Pass 1 st(0) has 29.0 st(1) has 1.0 st(2) has 2.0

        fld     three                                                       ; Pass 1 st(0) has 3.0 st(1) has 29.0 st(2) has 1.0 st(3) has 2.0

        fld     st(1)                                                       ; Pass 1 st(0) has 3.0 st(1) has 29.0 st(2) = 1.0 st(3) = 2.0
        fdiv    st(0), st(1) ; (2.0*root) + x / (root*root)) / 3.0            Pass 1 st(1) has 9.6666666666667



        jmp     repreatAgain
        mov     eax, 0  ; exit
        ret
main    ENDP 
END

英特尔的insn参考手册记录了所有指令，包括和

fdivr

（x/y而不是y/x）。如果您确实需要学习大部分过时的x87（

fdiv

）而不是SSE2（

divss

），那么，特别是解释寄存器堆栈的早期章节。另见。查看标记wiki中的更多链接

回复：EDIT2代码转储：

循环中有4条

fld

指令，但没有

后缀操作。您的循环将在第三次迭代时溢出8寄存器FP堆栈，此时您将得到一个NaN。（特别是不定值NaN，printf打印为

1#IND

）

我建议设计您的循环，使迭代从
st（0）
中的
root
开始，并以
st（0）中下一次迭代的root 值结束
。不要在循环内部加载或存储到根目录下的内容。使用

fld1

在循环外部加载1.0作为初始值，在循环之后使用

fstp[root]

将

st（0）

弹出到内存中

您选择了执行tmp/3.0最不方便的方法

                          ; stack = tmp   (and should otherwise be empty once you fix the rest of your code)
    fld     three         ; stack = 3.0, tmp
    fld     st(1)         ; stack = tmp, 3.0, tmp   ; should have used fxchg to just swap instead of making the stack deeper
    fdiv    st(0), st(1)  ; stack = tmp/3.0, 3.0, tmp

fdiv

，

fsub

等。有多个注册表表单：一个是目标，另一个是源。源为

st（0）

的表单也有

op，因此您可以

    fld     three         ; stack = 3.0, tmp
    fdivp                 ; stack = tmp / 3.0  popping the stack back to just one entry
    ; fdivp  st(1), st(0) ; this is what fdivp with no operands means

如果直接使用内存操作数而不是加载它，实际上比这更简单。因为您想要

st（0）/=3.0

，您可以执行
fdiv[three]
。在这种情况下，FP操作就像整数操作，您可以执行

div dword ptr[integer\u from\u memory]

来使用内存源操作数

非交换操作（减法和除法）也有反向版本（例如，

fdivr

），可以为您保存

fxchg

，或者允许您使用内存操作数，即使您需要3.0/tmp而不是tmp/3.0

除以3等于乘以1/3，

fmul

比

fdiv

快得多。从代码简单性的角度来看，乘法是可交换的，因此实现

st（0）/=3的另一种方法是：
fld    [one_third]
fmulp                  ; shorthand for  fmulp st(1), st(0)

; or
fmul   [one_third]

请注意，1/3.0
没有二进制浮点的精确表示，但是+/-2^23之间的所有整数都可以（单精度REAL4的尾数大小）。如果您希望使用三的精确倍数，您应该只关心这一点

对原代码的评论：
您可以通过提前执行2.0/3.0
和x/3.0
将分区从循环中提升出来。如果您希望循环平均运行多个迭代，这是值得的

您可以使用fld st（0）
复制堆栈顶部，因此不必一直从内存加载

fimul[root]
（integermul）是一个bug：您的root
是REAL4
（32位浮点）格式，而不是整数。fidiv
也是一个bug，当然不能将x87寄存器用作源操作数
由于堆栈顶部有根
，因此我认为您可以使用fmul st（0）
将st（0）
用作显式和隐式操作数，从而产生st（0）=st（0）*st（0）
，而堆栈深度不变

您还可以使用sqrt作为比1.0更好的初始近似值，或者使用+/-1*sqrtf（fabsf（x））
。我没有看到x87指令将一个浮点数的符号应用到另一个浮点数，只需fchs
无条件翻转，以及fabs
无条件清除符号位。有一个fcmov
，但它需要P6或更高版本的CPU。您提到了8086，但随后使用了.586
，所以请确定您的目标

更好的环体：
没有调试或测试，但是你的代码充满了来自相同数据的重复加载，这让我发疯。这个优化版本在这里是因为我好奇，不是因为我认为它会直接帮助OP
另外，希望这是一个很好的示例，说明了如何在代码中对数据流进行注释（例如，x87或带有无序排列的矢量化代码）
32位函数调用约定返回st（0）中的FP结果，因此您可以这样做，但调用者可能必须存储在某个位置。
对于那些刚接触x87的人，我将从一个非常基本的层面回答这个问题，他们可能需要在FPU上进行计算
有两件事要考虑。如果你得到一个计算（）：
有没有办法将其转换为x87 FPU可以使用的基本指令？是的，在一个非常基本的级别上，x87 FPU是一个类似于复杂计算器的堆栈。代码中的等式是中缀表示法。如果将其转换为后缀（RPN）表示法，它可以很容易地实现为一个包含操作的堆栈
这提供了一些关于转换为后缀符号的信息
    fld     three         ; stack = 3.0, tmp
    fdivp                 ; stack = tmp / 3.0  popping the stack back to just one entry
    ; fdivp  st(1), st(0) ; this is what fdivp with no operands means

fld    [one_third]
fmulp                  ; shorthand for  fmulp st(1), st(0)

; or
fmul   [one_third]

## x/3.0 in st(1)
## 2.0/3.0 in st(2)

# before each iteration: st(0) = root
#  after each iteration: st(0) = root * 2.0/3.0 + (x/3.0 / (root*root)), with constants undisturbed

loop_body:
    fld     st(0)         ; stack: root, root, 2/3, x/3
    fmul    st(0), st(0)  ; stack: root^2, root, 2/3, x/3
    fdivr   st(0), st(3)  ; stack: x/3 / root^2, root, 2/3, x/3
    fxchg   st(1)         ; stack: root, x/3/root^2, 2/3, x/3
    fmul    st(0), st(2)  ; stack: root*2/3, x/3/root^2, 2/3, x/3
    faddp                 ; stack: root*2/3 + x/3/root^2, 2/3, x/3

; TODO: compare and loop back to loop_body

    fstp    [root]         ; store and pop
    fstp    st(0)          ; pop the two constants off the FP stack to empty it before returning
    fstp    st(0)
    ; finit is very slow, ~80cycles, don't use it if you don't have to.

root := (2.0*root + x/(root*root)) / 3.0

2.0 root * x root root * / + 3.0 /

2.0 [enter] root * x [enter] root [enter] root * / + 3.0 /

fld Two      ; st(0)=2.0
fld root     ; st(0)=root, st(1)=2.0
fmulp        ; st(0)=(2.0 * root)
fld xx       ; st(0)=x, st(1)=(2.0 * root)
fld root     ; st(0)=root, st(1)=x, st(2)=(2.0 * root)
fld root     ; st(0)=root, st(1)=root, st(2)=x, st(3)=(2.0 * root)
fmulp        ; st(0)=(root * root), st(1)=x, st(2)=(2.0 * root)
fdivp        ; st(0)=(x / (root * root)), st(1)=(2.0 * root)
faddp        ; st(0)=(2.0 * root) + (x / (root * root))
fld Three    ; st(0)=3.0, st(1)=(2.0 * root) + (x / (root * root))
fdivp        ; st(0)=((2.0 * root) + (x / (root * root))) / 3.0

    fld     root            ; st(0)=root  
repreatAgain:
    fmul    two             ; st(0)=(2.0*root)      
    fld     root            ; st(0)=root, st(1)=(2.0*root) 
    fmul    st(0), st(0)    ; st(0)=(root*root), st(1)=(2.0*root)
    fld     xx              ; st(0)=x, st(1)=(root*root), st(2)=(2.0*root)
    fdiv    st(0), st(1)    ; st(0)=(x/(root*root)), st(1)=(root*root), st(2)=(2.0*root)
    fadd    st(0), st(2)    ; st(0)=((2.0*root) + x/(root*root)), st(1)=(root*root), st(2)=(2.0*root)
    fld     three           ; st(0)=3.0, st(1)=((2.0*root) + x/(root*root)), st(2)=(root*root), st(3)=(2.0*root)                                            
    fld     st(1)           ; st(0)=((2.0*root) + x/(root*root)), st(1)=3.0, st(2)=((2.0*root) + x/(root*root)), st(3)=(root*root), st(4)=(2.0*root)
    fdiv    st(0), st(1)    ; st(0)=(((2.0*root) + x/(root*root))/3.0), st(1)=3.0, st(2)=((2.0*root) + x/(root*root)), st(3)=(root*root), st(4)=(2.0*root)
    jmp     repreatAgain

ffree   st(4)           ; st(0)=(((2.0*root) + x/(root*root))/3.0), st(1)=3.0, st(2)=((2.0*root) + x/(root*root)), st(3)=(root*root)
ffree   st(3)           ; st(0)=(((2.0*root) + x/(root*root))/3.0), st(1)=3.0, st(2)=((2.0*root) + x/(root*root))
ffree   st(2)           ; st(0)=(((2.0*root) + x/(root*root))/3.0), st(1)=3.0
ffree   st(1)           ; st(0)=(((2.0*root) + x/(root*root))/3.0)
fst     root            ; Update root variable
jmp     repreatAgain

    finit        ; initialize FPU
repreatAgain:
    fld Two      ; st(0)=2.0
    fld root     ; st(0)=root, st(1)=2.0
    fmulp        ; st(0)=(2.0 * root)
    fld xx       ; st(0)=x, st(1)=(2.0 * root)
    fld root     ; st(0)=root, st(1)=x, st(2)=(2.0 * root)
    fld root     ; st(0)=root, st(1)=root, st(2)=x, st(3)=(2.0 * root)
    fmulp        ; st(0)=(root * root), st(1)=x, st(2)=(2.0 * root)
    fdivp        ; st(0)=(x / (root * root)), st(1)=(2.0 * root)
    faddp        ; st(0)=(2.0 * root) + (x / (root * root))
    fld Three    ; st(0)=3.0, st(1)=(2.0 * root) + (x / (root * root))
    fdivp        ; newroot = st(0)=((2.0 * root) + (x / (root * root))) / 3.0
    fstp root    ; Store result at top of stack into root and pop value
                 ;     at this point the stack is clear again since
                 ;     all items pushed have been popped.

    jmp repreatAgain

epsilon REAL4   0.001

.CODE
main PROC
    finit              ; initialize FPU

repeatAgain:
    fld Two            ; st(0)=2.0
    fld root           ; st(0)=root, st(1)=2.0
    fmulp              ; st(0)=(2.0 * root)
    fld xx             ; st(0)=x, st(1)=(2.0 * root)
    fld root           ; st(0)=root, st(1)=x, st(2)=(2.0 * root)
    fld root           ; st(0)=root, st(1)=root, st(2)=x, st(3)=(2.0 * root)
    fmulp              ; st(0)=(root * root), st(1)=x, st(2)=(2.0 * root)
    fdivp              ; st(0)=(x / (root * root)), st(1)=(2.0 * root)
    faddp              ; st(0)=(2.0 * root) + (x / (root * root))
    fld Three          ; st(0)=3.0, st(1)=(2.0 * root) + (x / (root * root))
    fdivp              ; newroot=st(0)=((2.0 * root) + (x / (root * root))) / 3.0
    fld root           ; st(0)=oldroot, st(1)=newroot
    fsub st(0), st(1)  ; st(0)=(oldroot-newroot), st(1)=newroot
    fabs               ; st(0)=(|oldroot-newroot|), st(1)=newroot
    fld epsilon        ; st(0)=0.001, st(1)=(|oldroot-newroot|), st(2)=newroot
    fcompp             ; Do compare&set x87 flags pop top two values off stack
                       ;     st(0)=newroot    
    fstsw ax           ; Copy x87 Status Word containing the result to AX
    fwait              ; Insure previous instruction completed
    sahf               ; Transfer condition codes to the CPU's flags register

    fstp root          ; Store result (newroot) at top of stack into root 
                       ;     and pop value. At this point the stack is clear
                       ;     again since all items pushed have been popped.
    jbe repeatAgain    ; If 0.001 <= (|oldroot-newroot|) repeat
    mov eax, 0         ; exit
    ret
main    ENDP 
END

fld root           ; st(0)=oldroot, st(1)=newroot
fsub st(0), st(1)  ; st(0)=(oldroot-newroot), st(1)=newroot
fabs               ; st(0)=(|oldroot-newroot|), st(1)=newroot
fld epsilon        ; st(0)=0.001, st(1)=(|oldroot-newroot|), st(2)=newroot
fcomip st(0),st(1) ; Do compare & set CPU flags, pop one value off stack
                   ;     st(0)=(|oldroot-newroot|), st(1)=newroot
fstp st(0)         ; Pop temporary value off top of stack
                   ;     st(0)=newroot

fstp root          ; Store result (newroot) at top of stack into root 
                   ;     and pop value. At this point the stack is clear
                   ;     again since all items pushed have been popped.
jbe repeatAgain    ; If 0.001 <= (|oldroot-newroot|) repeat

twothirds root * xover3 root root * / +

.586
.MODEL FLAT
.STACK 4096

.DATA
xx        REAL4   27.0
root      REAL4   1.0
Three     REAL4   3.0
epsilon   REAL4   0.001
Twothirds REAL4 0.6666666666666666

.CODE
main PROC
    finit               ; Initialize FPU
    fld epsilon         ; st(0)=epsilon
    fld root            ; st(0)=prevroot (Copy of root), st(1)=epsilon
    fld TwoThirds       ; st(0)=(2/3), st(1)=prevroot, st(2)=epsilon 
    fld xx              ; st(0)=x, st(1)=(2/3), st(2)=prevroot, st(3)=epsilon
    fdiv Three          ; st(0)=(x/3), st(1)=(2/3), st(2)=prevroot, st(3)=epsilon
    fld st(2)           ; st(0)=root, st(1)=(x/3), st(2)=(2/3), st(3)=prevroot, st(4)=epsilon

repeatAgain:

    ; twothirds root * xover3 root root * / +
    fld st(0)           ; st(0)=root, st(1)=root, st(2)=(x/3), st(3)=(2/3), st(4)=prevroot, st(5)=epsilon
    fmul st(0), st(3)   ; st(0)=(2/3*root), st(1)=root, st(2)=(x/3), st(3)=(2/3), st(4)=prevroot, st(5)=epsilon           
    fxch                ; st(0)=root, st(1)=(2/3*root), st(2)=(x/3), st(3)=(2/3), st(4)=prevroot, st(5)=epsilon
    fmul st(0), st(0)   ; st(0)=(root*root), st(1)=(2/3*root), st(2)=(x/3), st(3)=(2/3), st(4)=prevroot, st(5)=epsilon
    fdivr st(0), st(2)  ; st(0)=((x/3)/(root*root)), st(1)=(2/3*root), st(2)=(x/3), st(3)=(2/3), st(4)=prevroot, st(5)=epsilon
    faddp               ; st(0)=((2/3*root)+(x/3)/(root*root)), st(1)=(x/3), st(2)=(2/3), st(3)=prevroot, st(4)=epsilon
    fxch st(3)          ; st(0)=prevroot, st(1)=(x/3), st(2)=(2/3), newroot=st(3)=((2/3*root)+(x/3)/(root*root)), st(4)=epsilon 
    fsub st(0), st(3)   ; st(0)=(prevroot-newroot), st(1)=(x/3), st(2)=(2/3), st(3)=newroot, st(4)=epsilon
    fabs                ; st(0)=(|prevroot-newroot|), st(1)=(x/3), st(2)=(2/3), st(3)=newroot, st(4)=epsilon
    fld st(4)           ; st(0)=0.001, st(1)=(|prevroot-newroot|), st(2)=(x/3), st(3)=(2/3), st(4)=newroot, st(5)=epsilon

    fcompp              ; Do compare&set x87 flags pop top two values off stack
                        ;     st(0)=(x/3), st(1)=(2/3), st(2)=newroot, st(3)=epsilon    
    fstsw ax            ; Copy x87 Status Word containing the result to AX
    fwait               ; Insure previous instruction completed
    sahf                ; Transfer condition codes to the CPU's flags register

    fld st(2)           ; st(0)=newroot, st(1)=(x/3), st(2)=(2/3), st(3)=newroot, st(4)=epsilon
    jbe repeatAgain     ; If 0.001 <= (|oldroot-newroot|) repeat

    ; Remove temporary values on stack, cubed root in st(0)
    ffree st(4)         ; st(0)=newroot, st(1)=(x/3), st(2)=(2/3), st(3)=epsilon
    ffree st(3)         ; st(0)=newroot, st(1)=(x/3), st(2)=(2/3)
    ffree st(2)         ; st(0)=newroot, st(1)=(x/3)
    ffree st(1)         ; st(0)=newroot

    mov     eax, 0  ; exit
    ret
main ENDP 

END