C++ ASM: what is the best way to compute (int + int) * float constant using extended instructions?


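I am writing a function that responds to WM_MOUSEMOVE to move my camera in an OpenGL application.

The function takes the starting point (the lParam saved earlier in the WM_LBUTTONDOWN handler), subtracts the current point from it, and multiplies the result by a float coefficient.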

class cam
{
 int sp;        // starting point (packed lParam) saved here
 float x_coeff; // sp and x_coeff are adjacent and aligned, so I can load them both as one quad-word later
};

case WM_LBUTTONDOWN:
    cam.sp=lParam;
    return 0;
case WM_MOUSEMOVE:
    cam.drag_camera(lParam);
    return 0;

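void cam::drag_camera(LPARAM lParam)
{
  float step=0.001f;
  short old_x=sp&0xFFFF;          // low word  = x of the starting point
  short old_y=sp>>16;             // high word = y of the starting point
  short current_x=lParam&0xFFFF;  // low word  = current x
  short current_y=lParam>>16;     // high word = current y
  x_move=(old_x-current_x)*step;
  .... do something with the step
}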
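For reference, the scalar computation described above can be written as a small standalone function. This is only a sketch: the names `DragDelta`, `low_word`, `high_word` and `drag_delta` are hypothetical (they are not in my real code), and the packed position is passed as a plain 32-bit value instead of an LPARAM so that it compiles without windows.h.

```cpp
#include <cstdint>

// Hypothetical result type: the scaled x and y mouse deltas.
struct DragDelta {
    float x;
    float y;
};

// Low word of the packed position, read as a signed 16-bit x coordinate.
inline std::int16_t low_word(std::uint32_t packed) {
    return static_cast<std::int16_t>(packed & 0xFFFFu);
}

// High word of the packed position, read as a signed 16-bit y coordinate.
inline std::int16_t high_word(std::uint32_t packed) {
    return static_cast<std::int16_t>((packed >> 16) & 0xFFFFu);
}

// (start - current) * coeff for both axes.
inline DragDelta drag_delta(std::uint32_t start, std::uint32_t current, float coeff) {
    // The int16_t operands are promoted to int, so the subtraction
    // cannot wrap around the way a 16-bit psubw can.
    const int dx = low_word(start) - low_word(current);
    const int dy = high_word(start) - high_word(current);
    return DragDelta{ dx * coeff, dy * coeff };
}
```

An optimizing compiler will probably turn this into a couple of cvtsi2ss/mulss instructions on its own, which gives a baseline to compare any hand-written version against.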
OK, it works, but I am trying to practice using asm and all those nice registers. Here is my code again, this time using the MMX registers:

void cam::drag_camera(LPARAM lParam)
{
  _asm
  { 
    movd mm0,lParam       //MOVE current mouse LPARAM point to mm0    - mm0 = 00:00:cy:cx
    movq mm1,[ebx+40h]    //LOAD starting mouse point into the low dword of mm1 and x_coeff into the high dword   - mm1 = x_coeff:sy:sx
    psubw mm1,mm0         //SUB starting - current, word by word   mm1 = x_coeff:(sy-cy):(sx-cx)
    punpcklwd mm2,mm1     //PUT each result word into the high half of a dword of mm2   mm2=(sy-cy):??:(sx-cx):??  (?? = old mm2 words)
    psrad mm2,16          //Sign extend both double words of the result  mm2=(sy-cy):(sx-cx)
    cvtpi2ps xmm7,mm2     //CONVERT X Y result to floats in the low half of XMM7  xmm7 = ?:?:sy-cy:sx-cx  (upper half unchanged)
    psrlq mm1,32          //SHIFT the x_coeff right, from the high dword to the low dword   mm1=00:00:x_coeff 
    movq2dq  xmm6,mm1     //SEND coeff to xmm6 low dword    xmm6=0000:0000:0000:x_coeff
    shufps xmm6,xmm6,00h  //SHUFFLE x_coeff everywhere      xmm6=x_coeff:x_coeff:x_coeff:x_coeff
    mulps xmm7,xmm6       //MULTIPLY ?:?:Y:X by x_coeff     xmm7=?:?:(sy-cy)*x_coeff:(sx-cx)*x_coeff

  }
}
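The same data flow can also be sketched with SSE2 intrinsics, which avoids the MMX registers (and with them the `emms` that MMX code needs before any later x87 floating-point code) and does not depend on inline asm, which MSVC does not accept for x64 targets. The function name `drag_delta_sse2` and its signature are hypothetical, and the packed positions are again passed as plain 32-bit values; treat it as an illustration, not as a version known to be faster.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Hypothetical helper: writes (sx-cx)*coeff to out[0] and (sy-cy)*coeff to out[1].
inline void drag_delta_sse2(std::uint32_t start, std::uint32_t current,
                            float coeff, float out[2]) {
    // Put each packed position in the low dword: words = x, y, 0, 0, ...
    const __m128i s = _mm_cvtsi32_si128(static_cast<int>(start));
    const __m128i c = _mm_cvtsi32_si128(static_cast<int>(current));

    // Word-wise start - current; wraps at 16 bits, like psubw.
    const __m128i d16 = _mm_sub_epi16(s, c);

    // Interleave each word with itself so it sits in the high half of a
    // dword, then shift right arithmetically by 16 to sign-extend it.
    const __m128i d32 = _mm_srai_epi32(_mm_unpacklo_epi16(d16, d16), 16);

    // Convert to float and multiply by the broadcast coefficient.
    const __m128 scaled = _mm_mul_ps(_mm_cvtepi32_ps(d32), _mm_set1_ps(coeff));

    // Store only the low two floats (x and y); no alignment required.
    _mm_storel_epi64(reinterpret_cast<__m128i*>(out), _mm_castps_si128(scaled));
}
```

Interleaving the difference with itself also removes the dependence on whatever mm2 happened to hold before the punpcklwd in the asm version.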

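The question is: is this a fast way to do such a simple task, or is there a better way to do it? Thanks.

How do the C code and the asm code compare when you time them? Is it fast enough? If not, how much time do you need to save?

Actually, I still haven't timed my code in my studyings.

Profile first, then optimize (if needed).

MMX, even? I haven't touched it in a while.

What does the C code compile to? That gives you a good target to beat.

This is not about profiling! I want to learn how mmx/sse are normally used.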