C++ ASM - What is the best way to perform (int+int)*float constant using extended instructions?


c++ winapi assembly


I'm writing a function that moves my camera in an OpenGL application in response to WM_MOUSEMOVE.

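The function takes the starting point (the old lParam saved when handling WM_LBUTTONDOWN), subtracts the current point from the starting point, and then multiplies the result by a float coefficient.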

class cam
{
public:
 int sp;        //starting point saved here
 float x_coeff; //sp and x_coeff are adjacent and aligned, so I can load them both as one quadword later
 void drag_camera(LPARAM lParam);
};
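The comment in the class relies on `sp` and `x_coeff` sitting next to each other, with `sp` in the low dword (x86 is little-endian) and the pair 8-byte aligned. A minimal sketch, assuming a hypothetical stand-alone struct `cam_layout` (not the asker's full class, where the pair apparently sits at offset 40h), of how the compiler can check that assumption instead of it being silently relied on by the `movq`:

```cpp
#include <cassert>
#include <cstddef>   // offsetof
#include <cstdint>
#include <cstring>   // std::memcpy

// Hypothetical layout: sp and x_coeff packed into one 8-byte-aligned quadword.
struct alignas(8) cam_layout {
    std::int32_t sp;      // packed starting point: low word = x, high word = y
    float        x_coeff; // scale factor applied to the mouse delta
};

static_assert(sizeof(cam_layout) == 8, "sp and x_coeff must fill exactly one quadword");
static_assert(offsetof(cam_layout, sp) == 0, "sp must be the low dword");
static_assert(offsetof(cam_layout, x_coeff) == 4, "x_coeff must be the high dword");

// Reads both fields as a single 64-bit value, the way `movq mm1, [mem]` does.
inline std::uint64_t load_quadword(const cam_layout& c) {
    std::uint64_t q;
    std::memcpy(&q, &c, sizeof q);
    return q;
}
```

If a member is later inserted between the two fields, the build fails at the `static_assert` instead of the asm quietly loading the wrong data.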

case WM_LBUTTONDOWN:
    cam.sp=lParam;
    return 0;
case WM_MOUSEMOVE:
    cam.drag_camera(lParam);
    return 0;

void cam::drag_camera(LPARAM lParam)
{
  float step=0.001f;
  short old_x=sp&0xFFFF;
  short old_y=sp>>16;
  short current_x=lParam&0xFFFF;
  short current_y=lParam>>16;
  float x_move=(old_x-current_x)*step;
  float y_move=(old_y-current_y)*step;
  //.... do something with x_move and y_move
}
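For comparison with the asm below, here is a minimal portable sketch of the same computation. It assumes a stand-in type `lparam_t` and hypothetical helpers `make_lp`, `x_of`, `y_of` and `drag_delta` so that it compiles without `<windows.h>`; on Windows the equivalents are `MAKELPARAM`, and `GET_X_LPARAM` / `GET_Y_LPARAM` from `<windowsx.h>`. The casts to a signed 16-bit type matter because client coordinates can be negative (for example on multi-monitor setups or while the mouse is captured).

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the Win32 LPARAM type (LONG_PTR) so this compiles anywhere.
using lparam_t = std::intptr_t;

// Hypothetical equivalent of MAKELPARAM(x, y): x in the low word, y in the high word.
inline lparam_t make_lp(int x, int y) {
    std::uint32_t lo = static_cast<std::uint16_t>(x);
    std::uint32_t hi = static_cast<std::uint32_t>(static_cast<std::uint16_t>(y)) << 16;
    return static_cast<lparam_t>(hi | lo);
}

// Like GET_X_LPARAM / GET_Y_LPARAM: reinterpret each 16-bit word as signed.
inline int x_of(lparam_t lp) { return static_cast<std::int16_t>(lp & 0xFFFF); }
inline int y_of(lparam_t lp) { return static_cast<std::int16_t>((lp >> 16) & 0xFFFF); }

struct Delta { float x, y; };

// (start - current) * step on both axes. The subtraction is done in int,
// so unlike a 16-bit psubw it cannot wrap around.
inline Delta drag_delta(lparam_t start, lparam_t current, float step) {
    return { (x_of(start) - x_of(current)) * step,
             (y_of(start) - y_of(current)) * step };
}
```

For example, dragging from (100, 50) to (90, 70) with `step = 0.5f` gives x = 5 and y = -10.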

OK, it works, but I'm trying to practice using asm and all those nice registers. Here is my version using MMX registers:

void cam::drag_camera(LPARAM lParam)
{
  _asm
  { 
    mov eax,this          //LOAD the this pointer explicitly; MSVC does not guarantee it is in ebx
    movd mm0,lParam       //MOVE current mouse LPARAM point to mm0                      mm0 = 00:00:cy:cx
    movq mm1,[eax+40h]    //MOVE starting point LPARAM to low dword of mm1, x_coeff to high dword   mm1 = x_coeff:sy:sx
    psubw mm1,mm0         //SUB starting - current                                      mm1 = x_coeff:(sy-cy):(sx-cx)
    punpcklwd mm2,mm1     //UNPACK the word results into the high words of mm2's dwords mm2 = (sy-cy):??:(sx-cx):??
    psrad mm2,16          //SIGN-EXTEND both dwords of the result                       mm2 = (sy-cy):(sx-cx)
    cvtpi2ps xmm7,mm2     //CONVERT X/Y to floats in the low qword of xmm7 (high qword unchanged)   xmm7 = ????:????:sy-cy:sx-cx
    psrlq mm1,32          //SHIFT x_coeff right, from the high dword to the low dword   mm1 = 00:00:x_coeff
    movq2dq xmm6,mm1      //SEND coeff to the low dword of xmm6 (upper bits zeroed)     xmm6 = 0000:0000:0000:x_coeff
    shufps xmm6,xmm6,00h  //SHUFFLE x_coeff into every lane                             xmm6 = x_coeff:x_coeff:x_coeff:x_coeff
    mulps xmm7,xmm6       //MULTIPLY Y:X by x_coeff                                     xmm7 = ????:????:(sy-cy)*x_coeff:(sx-cx)*x_coeff
    emms                  //EMPTY the MMX state; required before any x87 floating-point code runs
  }
}
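One common alternative is to skip MMX entirely and stay in XMM registers with SSE2, which removes the need for `emms` and the MMX-to-XMM transfers. A minimal sketch using compiler intrinsics rather than inline `_asm` (MSVC does not support inline assembly in x64 builds, while intrinsics work in both 32- and 64-bit builds). `drag_delta_sse2` is a hypothetical name, and its arguments are assumed to be the raw 32-bit LPARAM values. Each step maps to the instruction named in its comment; like the MMX code, the subtraction is done on 16-bit words, so it wraps if a difference does not fit in 16 bits. A scalar fallback keeps the sketch compilable on non-x86 targets.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>  // SSE2 intrinsics
#define DRAG_HAVE_SSE2 1
#endif

// Hypothetical helper: (start - current) * coeff for the packed x/y words of
// two LPARAM-style values. Writes out[0] = x delta, out[1] = y delta.
void drag_delta_sse2(std::uint32_t start, std::uint32_t current,
                     float coeff, float out[2]) {
    assert(out != nullptr);
#ifdef DRAG_HAVE_SSE2
    __m128i s = _mm_cvtsi32_si128(static_cast<int>(start));    // movd: low dword = sy:sx
    __m128i c = _mm_cvtsi32_si128(static_cast<int>(current));  // movd: low dword = cy:cx
    __m128i d = _mm_sub_epi16(s, c);                           // psubw: (sy-cy):(sx-cx)
    // punpcklwd d,d copies each word into the high half of a dword;
    // psrad 16 then sign-extends it, as in the MMX version.
    d = _mm_srai_epi32(_mm_unpacklo_epi16(d, d), 16);
    __m128 f = _mm_cvtepi32_ps(d);                             // cvtdq2ps: 0:0:dy:dx as floats
    f = _mm_mul_ps(f, _mm_set1_ps(coeff));                     // broadcast coeff, then mulps
    alignas(16) float lanes[4];
    _mm_store_ps(lanes, f);                                    // movaps to memory
    out[0] = lanes[0];
    out[1] = lanes[1];
#else
    // Scalar fallback for non-x86 targets, with the same 16-bit wraparound.
    auto dx = static_cast<std::int16_t>((start & 0xFFFF) - (current & 0xFFFF));
    auto dy = static_cast<std::int16_t>((start >> 16) - (current >> 16));
    out[0] = dx * coeff;
    out[1] = dy * coeff;
#endif
}
```

Because `movd` zeroes the upper lanes, lanes 2 and 3 hold zeros here rather than whatever was left in `xmm7`. On SSE4.1 hardware, `_mm_cvtepi16_epi32` (PMOVSXWD) can replace the unpack-and-shift pair with a single instruction.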

My question: is this a fast way to do such a simple task, or are there other ways I could do it? Thanks.

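How do the C code and the asm code compare when you time them? Is it fast enough? If not, how much time do you need to save?

Actually, I still don't time my code in my studies.

Profile first, then optimize (if needed).

MMX, even? I haven't touched that in a while.

What does the C code compile to? That gives you a good target to beat.

This is not about profiling! I want to understand how MMX/SSE are normally used.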