C++ ASM - What is the best way to do (int+int)*float constant using extended instructions?
Tags: c++, winapi, assembly, sse, mmx

I am writing a function that responds to WM_MOUSEMOVE to move my camera in an OpenGL application. The function takes the starting point (the old lParam saved in the WM_LBUTTONDOWN handler), subtracts the current point from it, and multiplies the result by a floating-point coefficient.
class cam
{
    int sp;        // starting point (lParam) saved here
    float x_coeff; // sp and x_coeff are adjacent and aligned, so both can be loaded later as one quadword
};
case WM_LBUTTONDOWN:
cam.sp=lParam;
return 0;
case WM_MOUSEMOVE:
cam.drag_camera(lParam);
return 0;
void cam::drag_camera(LPARAM lParam)
{
    float step = 0.001f;
    short old_x = sp & 0xFFFF;         // low word: x
    short old_y = sp >> 16;            // high word: y
    short current_x = lParam & 0xFFFF;
    short current_y = lParam >> 16;
    x_move = (old_x - current_x) * step;
    // ... do something with x_move
}
OK, this works, but I am trying to practice using asm and all those nice registers.
Here is the same code, but using the MMX registers:
void cam::drag_camera(LPARAM lParam)
{
    _asm
    {
        movd mm0,lParam       // current mouse point: mm0 = 00:00:cy:cx
        movq mm1,[ebx+40h]    // starting point in the low dword, x_coeff in the high dword: mm1 = x_coeff:sy:sx
        psubw mm1,mm0         // starting - current: mm1 = x_coeff:(sy-cy):(sx-cx)
        punpcklwd mm2,mm1     // interleave so each difference is the high word of a dword: mm2 = (sy-cy):??:(sx-cx):??
        psrad mm2,16          // arithmetic shift sign-extends both dwords: mm2 = (sy-cy):(sx-cx)
        cvtpi2ps xmm7,mm2     // convert to float in the low two lanes: xmm7 = ????:????:sy-cy:sx-cx (upper lanes unchanged)
        psrlq mm1,32          // shift x_coeff right from the high dword to the low dword: mm1 = 00:00:x_coeff
        movq2dq xmm6,mm1      // send x_coeff to the low dword of xmm6: xmm6 = 0000:0000:0000:x_coeff
        shufps xmm6,xmm6,00h  // broadcast x_coeff to every lane: xmm6 = x_coeff:x_coeff:x_coeff:x_coeff
        mulps xmm7,xmm6       // low two lanes of xmm7 = (sy-cy)*x_coeff:(sx-cx)*x_coeff
        emms                  // clear MMX state before any x87 floating-point code runs
    }
}
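For comparison, the same sequence of steps can be sketched with SSE2 intrinsics, which stay entirely in the XMM registers and so need no `emms`. This is only a sketch under assumptions: the `DragDelta` struct and the `drag_delta_sse2` function are hypothetical names that do not appear in the code above, and the two packed points are passed in as plain 32-bit values rather than read from the class. Like the `psubw` version, it subtracts in 16 bits, so a difference outside the range of a signed 16-bit word wraps around.

```cpp
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical result type for this sketch (not part of the original class).
struct DragDelta { float x; float y; };

// start/current are packed like a mouse-message lParam: y in the high word, x in the low word.
inline DragDelta drag_delta_sse2(std::uint32_t start, std::uint32_t current, float coeff)
{
    __m128i s = _mm_cvtsi32_si128(static_cast<int>(start));    // words: ...:sy:sx
    __m128i c = _mm_cvtsi32_si128(static_cast<int>(current));  // words: ...:cy:cx
    __m128i d = _mm_sub_epi16(s, c);                           // words: ...:(sy-cy):(sx-cx)

    // Duplicate each word so it also occupies the high half of a dword,
    // then shift right arithmetically to sign-extend it to 32 bits.
    __m128i w = _mm_srai_epi32(_mm_unpacklo_epi16(d, d), 16);

    // Convert to float and scale every lane by the coefficient.
    __m128 f = _mm_mul_ps(_mm_cvtepi32_ps(w), _mm_set1_ps(coeff));

    alignas(16) float out[4];
    _mm_store_ps(out, f);
    return { out[0], out[1] };  // lane 0 = x difference, lane 1 = y difference
}
```

Calling `drag_delta_sse2(sp, lParam, x_coeff)` would give both scaled differences at once. Whether this beats what the compiler generates for the plain C++ version is something only a measurement can tell.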
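The question is: is this a fast way to do such a simple task, or are there other ways I could do this? Thanks.

Comments:
When you timed the C code and the asm code, how did they compare?
Is it fast enough? If not, how much time do you need to shave off?
Actually, I still haven't timed my code in my studies.
Profile first, then optimize (if necessary).
MMX, even? I haven't touched that in a while.
What does the C code compile to? That gives you a good target to beat.
This is not about profiling! I want to learn the normal way of using MMX/SSE.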