Hi!
I'm trying to convert a normal vector packed in a DWORD to a float3, using the following code:
__m128i n_i;
n_i.m128i_i32[0] = n&0xff;
n_i.m128i_i32[1] = (n >> 8)&0xff;
n_i.m128i_i32[2] = (n >> 16)&0xff;
n_i.m128i_i32[3] = 0;
__m128 n_f = _mm_cvtepi32_ps(n_i);
...
Here is the generated assembly:
...
mov dword ptr [esp], ecx
mov dword ptr [esp+0x4], edx
mov dword ptr [esp+0x8], eax
mov dword ptr [esp+0xc], 0
movdqa xmm0, xmmword ptr [esp]
cvtdq2ps xmm0, xmm0
...
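The profiler shows a high CPI (1.65) on "cvtdq2ps xmm0, xmm0".
According to https://fgiesen.wordpress.com/2013/03/04/speculatively-speaking/ , the CPU cannot forward multiple small stores to one larger load. Here, four 32-bit stores to [esp] are followed immediately by a single 128-bit load from the same address. I wonder whether this is a load-hit-store (a store-forwarding stall) or not.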
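One way I could check this is to compare against a version that never writes the components to the stack. Below is a minimal sketch of that idea, assuming SSE2 is available and that the fourth lane should be 0 as in my code above. The function name unpack_normal_sse2 is just made up for illustration, and I have not profiled it, so I can't say yet whether it actually changes the CPI:

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Hypothetical helper: widen the low three bytes of n to floats
// entirely in registers, so no narrow stores precede a wide load.
inline __m128 unpack_normal_sse2(std::uint32_t n)
{
    // movd: place the DWORD in lane 0; mask off the top byte so lane 3 ends up 0.
    __m128i v = _mm_cvtsi32_si128(static_cast<int>(n & 0x00ffffffu));
    const __m128i zero = _mm_setzero_si128();

    v = _mm_unpacklo_epi8(v, zero);   // 8-bit  -> 16-bit (zero-extend)
    v = _mm_unpacklo_epi16(v, zero);  // 16-bit -> 32-bit (zero-extend)

    // Same conversion as before (cvtdq2ps), but the source comes from a register.
    return _mm_cvtepi32_ps(v);
}
```

If the high CPI goes away with this version, that would point at the four stores followed by the movdqa load rather than at cvtdq2ps itself.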