
## Recommended Posts

I'm trying to learn SSE intrinsics and I've written a dot product function with the following code:

```cpp
inline float Dot(const CVector4 &vVector)
{
    __m128 vec1;
    __m128 vec2;

    vec1 = _mm_mul_ps(this->v, vVector.v);
    vec2 = _mm_movehl_ps(vec1, vec1);
    vec2 = _mm_add_ss(vec2, vec1);
    vec1 = _mm_shuffle_ps(vec1, vec1, _MM_SHUFFLE(0, 0, 0, 1));
    vec1 = _mm_add_ss(vec1, vec2);
    return vec1.m128_f32[0];
}
```

Now when I look at the assembly code generated from the code above, it seems that the compiler (VC++ 2005) generates unneeded movaps instructions:

```asm
mov     eax, DWORD PTR _vVector$[ebp]
movaps  xmm1, XMMWORD PTR [ecx]
movaps  xmm0, XMMWORD PTR [eax]
mulps   xmm0, xmm1
movaps  xmm1, xmm0                        ; <--- ???
movhlps xmm1, xmm0
movaps  xmm2, xmm0                        ; <--- ???
addss   xmm1, xmm0
shufps  xmm2, xmm0, 1
addss   xmm2, xmm1
movaps  XMMWORD PTR _vec1$[esp+16], xmm2
movss   xmm0, DWORD PTR _vec1$[esp+16]
```

Can I modify the code in such a way that the compiler wouldn't add these unnecessary instructions or is there no way around it? I don't want to use inline assembly since the compiler usually adds a bunch of code before and after the assembly block which makes the generated code a lot slower. Any suggestions?
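One thing worth noting: the final `movaps`/`movss` pair comes from reading the result through the `m128_f32` union member, which forces the vector out to memory. A minimal sketch of an alternative, as a free function with hypothetical names (assuming your `xmmintrin.h` provides `_mm_cvtss_f32`; unlike the posted sequence, this one also sums all four lanes):

```cpp
#include <xmmintrin.h>

// Hypothetical free-function variant of Dot(): same reduction idea,
// but the scalar result is extracted with _mm_cvtss_f32 rather than
// the m128_f32 union member, so nothing forces the vector to memory.
// Note: this version sums all four lanes.
inline float Dot4(__m128 a, __m128 b)
{
    __m128 prod  = _mm_mul_ps(a, b);           // [p0, p1, p2, p3]
    __m128 hi    = _mm_movehl_ps(prod, prod);  // lanes 2,3 moved down to 0,1
    __m128 sum   = _mm_add_ps(prod, hi);       // lane0 = p0+p2, lane1 = p1+p3
    __m128 lane1 = _mm_shuffle_ps(sum, sum, _MM_SHUFFLE(0, 0, 0, 1));
    sum = _mm_add_ss(sum, lane1);              // lane0 = p0+p1+p2+p3
    return _mm_cvtss_f32(sum);
}
```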

##### Share on other sites
If you want to use assembly and have it inlined with no setup you can use:

```cpp
__declspec(naked) inline float Dot(const CVector4 &vVector)
{
    __asm
    {
        ...
    }
}
```

You lose some of the optimizer's ability to analyze the code, but the compiler intrinsics aren't generally regarded as all that optimal anyway. If you're starting to use SSE intrinsics, you've probably analyzed the performance benefit and current cost enough to warrant hand-optimizing that section.

##### Share on other sites
Ah, got it, thanks for the tip. I'll try moving the member functions out of the class and using `__declspec(naked)`.

##### Share on other sites
This defeats the intrinsics' advantage of being usable for x64 and IA-64 compilation, though; inline assembly isn't supported on those platforms.

##### Share on other sites
Are you compiling in debug or release mode?

Also, don't store your results back into vars you're using already. For example, try changing this:

```cpp
vec2 = _mm_add_ss(vec2, vec1);
```

to

```cpp
__m128 vec3 = _mm_add_ss(vec2, vec1);
```

Aliasing is evil, especially in cases where the compiler might be a bit shaky to begin with (such as SIMD code). So even simple stuff like the above might help.

I'd definitely avoid inline asm and stick with the intrinsics. (For now, at least. If you profile it and find it to be too slow still, and you then write an inline ASM version, profile that and find it to be a lot faster, you might go with that.)
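Applying the no-aliasing advice above to the whole function might look like this sketch (a free function with names of my choosing; it mirrors the posted `addss`/`shufps` sequence, in which lane 3 is ignored, exactly as in the generated assembly):

```cpp
#include <xmmintrin.h>

// Sketch: each intrinsic result lands in a fresh variable, so no
// __m128 is ever overwritten while the compiler may still track its
// old value. Reproduces the posted sequence: lanes 0, 1 and 2 are
// summed; lane 3 is ignored (as in the generated assembly).
inline float Dot3(__m128 a, __m128 b)
{
    const __m128 prod  = _mm_mul_ps(a, b);           // per-lane products
    const __m128 high  = _mm_movehl_ps(prod, prod);  // lane0 = prod[2]
    const __m128 sum02 = _mm_add_ss(high, prod);     // prod[0] + prod[2]
    const __m128 lane1 = _mm_shuffle_ps(prod, prod, _MM_SHUFFLE(0, 0, 0, 1));
    const __m128 total = _mm_add_ss(lane1, sum02);   // + prod[1]
    return _mm_cvtss_f32(total);
}
```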

##### Share on other sites
Quote:
> Original post by Spoonbender
>
> Are you compiling in debug or release mode?
>
> Also, don't store your results back into vars you're using already. For example, try changing this: `vec2 = _mm_add_ss(vec2,vec1);` to `__m128 vec3 = _mm_add_ss(vec2,vec1);`
>
> Aliasing is evil, especially in cases where the compiler might be a bit shaky to begin with (such as SIMD code). So even simple stuff like the above might help.
>
> I'd definitely avoid inline asm and stick with the intrinsics. (For now, at least. If you profile it and find it to be too slow still, and you then write an inline ASM version, profile that and find it to be a lot faster, you might go with that.)

I'm compiling in release mode. I tried assigning to different vars but it still produces the same code. I'm not really using this code for anything yet since I'm just trying to learn SSE intrinsics, but I was curious why the compiler generates the extra movaps instructions. Do other compilers have these problems too?
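For comparing what different compilers emit, it helps to have a plain scalar reference to check the intrinsic version against (a throwaway harness; the function names are mine, and `_mm_store_ss` is yet another way to extract the low lane without the union member):

```cpp
#include <xmmintrin.h>

// Plain scalar four-lane dot product, used as a reference when
// eyeballing what different compilers emit for the SSE version.
inline float DotScalar(const float a[4], const float b[4])
{
    float sum = 0.0f;
    for (int i = 0; i < 4; ++i)
        sum += a[i] * b[i];
    return sum;
}

// Four-lane SSE dot product loading from the same float arrays, so
// the two versions are directly comparable.
inline float DotSse(const float a[4], const float b[4])
{
    __m128 va   = _mm_loadu_ps(a);
    __m128 vb   = _mm_loadu_ps(b);
    __m128 prod = _mm_mul_ps(va, vb);
    __m128 hi   = _mm_movehl_ps(prod, prod);
    __m128 sum  = _mm_add_ps(prod, hi);       // lanes 0,1 hold pair sums
    __m128 swap = _mm_shuffle_ps(sum, sum, _MM_SHUFFLE(0, 0, 0, 1));
    sum = _mm_add_ss(sum, swap);
    float out;
    _mm_store_ss(&out, sum);                  // extract low lane to memory
    return out;
}
```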
