Quote:Original post by exorcist_bob
Why is the constructor being called in this code?
*** Source Snippet Removed ***
Thanks,
exorcist_bob
Because you create a temporary Matrix3 object?
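Since the original snippet is gone, here is a minimal sketch of the general point, not the poster's actual class. It assumes a hypothetical Matrix3 whose operator+ returns by value; the `constructed` counter and the member layout are made up purely for illustration.

```cpp
#include <cstddef>

// Hypothetical stand-in for the Matrix3 class in the removed snippet.
struct Matrix3 {
    static int constructed;   // counts default-constructor calls only
    float m[9];

    Matrix3() : m{} { ++constructed; }

    // Compound assignment works in place: no new Matrix3 is created.
    Matrix3& operator+=(const Matrix3& rhs) {
        for (std::size_t i = 0; i < 9; ++i) m[i] += rhs.m[i];
        return *this;
    }
};

int Matrix3::constructed = 0;

// Returning by value needs an object to hold the sum, so a constructor
// runs on every call. Copy elision may skip the extra copy, but not this.
inline Matrix3 operator+(const Matrix3& lhs, const Matrix3& rhs) {
    Matrix3 result;           // <- the constructor call in question
    for (std::size_t i = 0; i < 9; ++i) result.m[i] = lhs.m[i] + rhs.m[i];
    return result;
}
```

With this sketch, `Matrix3 c = a + b;` bumps the counter once for the object built inside operator+, while `a += b;` leaves it unchanged.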
Quote:Original post by iMalc
If you're interested in speed, and perhaps a learning challenge, then how about simply looking into expression templates rather than getting down and dirty with asm?
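For anyone who has not met expression templates before, a rough sketch of the idea follows. Everything in it (Mat3, AddExpr, the 9-float layout) is a hypothetical illustration, not code from this thread: operator+ returns a lightweight expression object instead of a finished matrix, and the element-wise sum is only evaluated once, when the expression is assigned to a real Mat3.

```cpp
#include <array>
#include <cstddef>

// Lightweight node describing "lhs + rhs" without computing anything yet.
template <typename L, typename R>
struct AddExpr {
    const L& lhs;
    const R& rhs;
    float operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
};

// Hypothetical 3x3 matrix stored as 9 floats.
struct Mat3 {
    std::array<float, 9> v{};

    Mat3() = default;

    // Evaluating an expression happens here, in a single pass.
    template <typename L, typename R>
    Mat3(const AddExpr<L, R>& e) {
        for (std::size_t i = 0; i < 9; ++i) v[i] = e[i];
    }

    template <typename L, typename R>
    Mat3& operator=(const AddExpr<L, R>& e) {
        for (std::size_t i = 0; i < 9; ++i) v[i] = e[i];
        return *this;
    }

    float  operator[](std::size_t i) const { return v[i]; }
    float& operator[](std::size_t i)       { return v[i]; }
};

// Mat3 + Mat3 builds an expression node, not a temporary matrix.
inline AddExpr<Mat3, Mat3> operator+(const Mat3& a, const Mat3& b) {
    return {a, b};
}

// (expression) + Mat3 nests the nodes so a + b + c is still one loop.
template <typename L, typename R>
AddExpr<AddExpr<L, R>, Mat3> operator+(const AddExpr<L, R>& a, const Mat3& b) {
    return {a, b};
}
```

With this, `Mat3 r = a + b + c;` runs one loop over nine elements and creates no intermediate Mat3. One caveat of the sketch: the nodes hold references, so storing an expression with `auto` and using it after the full statement ends would dangle.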
Quote:Original post by exorcist_bob
As you can see, not a ton of overhead. Now here is my version using SSE:
*** Source Snippet Removed ***
Straight from the disassembly output in Visual C++. The original source:
*** Source Snippet Removed ***
*** Source Snippet Removed ***
Heck, my FPU version is twice as slow, and not much different than the C++ version. So I deduce that it must be an overhead issue, which brings me to the conclusion that I should use MASM and link it with my DLL.
row0 = _mm_load_ps(m_fMatrix9);
    mov eax,dword ptr [ebp-0Ch]
    movaps xmm0,xmmword ptr [eax]
    movaps xmmword ptr [ebp-3B0h],xmm0
    movaps xmm0,xmmword ptr [ebp-3B0h]
    movaps xmmword ptr [ebp-70h],xmm0
row1 = _mm_load_ps(m_fMatrix9+4);
    mov eax,dword ptr [ebp-0Ch]
    movaps xmm0,xmmword ptr [eax+10h]
    movaps xmmword ptr [ebp-390h],xmm0
    movaps xmm0,xmmword ptr [ebp-390h]
    movaps xmmword ptr [ebp-90h],xmm0
row2 = _mm_load_ss(m_fMatrix9+8);
    mov eax,dword ptr [ebp-0Ch]
    movss xmm0,dword ptr [eax+20h]
    movaps xmmword ptr [ebp-370h],xmm0
    movaps xmm0,xmmword ptr [ebp-370h]
    movaps xmmword ptr [ebp-0B0h],xmm0
//row3 = _mm_load_ps(m_fMatrix16+12);
base0 = _mm_load_ps(mat.m_fMatrix9);
    mov eax,dword ptr [ebx+0Ch]
    movaps xmm0,xmmword ptr [eax]
    movaps xmmword ptr [ebp-350h],xmm0
    movaps xmm0,xmmword ptr [ebp-350h]
    movaps xmmword ptr [ebp-0F0h],xmm0
base1 = _mm_load_ps(mat.m_fMatrix9+4);
    mov eax,dword ptr [ebx+0Ch]
    movaps xmm0,xmmword ptr [eax+10h]
    movaps xmmword ptr [ebp-330h],xmm0
    movaps xmm0,xmmword ptr [ebp-330h]
    movaps xmmword ptr [ebp-110h],xmm0
base2 = _mm_load_ss(mat.m_fMatrix9+8);
    mov eax,dword ptr [ebx+0Ch]
    movss xmm0,dword ptr [eax+20h]
    movaps xmmword ptr [ebp-310h],xmm0
    movaps xmm0,xmmword ptr [ebp-310h]
    movaps xmmword ptr [ebp-130h],xmm0
//base3 = _mm_load_ps(mat.m_fMatrix16+12);
result0 = _mm_add_ps(row0, base0);
    movaps xmm0,xmmword ptr [ebp-0F0h]
    movaps xmm1,xmmword ptr [ebp-70h]
    addps xmm1,xmm0
    movaps xmmword ptr [ebp-2F0h],xmm1
    movaps xmm0,xmmword ptr [ebp-2F0h]
    movaps xmmword ptr [ebp-170h],xmm0
result1 = _mm_add_ps(row1, base1);
    movaps xmm0,xmmword ptr [ebp-110h]
    movaps xmm1,xmmword ptr [ebp-90h]
    addps xmm1,xmm0
    movaps xmmword ptr [ebp-2D0h],xmm1
    movaps xmm0,xmmword ptr [ebp-2D0h]
    movaps xmmword ptr [ebp-190h],xmm0
result2 = _mm_add_ss(row2, base2);
    movaps xmm0,xmmword ptr [ebp-130h]
    movaps xmm1,xmmword ptr [ebp-0B0h]
    addss xmm1,xmm0
    movaps xmmword ptr [ebp-2B0h],xmm1
    movaps xmm0,xmmword ptr [ebp-2B0h]
    movaps xmmword ptr [ebp-1B0h],xmm0
//result3 = _mm_add_ps(row3, base3);
_mm_store_ps(matResult.m_fMatrix9, result0);
    movaps xmm0,xmmword ptr [ebp-170h]
    movaps xmmword ptr [ebp-50h],xmm0
_mm_store_ps(matResult.m_fMatrix9+4,result1);
    movaps xmm0,xmmword ptr [ebp-190h]
    movaps xmmword ptr [ebp-40h],xmm0
_mm_store_ss(matResult.m_fMatrix9+8,result2);
    movaps xmm0,xmmword ptr [ebp-1B0h]
    movss dword ptr [ebp-30h],xmm0
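A note on the listing above: every intrinsic result is written to a stack slot and read straight back, which looks like what a compiler typically emits with optimization turned off (a Debug build), though the thread does not say which build settings were used. Below is a compact sketch of the same 3x3 add for comparing against an optimized build. The Mat3f type, the add3x3 name, and the MAT3_HAS_SSE macro are hypothetical, and the scalar fallback is only there so the sketch compiles on targets without SSE.

```cpp
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>   // SSE intrinsics: _mm_load_ps, _mm_add_ps, ...
#define MAT3_HAS_SSE 1
#else
#define MAT3_HAS_SSE 0
#endif

// Hypothetical 3x3 matrix; alignas(16) satisfies the 16-byte alignment
// that _mm_load_ps and _mm_store_ps require (m and m + 4 are both aligned).
struct alignas(16) Mat3f { float m[9]; };

inline void add3x3(const Mat3f& a, const Mat3f& b, Mat3f& out) {
#if MAT3_HAS_SSE
    // Elements 0..3 and 4..7 as two packed adds, element 8 as a scalar add,
    // mirroring the load_ps / load_ps / load_ss pattern in the listing.
    _mm_store_ps(out.m,     _mm_add_ps(_mm_load_ps(a.m),     _mm_load_ps(b.m)));
    _mm_store_ps(out.m + 4, _mm_add_ps(_mm_load_ps(a.m + 4), _mm_load_ps(b.m + 4)));
    _mm_store_ss(out.m + 8, _mm_add_ss(_mm_load_ss(a.m + 8), _mm_load_ss(b.m + 8)));
#else
    // Plain scalar fallback for targets without SSE.
    for (std::size_t i = 0; i < 9; ++i) out.m[i] = a.m[i] + b.m[i];
#endif
}
```

Compiling something like this with optimization enabled and looking at the disassembly again is a cheap check to run before reaching for MASM.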
Quote:Original post by exorcist_bob
ASM straight from the disassembly. As you can see, the data is shuffled around A LOT. Wouldn't memory latency take this effect away?
    movaps xmmword ptr [ebp-390h],xmm0
    movaps xmm0,xmmword ptr [ebp-390h]
is quite funny. Folks, (get a skilled human to) write your time-critical parts in asm, and life is good.
Quote:Original post by exorcist_bob
Well, when I ran the program using intrinsics, I got speeds roughly midway between the C++ speed and my old speed. I have no idea why all the extra data shuffling is actually speeding it up.
*** Source Snippet Removed ***
ASM straight from the disassembly. As you can see, the data is shuffled around A LOT. Wouldn't memory latency take this effect away?
Thanks,
exorcist_bob