### #ActualimoogiBG

Posted 23 October 2013 - 03:33 PM

Actually the topic name should be "How to profile code".

Well, I am writing a SIMD math library. I have two implementations: SSE and scalar.

I'm not sure how to measure the code speed. Currently I'm not using optimization, and no debug symbols are generated for profiling.

I'm creating a loop that repeats the operation...
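A timing loop of this kind could be sketched roughly as follows. This is a minimal illustration, not the code from my project: `Vec3`, `cross_scalar` and `time_ns_per_call` are hypothetical names, and only the scalar path is shown. The results are summed into a checksum so that an optimizing compiler cannot throw the work away.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for SGVector (scalar layout only).
struct Vec3 { float x, y, z, w; };

// Scalar cross product, same formula as the #else branch in the post.
inline Vec3 cross_scalar(const Vec3& a, const Vec3& b) {
    return { a.y * b.z - b.y * a.z,
             b.x * a.z - a.x * b.z,
             a.x * b.y - b.x * a.y,
             0.f };
}

// Runs fn over every pair (a[i], b[i]) `passes` times and returns the
// average time per call in nanoseconds. The accumulated checksum is
// written to `sink` so the optimizer has to keep the computation.
template <class Fn>
double time_ns_per_call(Fn fn,
                        const std::vector<Vec3>& a,
                        const std::vector<Vec3>& b,
                        std::size_t passes,
                        float& sink) {
    using clock = std::chrono::steady_clock;
    float acc = 0.f;
    const auto t0 = clock::now();
    for (std::size_t p = 0; p < passes; ++p) {
        for (std::size_t i = 0; i < a.size(); ++i) {
            const Vec3 r = fn(a[i], b[i]);
            acc += r.x + r.y + r.z;
        }
    }
    const auto t1 = clock::now();
    sink = acc;
    const double ns =
        std::chrono::duration<double, std::nano>(t1 - t0).count();
    return ns / static_cast<double>(passes * a.size());
}
```

Calling `time_ns_per_call(cross_scalar, a, b, passes, sink)` once per implementation gives two numbers to compare. The comparison is probably only meaningful in an optimized build, since without optimization the timings are likely dominated by call and load/store overhead rather than by the math itself.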

The compiler is cl (MSVC).

I'm expecting the SSE dot product to be slower than the scalar version?

But the cross product is also slower!?

SGE_FORCE_INLINE SGVector vec3_cross(const SGVector& a, const SGVector& b)
{
#if defined(SGE_MATH_USE_SSE)
__m128 T = _mm_shuffle_ps(a.m_M128, a.m_M128, SGE_SIMD_SHUFFLE(1, 2, 0, 3)); //(Y Z X W)
__m128 V = _mm_shuffle_ps(b.m_M128, b.m_M128, SGE_SIMD_SHUFFLE(1, 2, 0, 3)); //(Y Z X W)

//i(ay*bz - by*az)  + j(bx*az - ax*bz)  + k(ax*by - bx*ay)
T = _mm_mul_ps(T, b.m_M128);//bx * ay, by * az, bz * ax
V = _mm_mul_ps(V, a.m_M128);//ax * by, ay * bz, az * bx
V = _mm_sub_ps(V, T);

V = _mm_shuffle_ps(V, V, SGE_SIMD_SHUFFLE(1, 2, 0, 3));
return SGVector(V);
#else
const float x = (a.y*b.z) - (b.y*a.z);
const float y = (b.x*a.z) - (a.x*b.z);
const float z = (a.x*b.y) - (b.x*a.y);

return SGVector(x, y, z, 0.f);
#endif
}

where SGVector is a struct with union{ struct {float x,y,z;}; float arr[4]; __m128 m_M128; }. (Maybe that is the problem?!)
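Before comparing timings, it may be worth checking that the SSE sequence gives the same result as the scalar formula. The sketch below is an assumption-laden stand-in, not the library code: `Vec4`, `cross_sse` and `cross_ref` are hypothetical names, a plain aligned array with explicit loads and stores replaces the union, and the standard `_MM_SHUFFLE` macro is assumed to be what `SGE_SIMD_SHUFFLE(1, 2, 0, 3)` expands to (lanes 1, 2, 0, 3 in that order).

```cpp
#include <xmmintrin.h>  // SSE intrinsics (x86/x64 only)

// Hypothetical stand-in for SGVector: four floats, 16-byte aligned.
struct alignas(16) Vec4 {
    float arr[4];  // x, y, z, w
};

static_assert(sizeof(Vec4) == 16, "Vec4 should be the size of __m128");
static_assert(alignof(Vec4) == 16, "Vec4 must be aligned for _mm_load_ps");

// Same shuffle / multiply / subtract / shuffle sequence as in the post.
inline Vec4 cross_sse(const Vec4& a, const Vec4& b) {
    const __m128 va = _mm_load_ps(a.arr);
    const __m128 vb = _mm_load_ps(b.arr);
    // _MM_SHUFFLE(3, 0, 2, 1) picks lanes (1, 2, 0, 3), i.e. (Y Z X W).
    __m128 t = _mm_shuffle_ps(va, va, _MM_SHUFFLE(3, 0, 2, 1));
    __m128 v = _mm_shuffle_ps(vb, vb, _MM_SHUFFLE(3, 0, 2, 1));
    t = _mm_mul_ps(t, vb);  // ay*bx, az*by, ax*bz
    v = _mm_mul_ps(v, va);  // by*ax, bz*ay, bx*az
    v = _mm_sub_ps(v, t);   // holds (z, x, y) of the cross product
    v = _mm_shuffle_ps(v, v, _MM_SHUFFLE(3, 0, 2, 1));  // back to (x, y, z)
    Vec4 r;
    _mm_store_ps(r.arr, v);
    return r;
}

// Scalar reference using the textbook cross product formula.
inline Vec4 cross_ref(const Vec4& a, const Vec4& b) {
    Vec4 r;
    r.arr[0] = a.arr[1] * b.arr[2] - b.arr[1] * a.arr[2];
    r.arr[1] = b.arr[0] * a.arr[2] - a.arr[0] * b.arr[2];
    r.arr[2] = a.arr[0] * b.arr[1] - b.arr[0] * a.arr[1];
    r.arr[3] = 0.f;
    return r;
}
```

If both functions agree on a few test inputs, then a remaining speed difference more likely comes from the build settings or from how the union is passed around than from the instruction sequence itself.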

EDIT: maybe __forceinline is involved too!? I will remove it.
