Tapped

Member Since 29 Jan 2011
Offline Last Active Feb 03 2014 01:40 PM

Vec4D: SSE-ASM, SSE-INTRINSICS, NORMAL

21 April 2013 - 07:35 AM

I have been playing around with SSE in VS2012, and I wanted to find out the fastest way to calculate the length of a vector.

```cpp
float Magnitude() const
{
#if SSE && SSE_ASM
float result;
//Optimized magnitude calculation with SSE and Assembly
__asm
{
MOV EAX, this								//Move [this] to EAX.
MOVAPS XMM2, [EAX]							//Copy data EAX to XMM2 register
MULPS  XMM2, XMM2                            //Square the XMM2 register.
MOVAPS XMM1, XMM2                            //Make a copy
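SHUFPS XMM1, XMM1, _MM_SHUFFLE(1, 0, 3, 2)   //Swap the high and low pairs of the copy.
ADDPS  XMM2, XMM1                            //First addition: lane 0 = x*x + z*z, lane 1 = y*y + w*w.
MOVAPS XMM1, XMM2                            //Make a copy
SHUFPS XMM1, XMM1, _MM_SHUFFLE(0, 1, 0, 1)   //Swap neighbouring lanes of the copy.
ADDPS  XMM2, XMM1                            //Second addition: lane 0 now holds the full sum of squares.
SQRTSS XMM2, XMM2                            //Get the square root of lane 0 only.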
MOVSS [result], XMM2                         //Store the result in the float.
}
return result;
#elif SSE
__m128 tmp = _mm_mul_ps(components, components);
tmp = _mm_add_ps(_mm_shuffle_ps(tmp, tmp, _MM_SHUFFLE(1, 0, 3, 2)), tmp);
tmp = _mm_sqrt_ps(_mm_add_ps(tmp, _mm_shuffle_ps(tmp, tmp, _MM_SHUFFLE(0, 1, 0, 1))));

float result;
_mm_store_ss(&result, tmp);

return result;
#else
//Plain scalar fallback, used whenever SSE is disabled.
return sqrtf(__x * __x + __y * __y + __z * __z + __w * __w);
#endif
}
```

Guess what: the normal method was as fast as the SSE intrinsics, while my assembly code was actually slower.
So yeah, you can't beat the compiler.
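Before comparing timings it helps to confirm that both variants return the same value. Below is a minimal sketch of the intrinsics branch as a free function next to the scalar formula, assuming an x86 target with SSE. The names `magnitude_sse` and `magnitude_scalar` are hypothetical and not part of the class above, and an unaligned load is used so that a plain array works as input.

```cpp
#include <xmmintrin.h>  // SSE intrinsics
#include <cmath>

// Hypothetical free-function version of the intrinsics branch.
// v points at 4 floats; no alignment is assumed (unaligned load).
inline float magnitude_sse(const float* v)
{
    const __m128 c = _mm_loadu_ps(v);
    __m128 t = _mm_mul_ps(c, c);                                       // (x*x, y*y, z*z, w*w)
    t = _mm_add_ps(t, _mm_shuffle_ps(t, t, _MM_SHUFFLE(1, 0, 3, 2)));  // add swapped high/low pairs
    t = _mm_add_ps(t, _mm_shuffle_ps(t, t, _MM_SHUFFLE(0, 1, 0, 1)));  // lane 0 = full sum of squares
    return _mm_cvtss_f32(_mm_sqrt_ss(t));                              // square root of lane 0 only
}

// Scalar reference, same formula as the non-SSE branch.
inline float magnitude_scalar(const float* v)
{
    return std::sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2] + v[3] * v[3]);
}
```

Only lane 0 is needed for the result, so this sketch uses `_mm_sqrt_ss` instead of taking the square root of all four lanes.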

Marching Cube Holes

14 September 2012 - 06:38 AM

I have a problem generating terrain with marching cubes. My marching cubes implementation is in OpenCL, with 16 * 16 * 16 voxels per block. It works fine when generating a sphere; the only problem is that you can see small holes between some parts of the sphere. My theory is floating-point precision, but I don't know.

http://imageshack.us/photo/my-images/833/failsphere.png/
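One way to test the floating-point theory is to make the edge interpolation independent of the order in which a cell passes the two endpoints, so that cells sharing an edge compute bit-identical vertices. The sketch below is not the OpenCL code from this post; `Vec3`, `less_than`, and `interpolate_edge` are hypothetical names, and it assumes the usual linear interpolation along a cube edge.

```cpp
#include <cmath>
#include <utility>

struct Vec3 { float x, y, z; };

// Lexicographic ordering, used only to pick a canonical endpoint order.
inline bool less_than(const Vec3& a, const Vec3& b)
{
    if (a.x != b.x) return a.x < b.x;
    if (a.y != b.y) return a.y < b.y;
    return a.z < b.z;
}

// Interpolates the iso-surface crossing on the edge (p1, p2) with densities (d1, d2).
// The endpoints are sorted first, so swapping the arguments gives the same result.
inline Vec3 interpolate_edge(float iso, Vec3 p1, Vec3 p2, float d1, float d2)
{
    if (less_than(p2, p1)) {
        std::swap(p1, p2);
        std::swap(d1, d2);
    }
    const float denom = d2 - d1;
    if (std::fabs(denom) < 1e-6f) return p1;  // nearly flat edge: avoid dividing by ~0
    const float t = (iso - d1) / denom;
    return Vec3{ p1.x + t * (p2.x - p1.x),
                 p1.y + t * (p2.y - p1.y),
                 p1.z + t * (p2.z - p1.z) };
}
```

If the holes persist with an order-independent interpolation, precision along shared edges is probably not the cause.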

Another problem occurs when I try to generate terrain with noise:

http://imageshack.us...failterrain.png

As you can see, it does not look right. This was rendered with isolevel 0.9f.

I create the terrain like this:
```c
sampler_t randomVolumeSampler = CLK_NORMALIZED_COORDS_TRUE | CLK_ADDRESS_REPEAT | CLK_FILTER_LINEAR;
density.x = wPos.y;
density.x += read_imagef(randomVolume, randomVolumeSampler, realGridPos).x * 0.25f;
```

So I was wondering whether I have implemented marching cubes wrong, or whether it is behaving as it should.
