


Tapped

Member Since 29 Jan 2011
Offline Last Active Feb 03 2014 01:40 PM

Topics I've Started

Vec4D: SSE-ASM, SSE-INTRINSICS, NORMAL

21 April 2013 - 07:35 AM

I have been playing around with SSE in VS2012 because I wanted to find the fastest way to calculate the length of a 4D vector. I tried three versions: hand-written SSE assembly, SSE intrinsics, and plain scalar code.
The code does not look pretty:
 

float Magnitude() const
{
#if SSE && SSE_ASM
	// Hand-written SSE in MSVC inline assembly (32-bit x86 only).
	// MOVAPS requires the four components to be 16-byte aligned at the start of the object.
	float result;
	__asm
	{
		MOV    EAX, this                              // EAX = address of this vector
		MOVAPS XMM2, [EAX]                            // Load (x, y, z, w) into XMM2
		MULPS  XMM2, XMM2                             // Square each component
		MOVAPS XMM1, XMM2                             // Make a copy
		SHUFPS XMM2, XMM1, _MM_SHUFFLE(1, 0, 3, 2)    // Swap the 64-bit halves: (z^2, w^2, x^2, y^2)
		ADDPS  XMM2, XMM1                             // (x^2+z^2, y^2+w^2, x^2+z^2, y^2+w^2)
		MOVAPS XMM1, XMM2                             // Make a copy
		SHUFPS XMM1, XMM1, _MM_SHUFFLE(0, 1, 0, 1)    // Swap neighbouring lanes
		ADDPS  XMM2, XMM1                             // Every lane now holds x^2+y^2+z^2+w^2
		SQRTPS XMM2, XMM2                             // Square root of each lane
		MOVSS  [result], XMM2                         // Store the lowest lane in the float
	}
	return result;
#elif SSE
	// Same reduction with intrinsics; 'components' is the vector's __m128 member.
	__m128 tmp = _mm_mul_ps(components, components);                                        // (x^2, y^2, z^2, w^2)
	tmp = _mm_add_ps(_mm_shuffle_ps(tmp, tmp, _MM_SHUFFLE(1, 0, 3, 2)), tmp);                 // swap halves and add
	tmp = _mm_sqrt_ps(_mm_add_ps(tmp, _mm_shuffle_ps(tmp, tmp, _MM_SHUFFLE(0, 1, 0, 1))));    // swap neighbours, add, sqrt

	float result;
	_mm_store_ss(&result, tmp);   // store the lowest lane

	return result;
#else
	// Plain scalar version (also covers SSE_ASM without SSE, which previously had no return)
	return sqrtf(__x * __x + __y * __y + __z * __z + __w * __w);
#endif
}

 


It turns out the plain scalar version was just as fast as the SSE intrinsics version, while my assembly version was actually slower.
So yeah, you can't beat the compiler :)
 


Marching Cube Holes

14 September 2012 - 06:38 AM

I have a problem generating terrain with marching cubes. My marching cubes implementation is written in OpenCL, with 16 * 16 * 16 voxels per block. It works fine when generating a sphere, except that you can see some small holes between some parts of the sphere. My theory is that it is floating-point precision, but I don't know.

http://imageshack.us/photo/my-images/833/failsphere.png/

Another problem occurs when I try to generate terrain with noise:

http://imageshack.us...failterrain.png

As you can see, it does not look right. This was rendered with an isolevel of 0.9f.

I create the terrain like this:
[source lang="cpp"]
sampler_t randomVolumeSampler = CLK_NORMALIZED_COORDS_TRUE | CLK_ADDRESS_REPEAT | CLK_FILTER_LINEAR;
density.x = wPos.y;
density.x += read_imagef(randomVolume, randomVolumeSampler, realGridPos).x * 0.25f;
[/source]

So I was wondering: have I implemented marching cubes wrong, or is it behaving correctly?
