[HLSL, SM3.0] Flow control with nVidia hardware


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

4 replies to this topic

#1 user88   Members   -  Reputation: 228

Posted 28 November 2011 - 07:34 AM

Hello there. Nice to see you all again; it has been a long time since my last post.

I have a problem with an HLSL shader that uses flow control under shader model 3.0 on newer nVidia hardware that supports DX 11 (e.g. GeForce 470 GTX). Strangely, the same shader works fine on older graphics adapters (240 GT, 270 GTX). Note that DX 11 itself has nothing to do with it: I use DX9 and mention DX 11 only to indicate the hardware generation I have problems with.

In more detail: I use a parallax mapping technique and a lighting calculation loop in my fragment shader. I know that the gradient-based functions of HLSL cannot be used inside flow control statements, so I compile these shaders with the D3DXSHADER_AVOID_FLOW_CONTROL flag. I also use unroll attributes for the loops. I checked the generated assembly and found no branches. As mentioned above, this works fine on older nVidia graphics adapters, but on the new ones the object flickers while I move or rotate the camera.
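A minimal sketch of the usual way around the gradient restriction, assuming the derivatives are taken once outside any flow control and then passed explicitly to tex2Dgrad. The function name SampleInsideBranch and its parameters are hypothetical; gBumpMapSampler is the sampler name used in the shader excerpt in this post.

```hlsl
sampler2D gBumpMapSampler; // assumed declaration; the name is taken from the post

// Hypothetical helper: samples the map inside a real branch without
// relying on implicit (gradient-based) mip selection.
float4 SampleInsideBranch(float2 texCoord, bool bCondition)
{
    // Derivatives are computed unconditionally, before any flow control.
    float2 dx = ddx(texCoord);
    float2 dy = ddy(texCoord);

    float4 result = 0;

    [branch] if (bCondition)
    {
        // tex2Dgrad receives the derivatives explicitly, so the sampler
        // does not need to derive them from neighbouring pixels here.
        result = tex2Dgrad(gBumpMapSampler, texCoord, dx, dy);
    }

    return result;
}
```

With this pattern the texture fetch itself does not force the compiler to flatten the branch; whether that matters for the flickering described here is not established.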

An outline of the fragment shader:

  • Flow control macros:
// Define AVOID_FLOW_CONTROL
#ifdef EFFECT_ATTRIBUTE_BUMP_MAP
#define AVOID_FLOW_CONTROL
#endif

// if
#ifdef AVOID_FLOW_CONTROL
#define IF(condition) [flatten]if(condition)
#else
#define IF(condition) [branch]if(condition)
#endif

// for
#ifdef AVOID_FLOW_CONTROL
#define FOR(condition, max_loops) [unroll(max_loops)]for(condition)
#else
#define FOR(condition, max_loops) [loop]for(condition)
#endif

// while
#ifdef AVOID_FLOW_CONTROL
#define WHILE(condition, max_loops) [unroll(max_loops)]while(condition)
#else
#define WHILE(condition, max_loops) [loop]while(condition)
#endif

  • Parallax mapping based on nVidia SDK sample
IF ( fMipLevel <= (float) g_nLODThreshold )
   	{
  		//===============================================//
  		// Parallax occlusion mapping offset computation //
  		//===============================================//

  		// Utilize dynamic flow control to change the number of samples per ray 
  		// depending on the viewing angle for the surface. Oblique angles require 
  		// smaller step sizes to achieve more accurate precision for computing displacement.
  		// We express the sampling rate as a linear function of the angle between 
  		// the geometric normal and the view direction ray:
  		int nNumSteps = (int) lerp( g_nMaxSamples, g_nMinSamples, dot(-viewWS, normalWS));

  		// Intersect the view ray with the height field profile along the direction of
  		// the parallax offset ray (computed in the vertex shader). Note that the code is
  		// designed specifically to take advantage of the dynamic flow control constructs
  		// in HLSL and is very sensitive to specific syntax. When converting to other examples,
  		// if you still want to use dynamic flow control in the resulting assembly shader,
  		// care must be applied.
  		// 
  		// In the below steps we approximate the height field profile as piecewise linear
  		// curve. We find the pair of endpoints between which the intersection between the 
  		// height field profile and the view ray is found and then compute line segment
  		// intersection for the view ray and the line segment formed by the two endpoints.
  		// This intersection is the displacement offset from the original texture coordinate.
  		// See the above paper for more details about the process and derivation.
  		//

  		float fCurrHeight = 0.0;
  		float fStepSize   = 1.0 / (float) nNumSteps;
  		float fPrevHeight = 1.0;
  		float fNextHeight = 0.0;

  		int	nStepIndex = 0;

  		float2 vTexOffsetPerStep = fStepSize * parallaxOffsetTS;
  		float2 vTexCurrentOffset = texCoord;
  		float  fCurrentBound 	= 1.0;
  		float  fParallaxAmount   = 0.0;

  		float2 pt1 = 0;
  		float2 pt2 = 0;
   		
  		float2 texOffset2 = 0;

  		WHILE(nStepIndex < nNumSteps, 20)
  		{
 			vTexCurrentOffset -= vTexOffsetPerStep;

 			// Sample height map which in this case is stored in the alpha channel of the normal map:
 			fCurrHeight = tex2Dgrad(gBumpMapSampler, vTexCurrentOffset, dx, dy).a;

 			fCurrentBound -= fStepSize;

 			IF ( fCurrHeight > fCurrentBound ) 
 			{   
				pt1 = float2( fCurrentBound, fCurrHeight );
				pt2 = float2( fCurrentBound + fStepSize, fPrevHeight );

				texOffset2 = vTexCurrentOffset - vTexOffsetPerStep;

				nStepIndex = nNumSteps + 1;
				fPrevHeight = fCurrHeight;
 			}
 			else
 			{
				nStepIndex++;
				fPrevHeight = fCurrHeight;
 			}
  		}   

  		float fDelta2 = pt2.x - pt2.y;
  		float fDelta1 = pt1.x - pt1.y;
  		
  		float fDenominator = fDelta2 - fDelta1;
  		
  		// SM 3.0 requires a check for divide by zero, since that operation will generate
  		// an 'Inf' number instead of 0, as previous models (conveniently) did:
  		IF (fDenominator == 0.0f) fParallaxAmount = 0.0f;
  		else fParallaxAmount = (pt1.x * fDelta2 - pt2.x * fDelta1 ) / fDenominator;
  		
  		float2 vParallaxOffset = parallaxOffsetTS * (1 - fParallaxAmount );

  		// The computed texture offset for the displaced point on the pseudo-extruded surface:
  		texSampleBase -= vParallaxOffset;
  		texCoord = texSampleBase;

  		// Lerp to bump mapping only if we are in between, transition section:    	
  		IF ( fMipLevel > (float)(g_nLODThreshold - 1) )
  		{
 			// Lerp based on the fractional part:
 			fMipLevelFrac = modf( fMipLevel, fMipLevelInt );

 			// Lerp the texture coordinate from parallax occlusion mapped coordinate to bump mapping
 			// smoothly based on the current mip level:
 			texCoord = lerp( texSampleBase, texCoord, fMipLevelFrac );
 		}

  • Lighting loop
FOR (int i = 0; i < gNumberLights; i++, 8)
	{
		float diffuseIntensity = 1;

   	//////////////////////////////// Light Direction Calculation ///////////////////////////////////
				
		// calculate light vector
		float3 lightVector = gLights[i].Position - input.PositionWS;
		// calculate direction
		float3 direction;
		IF (gLights[i].Type == D3DLIGHT_POINT) direction = lightVector;
		else direction = -gLights[i].Direction;
		#ifdef EFFECT_ATTRIBUTE_BUMP_MAP
				direction = mul(mWorldToTangent, direction);
		#endif /*EFFECT_ATTRIBUTE_BUMP_MAP*/
		direction = normalize(direction);

		...
			
		IF (diffuseIntensity > 0)
		{
			/////////////////////////////////////// Diffuse ////////////////////////////////////////////////
			
			#ifdef EFFECT_ATTRIBUTE_BUMP_MAP
				diffuseIntensity *= dot(direction, vNormalTS);
			#elif defined(EFFECT_ATTRIBUTE_NORMAL) /*!EFFECT_ATTRIBUTE_BUMP_MAP*/
				diffuseIntensity *= dot(direction, normalWS);
			#endif /*EFFECT_ATTRIBUTE_BUMP_MAP*/
			
			diffuseIntensity = saturate(diffuseIntensity);
			diffuse += materialDiffuse * gLights[i].Color.rgb * diffuseIntensity.xxx * attenuation.xxx;
			
			/////////////////////////////////////// Specular ////////////////////////////////////////////////
			
			// calculate specular by Schlick simplified algorithm
			#ifdef EFFECT_ATTRIBUTE_NORMAL
				#ifndef EFFECT_ATTRIBUTE_BUMP_MAP
					float3 H = normalize(-direction + viewWS);
					float3 D = dot(-normalWS, H);
				#else /*!EFFECT_ATTRIBUTE_BUMP_MAP*/
					float3 D = saturate( dot( vReflectionTS, direction ));					
				#endif /*EFFECT_ATTRIBUTE_BUMP_MAP*/
				// TODO: remove Light.Size
				//float ls = saturate((MAX_LIGHT_SIZE - gLights[i].Size) / MAX_LIGHT_SIZE);
				float power = (/*ls + */gMaterial.Glossiness);
				float n = lerp(2.5, 100, power); // Specular Sharpness
				specular += materialSpecular * gLights[i].Color.rgb * D / (n.xxx - D * n.xxx + D) * attenuation.xxx;
			#endif /*EFFECT_ATTRIBUTE_NORMAL*/
		}
		
		...
	}



Does anybody have an idea what the problem is? Thank you for your time.

#2 Nik02   Members   -  Reputation: 1991

Posted 28 November 2011 - 07:50 AM

If the reference rasterizer displays the effect correctly, blame the drivers.

The floating-point NaN and infinity rules should be the same for all the cards mentioned. They could be the source of the problem only if the cards were from totally different eras.
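One way to rule NaN/Inf out on the affected cards is to flag non-finite intermediate values with a debug colour. This is only a sketch: FlagNonFinite and the colour parameter are hypothetical, and the compiler may fold such checks away unless strict IEEE rules are requested (for example with D3DXSHADER_IEEE_STRICTNESS), so it is worth confirming in the generated assembly that the test survived.

```hlsl
// Hypothetical debug helper: returns magenta when v is NaN or infinite,
// otherwise passes the shaded colour through unchanged.
float4 FlagNonFinite(float v, float4 color)
{
    bool bBad = isnan(v) || isinf(v);
    return bBad ? float4(1, 0, 1, 1) : color;
}

// Possible use at the end of the pixel shader (finalColor is an assumed name):
//     return FlagNonFinite(fParallaxAmount, finalColor);
```

If no magenta pixels appear while the object flickers, the division guarded by the fDenominator check is unlikely to be the culprit.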
Niko Suni
Software developer

#3 user88   Members   -  Reputation: 228

Posted 28 November 2011 - 08:09 AM

Just finished an experiment: the reference rasterizer displays the effect correctly.

#4 user88   Members   -  Reputation: 228

Posted 28 November 2011 - 10:51 AM

Interestingly, even the ParallaxOcclusionMapping sample from the DirectX SDK renders incorrectly when Specular is ON and Hardware Vertex Processing is used.

#5 MJP   Moderators   -  Reputation: 5419

Posted 28 November 2011 - 11:42 AM

I've hit plenty of driver bugs on Nvidia hardware, especially with shaders that have loops in them. I even once had a shader that caused the driver to crash! It can be tricky to pinpoint the exact part of the shader causing the problem, but once you do you can usually make a small change or two to avoid the bug.
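One hypothetical example of such a small change, sketched against the height-field search from the first post: keep a fixed trip count and record the first hit with a flag instead of overwriting the loop counter to exit early. The helper name, MAX_STEPS and bFound are assumptions for illustration, and nothing here is known to avoid this particular driver problem.

```hlsl
sampler2D gBumpMapSampler;       // assumed: height stored in alpha, as in the first post

static const int MAX_STEPS = 20; // assumed bound, matching the unroll hint in the post

// Hypothetical helper: returns the two profile endpoints around the first
// intersection in pt1/pt2 (both stay zero if nothing is hit).
void FindHeightIntersection(float2 texCoord, float2 parallaxOffsetTS,
                            float2 dx, float2 dy, int nNumSteps,
                            out float2 pt1, out float2 pt2)
{
    float  fStepSize         = 1.0 / (float) nNumSteps;
    float2 vTexOffsetPerStep = fStepSize * parallaxOffsetTS;
    float2 vTexCurrentOffset = texCoord;
    float  fCurrentBound     = 1.0;
    float  fPrevHeight       = 1.0;
    bool   bFound            = false;

    pt1 = 0;
    pt2 = 0;

    [unroll]
    for (int s = 0; s < MAX_STEPS; s++)
    {
        vTexCurrentOffset -= vTexOffsetPerStep;
        float fCurrHeight = tex2Dgrad(gBumpMapSampler, vTexCurrentOffset, dx, dy).a;
        fCurrentBound -= fStepSize;

        // Only the first hit within the requested step count is recorded;
        // later iterations still run but no longer change pt1/pt2.
        bool bHit = !bFound && (s < nNumSteps) && (fCurrHeight > fCurrentBound);

        [flatten] if (bHit)
        {
            pt1 = float2(fCurrentBound, fCurrHeight);
            pt2 = float2(fCurrentBound + fStepSize, fPrevHeight);
            bFound = true;
        }

        fPrevHeight = fCurrHeight;
    }
}
```

The trade-off is that every pixel pays for all MAX_STEPS samples, which is what a fully unrolled and flattened loop costs anyway.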



