
Is it normal for texture coordinate scaling to slow down the pixel shader?



I found a bottleneck or something.

When I scale the texture coordinates by, say, 22.3 in the pixel shader (4), per channel (R, G, B, A), the frame time increases by 80 milliseconds.

The textures are 512x512.


This is from the pixel shader:

float3 c0 = gTex0.Sample(Tex0S, In.tiledTexC * gTexScaleRed).rgb;


Ah, good time to have a thought: why not pass the 4 precalculated texture coordinates to the PS (add 4 more values to its input)? I'll try that now.


That got me 1 millisecond.
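
Moving the multiply out of the pixel shader is safe because scaling commutes with the linear interpolation the rasterizer applies to vertex outputs. The sketch below is only a CPU-side illustration of that identity, not shader code; `Float2`, `lerp`, `scale` and the two helper functions are hypothetical names introduced here, and only the 22.3 scale comes from the post.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical 2D texture coordinate, standing in for an HLSL float2.
struct Float2 { float u, v; };

// Linear interpolation between two coordinates; a stand-in for what the
// rasterizer does to vertex outputs across a triangle.
Float2 lerp(Float2 a, Float2 b, float t) {
    assert(t >= 0.0f && t <= 1.0f);
    return { a.u + (b.u - a.u) * t, a.v + (b.v - a.v) * t };
}

Float2 scale(Float2 c, float s) { return { c.u * s, c.v * s }; }

// Per-pixel variant: interpolate the raw coordinate, then scale it.
Float2 interpolateThenScale(Float2 v0, Float2 v1, float t, float s) {
    return scale(lerp(v0, v1, t), s);
}

// Per-vertex variant: scale at the vertices, then interpolate.
Float2 scaleThenInterpolate(Float2 v0, Float2 v1, float t, float s) {
    return lerp(scale(v0, s), scale(v1, s), t);
}
```

Both functions return the same coordinate up to floating-point rounding, so the four multiplies can run once per vertex instead of once per pixel. How much time that saves depends on the hardware; the post reports about 1 ms.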


What other things could I do in here to improve performance?


    PS_OUTPUT pout;

    //float4 diffuse = gBlendMap.Sample(BlendMapS, In.nonTiledTexC);
    // Kill transparent pixels.
    //clip(diffuse.a - 0.15f);

    //float3 c0 = gTex0.Sample(Tex0S, In.tiledTexC * gTexScaleRed).rgb;
    //float3 c1 = gTex1.Sample(Tex1S, In.tiledTexC * gTexScaleGreen).rgb;
    //float3 c2 = gTex2.Sample(Tex2S, In.tiledTexC * gTexScaleBlue).rgb;
    //float3 c3 = gTex3.Sample(Tex3S, In.tiledTexC * gTexScaleAlpha).rgb;

    float3 c0 = gTex0.Sample(Tex0S, In.tiledTexC).rgb;
    float3 c1 = gTex1.Sample(Tex1S, In.tiledTexCGreen).rgb;
    float3 c2 = gTex2.Sample(Tex2S, In.tiledTexCBlue).rgb;
    float3 c3 = gTex3.Sample(Tex3S, In.tiledTexCAlpha).rgb;

    // Shadow stuff.
    float shadowFactor = CalcShadowFactor(In.projTexC);

    // Blend map is not tiled.
    float4 B = gBlendMap.Sample(BlendMapS, In.nonTiledTexC).rgba;//tex2D(BlendMapS, nonTiledTexC).rgb;

    // Find the inverse of the sum of all the blend weights so that we can
    // scale the total color to the range [0, 1].
    float totalInverse = 1.0f / (B.r + B.g + B.b + B.a);

    // Scale the color of each layer by its corresponding weight
    // stored in the blend map.
    c0 *= B.r * totalInverse;
    c1 *= B.g * totalInverse;
    c2 *= B.b * totalInverse;
    c3 *= B.a * totalInverse;

    // Sum the colors and modulate with the shade to brighten/darken.
    float3 texColor = (c0 + c1 + c2 + c3) * (In.shade * shadowFactor);

    // Projected texture position stuff.
    // Complete projection by doing division by w. /= In.DecalprojTexC.w;

    // Only points inside the projection volume sample the decal.
    if( In.DecalprojTexC.x >= -1.0f && In.DecalprojTexC.x <= 1.0f &&
        In.DecalprojTexC.y >= -1.0f && In.DecalprojTexC.y <= 1.0f )
    {
        // Transform from NDC space to texture space.
        In.DecalprojTexC.x = +0.5f*In.DecalprojTexC.x + 0.5f;
        In.DecalprojTexC.y = -0.5f*In.DecalprojTexC.y + 0.5f;
        float2 texelPos = In.DecalprojTexC.xy;//input.projTexC.xy;
        // Sample projected tex map.
        float4 s0 = gDecalProjectiveMap.Sample(gDecalProjectiveSam, texelPos);
        // Skip the decal if black.
        if(s0.a > 0.15f)
        {
            // Blend the 2 images.
            //return lerp(float4(texColor, 0.5f), s0, 0.8f);//gDecalToDiffuseBlendAmt);
            pout.RGBColor  = lerp(float4(texColor, 0.5f), s0, 0.8f);
            pout.RGBColor1 = float4(s0.rgb * 0.2, 0.5);//pout.RGBColor;
            return pout;
        }
        //else texColor *= 0.2f;
    }//end in the projection
    //else texColor *= 1.5f;

    pout.RGBColor  = float4(texColor, 1.0f);
    pout.RGBColor1 = float4(0,0,0,1);//pout.RGBColor;
    return pout;


Edited by ankhd


Like Hodgman suggested, you probably need mipmaps. Without them, adjacent screen pixels might be sampling from texels that are far apart in the texture, and thus far apart in texture memory. The GPU has a relatively small texture cache, and so you could be blowing the texture cache with each pixel you render, causing the GPU to stall while it pulls in the new part of the texture into the cache.
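
To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the unscaled UVs run from 0 to 1 across a surface that covers 1024 screen pixels; that coverage is a made-up figure, while the 512 texel size and the 22.3 scale come from the first post. Real hardware picks the mip level from per-pixel UV derivatives, so the log2 estimate is only an approximation, and all three function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Number of levels in a full mip chain, down to 1x1.
int mipLevelCount(int width, int height) {
    assert(width > 0 && height > 0);
    int levels = 1;
    int size = std::max(width, height);
    while (size > 1) { size /= 2; ++levels; }
    return levels;
}

// Texels skipped between two adjacent screen pixels when UVs spanning 0..1
// are multiplied by uvScale and stretched over screenPixelsCovered pixels.
float texelsPerPixel(int textureSize, float uvScale, float screenPixelsCovered) {
    assert(screenPixelsCovered > 0.0f);
    return textureSize * uvScale / screenPixelsCovered;
}

// Rough mip level a sampler would choose: log2 of the texel step,
// clamped to the range of levels that exist.
float approxMipLevel(float texelStep, int levelCount) {
    float lod = std::log2(std::max(texelStep, 1.0f));
    return std::min(lod, static_cast<float>(levelCount - 1));
}
```

With those assumed numbers, `texelsPerPixel(512, 22.3f, 1024.0f)` is about 11 texels per pixel, so without mipmaps neighbouring pixels fetch texels roughly 11 apart. With a full chain (10 levels for 512x512) the sampler would read from somewhere around level 3 instead, where neighbouring pixels hit neighbouring texels.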



Also, on some mobile GPUs, performing *any* calculations on texture coordinates in the pixel shader can cause some performance hit (I've encountered this on iOS devices, for instance). Probably doesn't apply in your case, and I would imagine this would be relatively minor compared to the texture bandwidth issue caused by not using mipmaps.



Yeah, I had no mipmaps. I set them up and gained about 1 ms when viewing from high up, and at lower levels the frame time gain was about 1 to 2 ms.

Do mipmaps make it look smoother, or is it because I had these set:

loadInfo.Filter = D3DX10_FILTER_NONE;

loadInfo.MipFilter = D3DX10_FILTER_NONE;


I now have both set to D3DX10_FILTER_LINEAR, and with the mip levels set to D3DX10_DEFAULT, D3DX10CreateTextureFromFile will create them.


But the frame time was still high, so I fired up PIX. While stepping through, I noticed 168 draw calls for terrain chunks. WTH, there you go: I had the terrain chunks set way too small. I increased the size of each chunk by 3, which brought them down to 9 terrain chunks. That made all the difference, from 145 ms to 45 ms. Wow.

Strange how I only noticed when I created a new map and increased the texture scale.
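
As a rough illustration of why chunk size moves the draw-call count so much: for a square terrain cut into square chunks, the chunk count falls with the square of the chunk's side length. The terrain dimensions in the usage note are made up, and `chunkCount` is a hypothetical helper. The 168 and 9 figures above count only the chunks inside the frustum, so they will not follow this formula exactly.

```cpp
#include <cassert>

// Chunks needed to cover a square terrain of terrainCellsPerSide cells
// with square chunks of chunkCellsPerSide cells (partial chunks round up).
int chunkCount(int terrainCellsPerSide, int chunkCellsPerSide) {
    assert(terrainCellsPerSide > 0 && chunkCellsPerSide > 0);
    int perSide = (terrainCellsPerSide + chunkCellsPerSide - 1) / chunkCellsPerSide;
    return perSide * perSide;
}
```

For an assumed 192-cell terrain, `chunkCount(192, 16)` gives 144 chunks, while tripling the chunk side with `chunkCount(192, 48)` gives 16, a ninefold drop in draw calls for the same geometry.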


168 draw calls might be overkill for drawing some terrain chunks, but I wouldn't expect it to cause that kind of performance hit. A complex scene will have more than that. Something else must be happening.



Not sure, but it was 168 chunks in a std::list<> that fit in the camera frustum. That's a loop over 168 buffers every frame, and it was made up of small mesh segments. For some reason I can't judge size in the 3D world very well.

The index buffer previously had only 4000 indices. I upped that to 98304 indices and gained an extra 100 ms. Now, depending on the camera position, there are fewer chunks but more vertices.


I also changed how the shader values are applied: instead of setting them on every draw call while processing each chunk, I now set them once and then render all the chunks with that data.

They share the same data anyway.
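
A minimal sketch of that change, using a mock device that only counts calls. `MockDevice`, `Chunk` and both render functions are hypothetical stand-ins for illustration, not Direct3D types or the code from this project.

```cpp
#include <cassert>
#include <vector>

// Hypothetical device that only counts what it is asked to do.
struct MockDevice {
    int stateSets = 0;
    int drawCalls = 0;
    void setSharedState() { ++stateSets; }   // textures, light position, ...
    void draw(int indexCount) { assert(indexCount > 0); ++drawCalls; }
};

struct Chunk { int indexCount; };

// Before: the shared shader values are re-applied for every chunk.
void renderSettingStateEveryChunk(MockDevice& dev, const std::vector<Chunk>& chunks) {
    for (const Chunk& c : chunks) {
        dev.setSharedState();
        dev.draw(c.indexCount);
    }
}

// After: the shared values are applied once, then every chunk is drawn.
void renderSettingStateOnce(MockDevice& dev, const std::vector<Chunk>& chunks) {
    if (chunks.empty()) return;
    dev.setSharedState();
    for (const Chunk& c : chunks) {
        dev.draw(c.indexCount);
    }
}
```

With 168 chunks, the first version applies the shared state 168 times and the second applies it once. The draw-call count is the same in both, which fits the small gain reported for this step.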


The frame time gain from that was minor, which made me look into the chunks.


Summing up: first I did the mipmap thing, then ran the app and noticed some improvement, but the frame time was still high.

Then I changed from setting the shared shader values on each iteration to setting them just once (that's 7 textures, the light position and some others). That gave little improvement, 5 to 10 ms, with the frame time still in the 111 to 135 ms range, but down from the original 145 ms per frame.

Then I increased the chunk size, which made the biggest difference: 35 ms per frame.

I could get some more from changing the std::list to a std::vector, but that means a bit of rework (a lot). For now the times per frame are good.

There was a good reason, but I can't think why I used a list and not a vector. I created the terrain stuff over 7 years ago by the calendar, not in working hours; this is part time.
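
One way to get contiguous iteration without reworking the list is to leave the std::list as the owner and refill a reusable std::vector of pointers to the visible chunks each frame. This is only a sketch under that assumption; `Chunk`, its `visible` flag and `collectVisible` are hypothetical, and whether it helps measurably for a few hundred chunks is not something this thread establishes.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Hypothetical chunk record; the real one would hold buffers and bounds.
struct Chunk {
    int id;
    bool visible;   // result of the frustum test for this frame
};

// Refill 'out' with pointers to the visible chunks. clear() keeps the
// vector's capacity, so after the first frames no allocation is needed.
void collectVisible(const std::list<Chunk>& all, std::vector<const Chunk*>& out) {
    out.clear();
    for (const Chunk& c : all) {
        if (c.visible) out.push_back(&c);
    }
}
```

The render loop then walks the vector instead of the list. std::list never moves its elements, so the pointers stay valid as long as no chunk is erased between collecting and drawing.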
