ph33r

My Terrain's multi-texture pixel shader is killing me! (FPS)


I'm using a pixel shader to do custom multi-texturing. I have a texture map made up of four textures. The texture map is sampled four times in the pixel shader, and the samples are added together to give the final pixel color on my terrain. Each vertex on my terrain can have one texture assigned to it. If the vertices of a quad each have a different texture, the shader interpolates between the different textures within the texture map.

Problem: I'm getting 20 FPS in release mode. If I change the pixel shader to output a constant color, it jumps to 200 FPS. So I've found the bottleneck; I'm just not sure how to optimize it.

Here's my code for the pixel shader.
//	The terrain's Pixel shader works on sampling from a texture map. Which is 4 textures
// combined into one 512x512 texture map.
//
// The vertices look like this
// *-*
// |\|
// *-*
//
// Tex0 is the U,V's for the texture map in the format
// 0,0  1,0
// 0,1  1,1
//
float4 TerrainPS( float4 Normal : TEXCOORD0, float2 Tex0 : TEXCOORD1,
											 float2 Tex1 : TEXCOORD2,
											 float2 Tex2 : TEXCOORD3 ) : COLOR
{
	// This calculates all the texture coordinates
	// To find the first texture coordinate, simply divide by two.
	// The others you need to offset by 0.5f to get their correct index.
	float2 texCoord0 = Tex0 / 2.0f;
	
	float2 texCoord1 = texCoord0;
	texCoord1.x += 0.5f;
	
	float2 texCoord2 = texCoord0;
	texCoord2.y += 0.5f;
	
	float2 texCoord3 = texCoord0;
	texCoord3.x += 0.5f;
	texCoord3.y += 0.5f;
	
	
	// Sample the texture map by the correct texture coords.
	float4 texColor0 = tex2D( S0, texCoord2 );
	float4 texColor1 = tex2D( S0, texCoord1 );
	float4 texColor2 = tex2D( S0, texCoord0 );
	float4 texColor3 = tex2D( S0, texCoord3 );
	
	// This is the multi-texturing stage process.
	//
	// Tex1 and Tex2 represent the four textures
	// Texture 0 == Tex1.x
	// Texture 1 == Tex1.y
	// Texture 2 == Tex2.x
	// Texture 3 == Tex2.y
	//
	// They are set to 1 at the vertex if you painted a texture there
	// Otherwise they are set to 0. This lets the texture fade across.
	texColor0 *= Tex1.x;
	texColor1 *= Tex1.y;
	texColor2 *= Tex2.x;
	texColor3 *= Tex2.y;
	
	// Add them together for additive blending.
	float4 texColor = texColor0 + texColor1 + texColor2 + texColor3;

	// Note Normal is the diffuse component calculated in the vertex shader.
	return texColor * Normal; //+ ambientMtrl * texColor;
}


BTW, I've moved the calculation of the four texture coordinates from the pixel shader up to the vertex shader, and I get the same FPS.

I tried sampling once and I got something like 80 FPS which isn't too bad. But I need to sample 4 times.

I reduced the texture map to 128x128 and it gave a small boost in FPS, though the loss in quality really shows.

Why can't you calculate texcoord1->texcoord3 in the vertex pipeline and interpolate them through regular texture coordinate indices? Adding the same constant for each pixel is totally redundant...

I mean, hell, it looks like what you're doing there could be achieved using the fixed function pixel pipeline. Accumulate four texture reads using coords generated in the vertex shader, followed by a multiply by the interpolated diffuse value.
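A minimal sketch of that idea, assuming a simple vertex layout: the vertex shader applies the halving and the 0.5 offsets once per vertex and passes the four atlas coordinates down as interpolated texture coordinates. The names WorldViewProj, VSInput, VSOutput, and TerrainVS are made up for illustration, the row-vector mul order is an assumption about how the matrix is uploaded, and the diffuse term and per-vertex blend weights that the real shader also needs are left out.

```hlsl
// Hypothetical constant: combined world * view * projection matrix.
float4x4 WorldViewProj;

// Hypothetical input layout: position plus one base UV in [0,1].
struct VSInput
{
	float4 Position : POSITION;
	float2 Tex0     : TEXCOORD0;
};

// Four atlas coordinates, one per quadrant of the texture map.
struct VSOutput
{
	float4 Position : POSITION;
	float2 Atlas0   : TEXCOORD4; // top-left quadrant
	float2 Atlas1   : TEXCOORD5; // top-right quadrant
	float2 Atlas2   : TEXCOORD6; // bottom-left quadrant
	float2 Atlas3   : TEXCOORD7; // bottom-right quadrant
};

VSOutput TerrainVS( VSInput input )
{
	VSOutput output;

	// Assumes row-vector convention (vector on the left).
	output.Position = mul( input.Position, WorldViewProj );

	// Same math as the pixel shader version, done once per vertex.
	float2 baseUV = input.Tex0 * 0.5f;
	output.Atlas0 = baseUV;
	output.Atlas1 = baseUV + float2( 0.5f, 0.0f );
	output.Atlas2 = baseUV + float2( 0.0f, 0.5f );
	output.Atlas3 = baseUV + float2( 0.5f, 0.5f );

	return output;
}
```

Because the offsets are constants, interpolating the offset coordinates gives the same values as offsetting the interpolated base coordinate, so the sampled texels should not change. This only removes a few arithmetic instructions per pixel; the four texture reads remain.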

My terrain currently does a similar type of blending, only the blend factors are stored per vertex instead of in a texture map. But I get the same problem nonetheless.

The problem is you are burning up fill rate. What class of video card are you using? On my GeForce 3 (4 simultaneous texture units) I get around 80-100 FPS when rendering 65,000 triangles of terrain at 640x480. But the bottleneck is definitely sampling all those textures. You can test this by changing the device resolution and seeing whether it has a big impact on performance.

The only real solution that I have come across is to just use one large color texture over the whole terrain with a detail texture to reduce the blurriness. You will need to figure out how to reduce the number of textures that you are using.
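A rough sketch of that approach, assuming one color texture stretched over the whole terrain and a small tiling grayscale detail texture. The names ColorMap, DetailMap, DetailTiling, and TerrainDetailPS are hypothetical, the detail sampler is assumed to use wrap addressing, and the 2x scale is one common convention that assumes the detail texture averages to mid-gray.

```hlsl
// Hypothetical samplers, bound by the application.
sampler2D ColorMap;   // one large texture covering the whole terrain
sampler2D DetailMap;  // small tiling grayscale texture (wrap addressing assumed)

// Hypothetical constant: how many times the detail texture repeats.
float DetailTiling = 64.0f;

float4 TerrainDetailPS( float4 Diffuse   : TEXCOORD0,
                        float2 TerrainUV : TEXCOORD1 ) : COLOR
{
	// Two texture reads per pixel instead of four.
	float4 baseColor = tex2D( ColorMap, TerrainUV );
	float4 detail    = tex2D( DetailMap, TerrainUV * DetailTiling );

	// Scale by 2 so a mid-gray detail texel leaves the color unchanged.
	float3 rgb = baseColor.rgb * detail.rgb * 2.0f;

	return float4( rgb, baseColor.a ) * Diffuse;
}
```

The trade-off is that the painted textures have to be baked into the color texture ahead of time, so this only fits if the terrain does not need to be repainted at run time.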

Hope this helps.

Quote:
Original post by superpig
Why can't you calculate texcoord1->texcoord3 in the vertex pipeline and interpolate them through regular texture coordinate indices? Adding the same constant for each pixel is totally redundant...


Yeah, I just did that, though it didn't change the frame rate by more than a few FPS.

Here's my updated code.


//
// The terrain's Pixel shader works on sampling from a texture map. Which is 4 textures
// combined into one 512x512 texture map.
//
// The vertices look like this
// *-*
// |\|
// *-*
//
// Tex0 is the U,V's for the texture map in the format
// 0,0  1,0
// 0,1  1,1
//
float4 TerrainPS( float4 Normal : TEXCOORD0, float2 Tex0 : TEXCOORD1,
											 float2 Tex1 : TEXCOORD2,
											 float2 Tex2 : TEXCOORD3,
											 float2 Tex3 : TEXCOORD4,   // Tex coordinate 0
											 float2 Tex4 : TEXCOORD5,   // Tex coordinate 1
											 float2 Tex5 : TEXCOORD6,   // Tex coordinate 2
											 float2 Tex6 : TEXCOORD7 ) : COLOR // coordinate 3
{
	// Sample the texture map by the correct texture coords.
	float4 texColor0 = tex2D( S0, Tex5 );
	float4 texColor1 = tex2D( S0, Tex4 );
	float4 texColor2 = tex2D( S0, Tex3 );
	float4 texColor3 = tex2D( S0, Tex6 );

	// This is the multi-texturing stage process.
	//
	// Tex1 and Tex2 represent the four textures
	// Texture 0 == Tex1.x
	// Texture 1 == Tex1.y
	// Texture 2 == Tex2.x
	// Texture 3 == Tex2.y
	//
	// They are set to 1 at the vertex if you painted a texture there
	// Otherwise they are set to 0. This lets the texture fade across.
	texColor0 *= Tex1.x;
	texColor1 *= Tex1.y;
	texColor2 *= Tex2.x;
	texColor3 *= Tex2.y;

	// Add them together for additive blending.
	float4 texColor = texColor0 + texColor1 + texColor2 + texColor3;

	// Note Normal is the diffuse component calculated in the vertex shader.
	return texColor * Normal; //+ ambientMtrl * texColor;
}



P.S. To jason: I'd love to have just one colored texture across the terrain, but as per the requirements I must be able to paint four different textures onto the terrain.

What hardware are you testing it on?

It'd be worth trying to get a look at the assembly it generates, too. If I were writing this in assembly I'd write this:


ps.1.1
tex t0
tex t1
tex t2
tex t3
add r0, t0, t1
add r0, r0, t2
add r0, r0, t3
mul r0, r0, v0



That assumes the texture is bound to four separate stages, of course - as far as I know, multiple reads from the same texture stage ain't a good thing.
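In HLSL, binding the same texture to four stages could look roughly like the sketch below, assuming the D3DX effect framework is used to set up the samplers. TerrainAtlas, S1 to S3, and TerrainFourSamplerPS are hypothetical names; without the effect framework the application would instead set the same texture on stages 0 to 3 itself. Whether this is any faster than four reads through one sampler is untested here and would need profiling.

```hlsl
// Hypothetical effect-file setup: one texture object, four samplers.
texture TerrainAtlas;

sampler2D S0 = sampler_state { Texture = <TerrainAtlas>; };
sampler2D S1 = sampler_state { Texture = <TerrainAtlas>; };
sampler2D S2 = sampler_state { Texture = <TerrainAtlas>; };
sampler2D S3 = sampler_state { Texture = <TerrainAtlas>; };

// Same inputs as the updated TerrainPS posted earlier in the thread,
// minus the unused Tex0.
float4 TerrainFourSamplerPS( float4 Normal : TEXCOORD0,
                             float2 Tex1   : TEXCOORD2,
                             float2 Tex2   : TEXCOORD3,
                             float2 Tex3   : TEXCOORD4,
                             float2 Tex4   : TEXCOORD5,
                             float2 Tex5   : TEXCOORD6,
                             float2 Tex6   : TEXCOORD7 ) : COLOR
{
	// One read per sampler; each read is scaled by its blend weight.
	float4 texColor = tex2D( S0, Tex5 ) * Tex1.x
	                + tex2D( S1, Tex4 ) * Tex1.y
	                + tex2D( S2, Tex3 ) * Tex2.x
	                + tex2D( S3, Tex6 ) * Tex2.y;

	// Normal carries the diffuse term computed in the vertex shader.
	return texColor * Normal;
}
```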

That assembly shader does not perform the same operations that ph33r's HLSL does, specifically there is no account for the optional blending of textures - you need mads for that.

I used fxc to generate the assembly output from ph33r's shader, and it's 4 texture instructions followed by 5 arithmetic. That's not a big requirement really. I'll have to think about it some more before I come to any conclusions.

-Mezz

Quote:
Original post by superpig
What hardware are you testing it on?

ATI Radeon 9600 mobility.

Quote:
It'd be worth trying to get a look at the assembly it generates, too.


//
// Generated by Microsoft (R) D3DX9 Shader Compiler 5.04.00.2904
//
// Parameters:
//
// sampler2D Texture0;
//
//
// Registers:
//
//   Name         Reg   Size
//   ------------ ----- ----
//   Texture0     s0       1
//

ps_2_0
dcl t0
dcl t2.xy
dcl t3.xy
dcl t4.xy
dcl t5.xy
dcl t6.xy
dcl t7.xy
dcl_2d s0
texld r3, t5, s0
texld r2, t6, s0
texld r1, t4, s0
texld r0, t7, s0
mul r3, r3, t2.y
mad r2, r2, t2.x, r3
mad r1, r1, t3.x, r2
mad r0, r0, t3.y, r1
mul r0, r0, t0
mov oC0, r0

// approximately 10 instruction slots used (4 texture, 6 arithmetic)




Quote:
That assumes the texture is bound to four separate stages, of course - as far as I know, multiple reads from the same texture stage ain't a good thing.

The texture is only bound to one stage. I didn't know there would be a benefit from putting one texture on multiple stages. I assumed that using one texture on one stage, with four reads from it, would be fine.

Quote:
Original post by Mezz
I'll have to think about it some more before I come to any conclusions.

Thanks mezz.

Quote:
Original post by Mezz
That assembly shader does not perform the same operations that ph33r's HLSL does, specifically there is no account for the optional blending of textures - you need mads for that.

Oh, whoops, I missed that in the original HLSL code.

I don't have the hardware to play with ps_2_0 stuff, so I've not been able to test whether having the texture bound to four separate samplers is faster than reusing one. You'd have to profile it. Ah - if you've got NVIDIA hardware, have you tried using NVPerfHUD?

I don't think there's anything here that actually requires ps_2_0 though.
