# My Terrain's multi-texture pixel shader is killing me! (FPS)

## Recommended Posts

I'm using a pixel shader to do custom multi-texturing. I pack my terrain textures into a single texture map. This texture map is then sampled four times in the pixel shader, and the samples are added together to give the final color on my terrain. Each vertex on my terrain can have one texture assigned to it. If the vertices of a quad have different textures, the shader interpolates between those textures within the texture map.

Problem: I'm getting 20 FPS in release mode. If I change the pixel shader to output a constant color, it jumps to 200 FPS. So I've found the bottleneck; I'm just not sure how to optimize it. Here's my code for the pixel shader.
//	The terrain's Pixel shader works on sampling from a texture map. Which is 4 textures
// combined into one 512x512 texture map.
//
// The vertices look like this
// *-*
// |\|
// *-*
//
// Tex0 is the U,V's for the texture map in the format
// 0,0  1,0
// 0,1  1,1
//
float4 TerrainPS( float4 Normal : TEXCOORD0,
                  float2 Tex0 : TEXCOORD1,
                  float2 Tex1 : TEXCOORD2,
                  float2 Tex2 : TEXCOORD3 ) : COLOR
{
    // This calculates all the texture coordinates
    // To find the first texture coordinate, simply divide by two.
    // The others you need to offset by 0.5f to get their correct index.
    float2 texCoord0 = Tex0 / 2.0f;

    float2 texCoord1 = texCoord0;
    texCoord1.x += 0.5f;

    float2 texCoord2 = texCoord0;
    texCoord2.y += 0.5f;

    float2 texCoord3 = texCoord0;
    texCoord3.x += 0.5f;
    texCoord3.y += 0.5f;

    // Sample the texture map by the correct texture coords.
    float4 texColor0 = tex2D( S0, texCoord2 );
    float4 texColor1 = tex2D( S0, texCoord1 );
    float4 texColor2 = tex2D( S0, texCoord0 );
    float4 texColor3 = tex2D( S0, texCoord3 );

    // This is the multi-texturing stage process.
    //
    // Tex1 and Tex2 represent the four textures
    // Texture 0 == Tex1.x
    // Texture 1 == Tex1.y
    // Texture 2 == Tex2.x
    // Texture 3 == Tex2.y
    //
    // They are set to 1 at the vertex if you painted a texture there
    // Otherwise they are set to 0. This lets the texture fade across.
    texColor0 *= Tex1.x;
    texColor1 *= Tex1.y;
    texColor2 *= Tex2.x;
    texColor3 *= Tex2.y;

    float4 texColor = texColor0 + texColor1 + texColor2 + texColor3;

    // Note Normal is the diffuse component calculated in the vertex shader.
    return texColor * Normal; //+ ambientMtrl * texColor;
}
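For readers who want to check the math outside the GPU, here is a minimal CPU-side sketch of what the shader above computes, written in Python rather than HLSL so it can be run directly. The function names `atlas_coords` and `blend` are hypothetical, and the sketch assumes a 2x2 atlas layout as described in the comments. It does not reproduce the exact order in which the shader pairs coordinates with samples.

```python
def atlas_coords(u, v):
    """Map a tile-local (u, v) in [0, 1] to the four quadrants of a 2x2 atlas."""
    base_u, base_v = u / 2.0, v / 2.0          # halve to land in the first quadrant
    return [
        (base_u, base_v),                      # texCoord0: no offset
        (base_u + 0.5, base_v),                # texCoord1: offset in x
        (base_u, base_v + 0.5),                # texCoord2: offset in y
        (base_u + 0.5, base_v + 0.5),          # texCoord3: offset in x and y
    ]


def blend(colors, weights, diffuse):
    """Weighted sum of four RGBA samples, then modulate by the diffuse term."""
    out = [0.0, 0.0, 0.0, 0.0]
    for color, weight in zip(colors, weights):
        for i in range(4):
            out[i] += color[i] * weight
    return [c * d for c, d in zip(out, diffuse)]
```

With weights of 0.5 on the first two layers and 0 on the others, `blend` returns the average of the first two samples, which is the fade between two painted textures.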



##### Share on other sites
BTW, I've moved the calculation of the four texture coordinates from the pixel shader up to the vertex shader, and I get the same FPS.

I tried sampling once and I got something like 80 FPS which isn't too bad. But I need to sample 4 times.

##### Share on other sites
I reduced the texture map to 128x128 and it gave a small boost in FPS, though the loss of quality really shows.

##### Share on other sites
Why can't you calculate texcoord1->texcoord3 in the vertex pipeline and interpolate them through regular texture coordinate indices? Adding the same constant for each pixel is totally redundant...
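The reason this move is safe can be checked numerically: halving a coordinate and adding a constant is an affine operation, so applying it per vertex and then interpolating linearly gives the same value as interpolating first and applying it per pixel. The sketch below is a CPU-side illustration in Python; `lerp` and `atlas_offset` are hypothetical helper names, and a single coordinate along one edge stands in for the full triangle interpolation.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b."""
    return a + (b - a) * t


def atlas_offset(u, offset):
    """Halve the coordinate and shift it into an atlas quadrant."""
    return u / 2.0 + offset


def per_pixel(u0, u1, t, offset):
    """Interpolate the raw coordinate, then offset it (pixel shader work)."""
    return atlas_offset(lerp(u0, u1, t), offset)


def per_vertex(u0, u1, t, offset):
    """Offset at each vertex, then let the rasterizer interpolate."""
    return lerp(atlas_offset(u0, offset), atlas_offset(u1, offset), t)
```

Both functions return 0.625 for `u0=0.0, u1=1.0, t=0.25, offset=0.5`, so the per-pixel additions buy nothing visually.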

I mean, hell, it looks like what you're doing there could be achieved using the fixed function pixel pipeline. Accumulate four texture reads using coords generated in the vertex shader, followed by a multiply by the interpolated diffuse value.

##### Share on other sites
My terrain currently does a similar type of blending, only the blend factors are stored per vertex instead of in a texture map. But I get the same problem nonetheless.

The problem is you are burning up fill rate. What class of video card are you using? On my GeForce 3 (4 simultaneous texture units) I get around 80-100 FPS when rendering 65,000 triangles of terrain @ 640x480. But the bottleneck is definitely sampling all those textures. You can test this by changing the device resolution and seeing whether it has a big impact on performance.
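As a rough way to reason about that resolution test, the sketch below estimates texture fetches per second and predicts the frame rate at a new resolution if the frame is purely fill-bound. It is a back-of-the-envelope model in Python with hypothetical function names. It assumes every screen pixel is shaded and that overdraw is a simple multiplier, which real scenes only approximate.

```python
def texel_fetches_per_second(width, height, samples_per_pixel, fps, overdraw=1.0):
    """Approximate texture fetches per second for a full-screen pass."""
    return width * height * overdraw * samples_per_pixel * fps


def fill_bound_fps(fps, old_res, new_res):
    """Predicted FPS after a resolution change if pixel work is the only cost."""
    old_pixels = old_res[0] * old_res[1]
    new_pixels = new_res[0] * new_res[1]
    return fps * old_pixels / new_pixels
```

If the measured frame rate at the new resolution tracks `fill_bound_fps` closely, pixel work is the likely bottleneck. If it barely moves, the cost is probably elsewhere (vertex processing or the CPU).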

The only real solution that I have come across is to just use one large color texture over the whole terrain with a detail texture to reduce the blurriness. You will need to figure out how to reduce the number of textures that you are using.
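One common way to combine a stretched color texture with a tiled detail texture is a "modulate 2x" blend, where a detail value of 0.5 leaves the color unchanged and values above or below brighten or darken it. The Python sketch below illustrates that convention only. It is an assumption about how the two textures might be combined, not necessarily what is used here, and `detail_modulate` is a hypothetical name.

```python
def detail_modulate(base_rgb, detail_gray):
    """Modulate-2x blend: detail 0.5 is neutral, result clamped to 1.0."""
    return tuple(min(1.0, c * detail_gray * 2.0) for c in base_rgb)
```

This needs two texture samples per pixel instead of four, which is where the fill-rate saving would come from.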

Hope this helps.

##### Share on other sites
Quote:
 Original post by superpig
 Why can't you calculate texcoord1->texcoord3 in the vertex pipeline and interpolate them through regular texture coordinate indices? Adding the same constant for each pixel is totally redundant...

Yeah, I just did that, though it only changed the frame rate by a few FPS.

Here's my updated code.
//
// The terrain's Pixel shader works on sampling from a texture map. Which is 4 textures
// combined into one 512x512 texture map.
//
// The vertices look like this
// *-*
// |\|
// *-*
//
// Tex0 is the U,V's for the texture map in the format
// 0,0  1,0
// 0,1  1,1
//
float4 TerrainPS( float4 Normal : TEXCOORD0,
                  float2 Tex0 : TEXCOORD1,
                  float2 Tex1 : TEXCOORD2,
                  float2 Tex2 : TEXCOORD3,
                  float2 Tex3 : TEXCOORD4,          // Tex coordinate 0
                  float2 Tex4 : TEXCOORD5,          // Tex coordinate 1
                  float2 Tex5 : TEXCOORD6,          // Tex coordinate 2
                  float2 Tex6 : TEXCOORD7 ) : COLOR //  coordinate 3
{
    // Sample the texture map by the correct texture coords.
    float4 texColor0 = tex2D( S0, Tex5 );
    float4 texColor1 = tex2D( S0, Tex4 );
    float4 texColor2 = tex2D( S0, Tex3 );
    float4 texColor3 = tex2D( S0, Tex6 );

    // This is the multi-texturing stage process.
    //
    // Tex1 and Tex2 represent the four textures
    // Texture 0 == Tex1.x
    // Texture 1 == Tex1.y
    // Texture 2 == Tex2.x
    // Texture 3 == Tex2.y
    //
    // They are set to 1 at the vertex if you painted a texture there
    // Otherwise they are set to 0. This lets the texture fade across.
    texColor0 *= Tex1.x;
    texColor1 *= Tex1.y;
    texColor2 *= Tex2.x;
    texColor3 *= Tex2.y;

    // Add them together for additive blending.
    float4 texColor = texColor0 + texColor1 + texColor2 + texColor3;

    // Note Normal is the diffuse component calculated in the vertex shader.
    return texColor * Normal; //+ ambientMtrl * texColor;
}

p.s. To jason, I'd love to have just one colored texture across the terrain but as per requirements I must be able to paint four different textures on to the terrain.

##### Share on other sites
What hardware are you testing it on?

It'd be worth trying to get a look at the assembly it generates, too. If I were writing this in assembly I'd write this:

ps.1.1
tex t0
tex t1
tex t2
tex t3
add r0, t0, t1
add r0, r0, t2
add r0, r0, t3
mul r0, r0, v0

That assumes the texture is bound to four separate stages, of course - as far as I know, multiple reads from the same texture stage ain't a good thing.

##### Share on other sites
That assembly shader does not perform the same operations that ph33r's HLSL does. Specifically, it does not account for the optional blending of textures; you need mads for that.

I used fxc to generate the assembly output from ph33r's shader, and it's 4 texture instructions followed by 5 arithmetic. That's not a big requirement really. I'll have to think about it some more before I come to any conclusions.

-Mezz

##### Share on other sites
Quote:
 Original post by superpig
 What hardware are you testing it on?

Quote:
 It'd be worth trying to get a look at the assembly it generates, too.

//
// Generated by Microsoft (R) D3DX9 Shader Compiler 5.04.00.2904
//
// Parameters:
//
//   sampler2D Texture0;
//
//
// Registers:
//
//   Name         Reg   Size
//   ------------ ----- ----
//   Texture0     s0       1
//

    ps_2_0
    dcl t0
    dcl t2.xy
    dcl t3.xy
    dcl t4.xy
    dcl t5.xy
    dcl t6.xy
    dcl t7.xy
    dcl_2d s0
    texld r3, t5, s0
    texld r2, t6, s0
    texld r1, t4, s0
    texld r0, t7, s0
    mul r3, r3, t2.y
    mad r2, r2, t2.x, r3
    mad r1, r1, t3.x, r2
    mad r0, r0, t3.y, r1
    mul r0, r0, t0
    mov oC0, r0

// approximately 10 instruction slots used (4 texture, 6 arithmetic)
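To see that the compiled `mul`/`mad` chain really is the weighted sum from the HLSL, the Python sketch below emulates the arithmetic instructions on plain lists. The four `texld` results are passed in as arguments instead of being sampled, and the helper names (`mul_s`, `mad_s`, `run_ps_2_0`) are hypothetical. It models only the arithmetic, not precision or timing on real hardware.

```python
def mul_s(a, s):
    """mul with a scalar source: a * s, component-wise."""
    return [x * s for x in a]


def mad_s(a, s, c):
    """mad with a scalar multiplier: a * s + c, component-wise."""
    return [x * s + y for x, y in zip(a, c)]


def run_ps_2_0(s_t5, s_t6, s_t4, s_t7, t2, t3, t0):
    """Emulate the arithmetic after the four texld instructions.

    s_t5..s_t7 are the RGBA samples fetched at t5, t6, t4, t7;
    t2 and t3 are (x, y) blend weights; t0 is the diffuse color.
    """
    r3 = mul_s(s_t5, t2[1])           # mul r3, r3, t2.y
    r2 = mad_s(s_t6, t2[0], r3)       # mad r2, r2, t2.x, r3
    r1 = mad_s(s_t4, t3[0], r2)       # mad r1, r1, t3.x, r2
    r0 = mad_s(s_t7, t3[1], r1)       # mad r0, r0, t3.y, r1
    return [x * d for x, d in zip(r0, t0)]  # mul r0, r0, t0 (then mov oC0, r0)
```

Each `mad` folds one weighted sample into the running total, so the chain needs no separate add instructions.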

Quote:
 That assumes the texture is bound to four separate stages, of course - as far as I know, multiple reads from the same texture stage ain't a good thing.

The texture is only bound to one stage. I didn't know there would be a benefit from putting one texture on multiple stages. I assumed that using one texture on one stage, with four reads from it, would be fine.

Quote:
 Original post by Mezz
 I'll have to think about it some more before I come to any conclusions.

Thanks mezz.

##### Share on other sites
Quote:
 Original post by Mezz
 That assembly shader does not perform the same operations that ph33r's HLSL does, specifically there is no account for the optional blending of textures - you need mads for that.

Oh, whoops, I missed that in the original HLSL code.

I don't have the hardware to play with ps_2_0 stuff, so I've not been able to test whether having the texture bound to four separate samplers is faster than reusing one. You'd have to profile it. Ah - if you've got NVIDIA hardware, have you tried using NVPerfHUD?

I don't think there's anything here that actually requires ps_2_0 though.
