[XNA, HLSL] Unusual multitexturing performance drop

Posted by Nyrr

Hi there :) I've been working on a terrain system for an XNA 3D game project. It was working quite nicely - until I implemented texture splatting (using the RGB channels of one texture to mix three different textures on the terrain). Now, whenever the terrain is on screen, there is a massive FPS drop (from around 60 down to 6-8), *unless* I zoom in very close - close enough to have only a few triangles on the screen.

The textures are a reasonable size (512x512), and this exact pixel shader worked perfectly when I implemented it in DirectX. The shader is based on an example from the book "Beginning DirectX 9.0c Shader Approach". I can't see any obvious problems with the shader, yet it *does* seem to be the shader: when I render the terrain with a basic effect and a single texture, it works fine.

I'm quite stumped here, so I'd appreciate any tips and help. Shout if you need more code. Thanks :)
float4x4 World;
float4x4 View;
float4x4 Projection;

texture gTex0;
texture gTex1;
texture gTex2;
texture gTex3;
texture gTex4;
texture gBlendMap;

float TIMES_TILE = 32.0f;

// Use Anisotropic filtering since when we are low to the ground, the 
// ground plane is near a 90 degree angle with our view direction.
sampler Tex0S = sampler_state
{
	Texture = <gTex0>;
	MinFilter = Anisotropic;
	MagFilter = LINEAR;
	MipFilter = LINEAR;
	MaxAnisotropy = 8;
	AddressU  = WRAP;
    AddressV  = WRAP;
};

sampler Tex1S = sampler_state
{
	Texture = <gTex1>;
	MinFilter = Anisotropic;
	MagFilter = LINEAR;
	MipFilter = LINEAR;
	MaxAnisotropy = 8;
	AddressU  = WRAP;
    AddressV  = WRAP;
};

sampler Tex2S = sampler_state
{
	Texture = <gTex2>;
	MinFilter = Anisotropic;
	MagFilter = LINEAR;
	MipFilter = LINEAR;
	MaxAnisotropy = 8;
	AddressU  = WRAP;
    AddressV  = WRAP;
};

sampler Tex3S = sampler_state
{
	Texture = <gTex3>;
	MinFilter = Anisotropic;
	MagFilter = LINEAR;
	MipFilter = LINEAR;
	MaxAnisotropy = 8;
	AddressU  = WRAP;
    AddressV  = WRAP;
};

sampler Tex4S = sampler_state
{
	Texture = <gTex4>;
	MinFilter = Anisotropic;
	MagFilter = LINEAR;
	MipFilter = LINEAR;
	MaxAnisotropy = 8;
	AddressU  = WRAP;
    AddressV  = WRAP;
};
 
sampler BlendMapS = sampler_state
{
	Texture = <gBlendMap>;
	MinFilter = LINEAR;
	MagFilter = LINEAR;
	MipFilter = LINEAR;
	AddressU  = CLAMP;
    AddressV  = CLAMP;
};

struct VertexOut
{
    float4 Pos : POSITION;
    float4 Color : COLOR;
    float2 tiledTexC    : TEXCOORD0;
    float2 nonTiledTexC : TEXCOORD1;

};

VertexOut VShader(
float4 Pos : POSITION,
float2 Tex : TEXCOORD0

)
{
    VertexOut Vert;
    float4x4 Transform;

    Transform = mul(World, View);
    Transform = mul(Transform, Projection);
    Vert.Pos = mul(Pos, Transform);

    Vert.Color = float4(1, 1, 1, 1);
    Vert.tiledTexC = Tex * TIMES_TILE;
    Vert.nonTiledTexC = Tex;

    return Vert;
}

float4 TerrainMultiTexPS(
                         float2 tiledTexC : TEXCOORD0,
                         float2 nonTiledTexC : TEXCOORD1) : COLOR
{
      // Layer maps are tiled
      float3 c1 = tex2D(Tex1S, tiledTexC).rgb;
      float3 c2 = tex2D(Tex2S, tiledTexC).rgb;
      float3 c3 = tex2D(Tex3S, tiledTexC).rgb;

      // Blend map is not tiled.
      float4 B = tex2D(BlendMapS, nonTiledTexC).rgba;

      // Find the inverse of all the blend weights so that we can
      // scale the total color to the range [0, 1].
      float totalInverse = 1.0f / (B.r + B.g + B.b);

      // Scale the colors by each layer by its corresponding weight
      // stored in the blend map.
      c1 *= B.r * totalInverse;
      c2 *= B.g * totalInverse;
      c3 *= B.b * totalInverse;

      // Sum the weighted layer colors.
      float3 final = c1 + c2 + c3;

      return float4(final, 1.0f);
}



technique FirstTechnique
{
    pass FirstPass
    {
        Lighting = FALSE;
        ZEnable = TRUE;

        VertexShader = compile vs_2_0 VShader();
        PixelShader = compile ps_2_0 TerrainMultiTexPS();
    }
}

I'm not sure what is making it run slow, but just as a note for when you get it working: it's possible to use a single 32-bit texture to control five different textures.

Use the R, G, B and A channels as usual, then subtract their total from 1:


float4 B = tex2D(BlendMapS, nonTiledTexC).rgba;
float remainder = 1 - (B.r + B.g + B.b + B.a);



That way you can use r, g, b, a and the remainder as weights for five textures. Anywhere the blend texture is black in both the colour and alpha channels will then show the fifth (remainder) texture.
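Putting that together with the shader above, a five-way splat could look something like this. This is only a sketch: it reuses the `Tex0S`-`Tex4S` samplers already declared (but partly unused) in the original shader, and assumes the fifth layer is the one bound to `Tex4S`:

```hlsl
float4 TerrainFiveTexPS(float2 tiledTexC    : TEXCOORD0,
                        float2 nonTiledTexC : TEXCOORD1) : COLOR
{
    // Five tiled layer textures.
    float3 c0 = tex2D(Tex0S, tiledTexC).rgb;
    float3 c1 = tex2D(Tex1S, tiledTexC).rgb;
    float3 c2 = tex2D(Tex2S, tiledTexC).rgb;
    float3 c3 = tex2D(Tex3S, tiledTexC).rgb;
    float3 c4 = tex2D(Tex4S, tiledTexC).rgb;

    // Non-tiled blend map; whatever weight is "left over" goes to layer 5.
    float4 B = tex2D(BlendMapS, nonTiledTexC);
    float remainder = saturate(1.0f - (B.r + B.g + B.b + B.a));

    float3 final = c0 * B.r + c1 * B.g + c2 * B.b + c3 * B.a + c4 * remainder;
    return float4(final, 1.0f);
}
```

The `saturate` guards against blend maps whose channel sum slightly exceeds 1 due to filtering or compression.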

That method was actually something I wanted to implement next, after I got this working properly :P But there's no point until the massive performance drop is dealt with somehow. It does seem to be XNA-specific.

Some thoughts:

- Have you tried using PerfHud to profile this?

- Doing a PIX capture might also help work out why it's so slow.

- Are the D3D debug runtimes off when testing the performance?

- DXT1 compressing the three textures you're combining should produce some speedup (and save memory), as long as the GPU is the bottleneck.

Obviously there could be a variety of reasons, but it might be worth checking your CPU code to make sure you're not setting any variables that don't exist in the HLSL code. This happened to me once while prototyping: I removed some global variables from my shader but left in the code that set them in my app. Switching on the DX debug runtime will display any errors occurring in the driver (in your debug output).

Thanks for the suggestions! I've been trying to get rid of the problem for the past few days with no success.

Adam_42, I've used PerfHUD and found that the Present call right at the end of the frame takes very long - about 90-100 ms! I then manually added a GraphicsDevice.Present() call to the end of my main Draw function. This dropped its execution time in PerfHUD to about 0.1 ms and doubled the framerate. It's still really low - around 14 FPS for a simple multitextured 32x32 piece of terrain.

The performance is also really strange - there are spikes in the frame rendering time. When I move around the terrain, rather than having a consistently low framerate, I get slides and stutters.

I'm no longer sure that the pixel shader is causing the problems - after all, it worked perfectly in a C++ application. The vertex and index buffers of the terrain might have something to do with it. They're write-only (and were in C++ as well) - maybe that causes problems?

m_VertexBuffer = new VertexBuffer(((Game1)Game).device, m_Vertices.Count * VertexPositionNormalTexture.SizeInBytes, BufferUsage.WriteOnly);

m_IndexBuffer = new IndexBuffer(((Game1)Game).device, typeof(int), indices.Length, BufferUsage.WriteOnly);



Debug runtimes were off when testing performance... I'll definitely try a PIX capture soon.


RobMaddison - I switched to full debug mode and double-checked the code, but found nothing of the sort. Good advice though, thanks.

For reference, this is the Draw function of the vertex handler for the terrain...


public override void Draw(GameTime gameTime)
{
    if (m_Polygons == 0)
        return;

    Game.GraphicsDevice.VertexDeclaration = new VertexDeclaration(((Game1)Game).graphics.GraphicsDevice, VertexPositionNormalTexture.VertexElements);
    ((Game1)Game).device.VertexDeclaration = new VertexDeclaration(((Game1)Game).graphics.GraphicsDevice, VertexPositionNormalTexture.VertexElements);

    if (m_UseBasicEffect == true)
    {
        m_BasicEffect.TextureEnabled = true;
        m_BasicEffect.Texture = m_Textures[0];

        m_BasicEffect.World = m_WorldMat;
        m_BasicEffect.View = ((Game1)Game).m_ActiveCamera.GetViewMat();
        m_BasicEffect.Projection = ((Game1)Game).m_ActiveCamera.GetPerspective();

        m_BasicEffect.Begin();

        foreach (EffectPass pass in m_BasicEffect.CurrentTechnique.Passes)
        {
            pass.Begin();

            ((Game1)Game).device.Indices = m_IndexBuffer;
            ((Game1)Game).device.Vertices[0].SetSource(m_VertexBuffer, 0, VertexPositionNormalTexture.SizeInBytes);
            ((Game1)Game).device.DrawIndexedPrimitives(PrimitiveType.TriangleList, 0, 0, m_Vertices.Count, 0, m_Polygons);

            pass.End();
        }

        m_BasicEffect.End();
    }
    else
    {
        m_Effect.CurrentTechnique = m_Effect.Techniques["FirstTechnique"];
        //m_Effect.Parameters["gTex0"].SetValue(m_Textures[0]);
        m_Effect.Parameters["gTex1"].SetValue(m_Textures[1]);
        m_Effect.Parameters["gTex2"].SetValue(m_Textures[2]);
        m_Effect.Parameters["gTex3"].SetValue(m_Textures[3]);
        //m_Effect.Parameters["gTex4"].SetValue(m_Textures[4]);
        m_Effect.Parameters["gBlendMap"].SetValue(m_Textures[5]);

        m_Effect.Parameters["World"].SetValue(m_WorldMat);
        m_Effect.Parameters["View"].SetValue(((Game1)Game).m_ActiveCamera.GetViewMat());
        m_Effect.Parameters["Projection"].SetValue(((Game1)Game).m_ActiveCamera.GetPerspective());

        m_Effect.Begin();

        foreach (EffectPass pass in m_Effect.CurrentTechnique.Passes)
        {
            pass.Begin();

            ((Game1)Game).device.Indices = m_IndexBuffer;
            ((Game1)Game).device.Vertices[0].SetSource(m_VertexBuffer, 0, VertexPositionNormalTexture.SizeInBytes);
            ((Game1)Game).device.DrawIndexedPrimitives(PrimitiveType.TriangleList, 0, 0, m_Vertices.Count, 0, m_Polygons);

            pass.End();
        }

        m_Effect.End();
    }

    base.Draw(gameTime);
}







This is really driving me crazy - it's such a simple piece of code! Any input appreciated :)


EDIT: It's definitely the GPU and the pixel shader. It's taking massive amounts of time according to the PerfHUD graphs.

Sorry about the double post, but I've found something very very strange in the pixel shader.

If I render with just two textures blended rather than three, it works perfectly.

That is, if I do this in the pixel shader:


return float4(c0 + c1, 1.0f);


rather than


return float4(c0 + c1 + c2, 1.0f);


it works smoothly. It doesn't matter which combination of textures I use. I'm starting to suspect it's related to my own drivers, since some of my friends are running it with no performance issues...

Do you need anisotropic filtering on your terrain? Especially when combined with splatting, it results in a pretty heavy texture load in the shader. I also second the suggestion to compress the textures, which can give performance back by reducing the bits per pixel read from memory.

Secondly, large meshes like terrain can be rendered a lot more efficiently when broken up into blocks of about 1000 polygons each. Draw calls of that size don't hold up the CPU for as long while the GPU processes the command. You can share a small index buffer between blocks, saving memory and staying within the 16-bits-per-index limit even for really large terrain. A more advanced terrain system could also perform visibility culling on each block.
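As a rough sketch of that idea (the `TerrainBlock` type and its members are hypothetical), each block issues its own small draw call while all blocks share one 16-bit index buffer:

```csharp
// One index buffer describing a single grid patch, shared by every block.
device.Indices = m_SharedBlockIndexBuffer;   // 16-bit indices

foreach (TerrainBlock block in m_Blocks)
{
    // A more advanced system would skip blocks outside the view frustum here.
    device.Vertices[0].SetSource(block.VertexBuffer, 0,
        VertexPositionNormalTexture.SizeInBytes);
    device.DrawIndexedPrimitives(PrimitiveType.TriangleList,
        0, 0, block.VertexCount, 0, block.PrimitiveCount);
}
```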

Don't create a new vertex declaration every frame. They are a graphics resource like textures and meshes: create them once and dispose of them when no longer used. Right now you are relying on the garbage collector to eventually clean up thousands of vertex declaration objects, which hurts both performance and memory.
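For example (a sketch against the code posted above; the `m_VertexDecl` field is hypothetical), create the declaration once at load time and reuse it in Draw:

```csharp
// Once, e.g. in LoadContent/Initialize:
m_VertexDecl = new VertexDeclaration(((Game1)Game).device,
    VertexPositionNormalTexture.VertexElements);

// Every frame in Draw - just assign the cached object:
((Game1)Game).device.VertexDeclaration = m_VertexDecl;

// When the component is torn down:
m_VertexDecl.Dispose();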

Finally, you should cache the effect parameters locally when loading the shader, rather than doing costly string lookups every single frame. That's fine for prototyping, but working with strings can get you into trouble, especially if you plan to run on the Xbox 360.
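In practice that means fetching the EffectParameter objects once after loading the effect (the `m_*Param` field names here are hypothetical):

```csharp
// Once, after loading the effect:
m_WorldParam = m_Effect.Parameters["World"];
m_ViewParam  = m_Effect.Parameters["View"];
m_ProjParam  = m_Effect.Parameters["Projection"];

// Every frame - no string lookups:
m_WorldParam.SetValue(m_WorldMat);
m_ViewParam.SetValue(((Game1)Game).m_ActiveCamera.GetViewMat());
m_ProjParam.SetValue(((Game1)Game).m_ActiveCamera.GetPerspective());
```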

