HLSL Texture Splatting Question


So I have this pixel shader for texture splatting a "patch" of terrain using up to 4 blend maps and 12 "splat" textures (layer textures). I wrote a "super" shader that handles the maximum case: 4 blend map samples and 12 splat samples = 16 texture samples per pixel (and close to 64 arithmetic operations per pixel too). This brings me to the edge of ps_2_0 (and even ps_3_0 exposes only 16 samplers).

My question is whether a "super-shader" that uses global 0.0f/1.0f floats to cancel out the blends/splats that aren't used is faster than unrolling this into 12 different pixel shaders. The art resources I'm using specify the number of splat textures and blend maps per patch, and not every patch uses the maximum (4/12); some use considerably fewer.

Here is the "super" splat pixel shader:
float4 DAOCSplatTerrainPS( float2 tiledTexC     : TEXCOORD0,
                           float2 nonTiledTexC  : TEXCOORD1,
                           float  shade         : TEXCOORD2,
                           float  fogLerpParam  : TEXCOORD3) : COLOR
{
    // Layer maps are tiled. g_hasSplat[i] is 1.0f if layer i is used, else 0.0f.
    float3 c0  = tex2D(SplatTex0S,  tiledTexC * gTexScale[ 0 ]).rgb;
    float3 c1  = tex2D(SplatTex1S,  tiledTexC * gTexScale[ 1 ]).rgb * g_hasSplat[ 1 ];
    float3 c2  = tex2D(SplatTex2S,  tiledTexC * gTexScale[ 2 ]).rgb * g_hasSplat[ 2 ];

    float3 c3  = tex2D(SplatTex3S,  tiledTexC * gTexScale[ 3 ]).rgb * g_hasSplat[ 3 ];
    float3 c4  = tex2D(SplatTex4S,  tiledTexC * gTexScale[ 4 ]).rgb * g_hasSplat[ 4 ];
    float3 c5  = tex2D(SplatTex5S,  tiledTexC * gTexScale[ 5 ]).rgb * g_hasSplat[ 5 ];

    float3 c6  = tex2D(SplatTex6S,  tiledTexC * gTexScale[ 6 ]).rgb * g_hasSplat[ 6 ];
    float3 c7  = tex2D(SplatTex7S,  tiledTexC * gTexScale[ 7 ]).rgb * g_hasSplat[ 7 ];
    float3 c8  = tex2D(SplatTex8S,  tiledTexC * gTexScale[ 8 ]).rgb * g_hasSplat[ 8 ];

    float3 c9  = tex2D(SplatTex9S,  tiledTexC * gTexScale[ 9 ]).rgb * g_hasSplat[ 9 ];
    float3 c10 = tex2D(SplatTex10S, tiledTexC * gTexScale[ 10 ]).rgb * g_hasSplat[ 10 ];
    float3 c11 = tex2D(SplatTex11S, tiledTexC * gTexScale[ 11 ]).rgb * g_hasSplat[ 11 ];

    // Blend maps are not tiled. g_hasBlend[i] is 1.0f if blend map i is used, else 0.0f.
    float3 B0 = tex2D(BlendMap0S, nonTiledTexC).rgb;
    float3 B1 = (tex2D(BlendMap1S, nonTiledTexC).rgb) * g_hasBlend[ 1 ];
    float3 B2 = (tex2D(BlendMap2S, nonTiledTexC).rgb) * g_hasBlend[ 2 ];
    float3 B3 = (tex2D(BlendMap3S, nonTiledTexC).rgb) * g_hasBlend[ 3 ];

    // INVSRC ALPHA blend + fog + light
    float3 color = (c0 * shade);
    color = (B0.g * c1) + (1 - B0.g) * color;
    color = (B0.b * c2) + (1 - B0.b) * color;

    color = (B1.r * c3) + (1 - B1.r) * color;
    color = (B1.g * c4) + (1 - B1.g) * color;
    color = (B1.b * c5) + (1 - B1.b) * color;

    color = (B2.r * c6) + (1 - B2.r) * color;
    color = (B2.g * c7) + (1 - B2.g) * color;
    color = (B2.b * c8) + (1 - B2.b) * color;

    color = (B3.r * c9) + (1 - B3.r) * color;
    color = (B3.g * c10) + (1 - B3.g) * color;
    color = (B3.b * c11) + (1 - B3.b) * color;

    return (lerp(float4(color, 1.0f), gFogColor, fogLerpParam));
}
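
Written out, every line of the blend chain above applies the same step. The symbols here are introduced only for illustration: w_k is the blend-map channel used for layer k, c_k is that layer's sampled colour, and color_k is the running result.

```latex
\mathrm{color}_0 = c_0 \cdot \mathrm{shade}, \qquad
\mathrm{color}_k = w_k \, c_k + (1 - w_k) \, \mathrm{color}_{k-1}, \quad k = 1, \dots, 11
```

When w_k = 0 the step gives color_k = color_{k-1}, so the layer has no effect. When only c_k is zeroed (which is what the g_hasSplat multiply does) and w_k is still non-zero, the step becomes (1 - w_k) * color_{k-1}, which darkens the result. So the g_hasSplat flags presumably rely on the unused blend-map channels already being zero, whereas g_hasBlend zeroes w_k directly.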

And here is an example of the splatting in action: [image, hosted on Photobucket]

So what do you guys think: is it better to use the super shader, or to cut-n-paste out 12 shaders that handle the different combos and put them in something like this?
PixelShader psSplatArray20[ MAX_SPLATS ] = { compile ps_2_0 DAOCSplat1TerrainPS(),
                                             compile ps_2_0 DAOCSplat2TerrainPS(),
                                             compile ps_2_0 DAOCSplat3TerrainPS(),
                                             compile ps_2_0 DAOCSplat4TerrainPS(),
                                             compile ps_2_0 DAOCSplat5TerrainPS(),
                                             compile ps_2_0 DAOCSplat6TerrainPS(),
                                             compile ps_2_0 DAOCSplat7TerrainPS(),
                                             compile ps_2_0 DAOCSplat8TerrainPS(),
                                             compile ps_2_0 DAOCSplat9TerrainPS(),
                                             compile ps_2_0 DAOCSplat10TerrainPS(),
                                             compile ps_2_0 DAOCSplat11TerrainPS(),
                                             compile ps_2_0 DAOCSplat12TerrainPS() };
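
One possible way to avoid the cut-n-paste is to give a single function a `uniform` parameter and pass a literal at each `compile` site, so the compiler can resolve the tests at compile time and leave out the unused samples. The sketch below is not the poster's code: it covers only the first blend map (up to 3 layers), and `SplatNPS`, `gSplatVariant`, `SplatTech` and the `float` type of `gTexScale` are assumptions made for illustration. The remaining layers would follow the same pattern.

```hlsl
// Sketch only: first blend map, up to 3 splat layers.
// SplatNPS, gSplatVariant and SplatTech are hypothetical names.
sampler SplatTex0S;
sampler SplatTex1S;
sampler SplatTex2S;
sampler BlendMap0S;

float  gTexScale[3];   // per-layer tiling factors (type assumed)
float4 gFogColor;

float4 SplatNPS(float2 tiledTexC    : TEXCOORD0,
                float2 nonTiledTexC : TEXCOORD1,
                float  shade        : TEXCOORD2,
                float  fogLerpParam : TEXCOORD3,
                uniform int numSplats) : COLOR
{
    float3 color = tex2D(SplatTex0S, tiledTexC * gTexScale[0]).rgb * shade;

    // numSplats is a literal at each compile site below, so these tests
    // can be folded at compile time instead of becoming run-time branches.
    if (numSplats > 1)
    {
        float3 B0 = tex2D(BlendMap0S, nonTiledTexC).rgb;
        float3 c1 = tex2D(SplatTex1S, tiledTexC * gTexScale[1]).rgb;
        color = lerp(color, c1, B0.g);   // same as B0.g * c1 + (1 - B0.g) * color

        if (numSplats > 2)
        {
            float3 c2 = tex2D(SplatTex2S, tiledTexC * gTexScale[2]).rgb;
            color = lerp(color, c2, B0.b);
        }
    }
    return lerp(float4(color, 1.0f), gFogColor, fogLerpParam);
}

PixelShader psSplatArray20[3] = { compile ps_2_0 SplatNPS(1),
                                  compile ps_2_0 SplatNPS(2),
                                  compile ps_2_0 SplatNPS(3) };

int gSplatVariant = 0;   // zero-based index, set per patch by the application

technique SplatTech
{
    pass P0
    {
        // Vertex shader omitted in this sketch.
        PixelShader = (psSplatArray20[gSplatVariant]);
    }
}
```

The application would set `gSplatVariant` before drawing each patch. What a change of index costs at run time depends on the driver and is not something this sketch measures.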

Are you sure those are the best ways to approach this? (Can you even have 16 textures?) Personally, if you definitely want to stick to these methods, I'd say at least switch to PS3.0 for dynamic branching (only sample if alpha > 0.01) and use the super shader.
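
A rough sketch of what that branch could look like for one layer is given here. It is an illustration under assumptions, not a tested replacement: `SplatLayerBranchPS` and `SplatBranchTech` are made-up names, only one splat layer is shown, and fog and shading are left out. The texture-coordinate derivatives are computed before the branch and passed to `tex2Dgrad`, because implicit-gradient `tex2D` is not well defined inside per-pixel flow control.

```hlsl
// Sketch only, one layer shown. SplatLayerBranchPS and SplatBranchTech
// are hypothetical names; sampler names follow the original post.
sampler SplatTex0S;
sampler SplatTex1S;
sampler BlendMap0S;

float gTexScale[2];   // per-layer tiling factors (type assumed)

float4 SplatLayerBranchPS(float2 tiledTexC    : TEXCOORD0,
                          float2 nonTiledTexC : TEXCOORD1) : COLOR
{
    float3 color = tex2D(SplatTex0S, tiledTexC * gTexScale[0]).rgb;
    float3 B0    = tex2D(BlendMap0S, nonTiledTexC).rgb;

    // Take the derivatives outside the branch so mip selection stays valid.
    float2 uv1 = tiledTexC * gTexScale[1];
    float2 dx  = ddx(uv1);
    float2 dy  = ddy(uv1);

    // Skip the layer sample where its blend weight is negligible.
    [branch]
    if (B0.g > 0.01f)
    {
        float3 c1 = tex2Dgrad(SplatTex1S, uv1, dx, dy).rgb;
        color = lerp(color, c1, B0.g);
    }
    return float4(color, 1.0f);
}

technique SplatBranchTech
{
    pass P0
    {
        // Direct3D 9 expects a ps_3_0 pixel shader to be paired with a
        // vs_3_0 vertex shader at draw time; it is omitted in this sketch.
        PixelShader = compile ps_3_0 SplatLayerBranchPS();
    }
}
```

Whether the branch pays off depends on how coherent the blend weights are across neighbouring pixels, so it would need profiling on the target hardware.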

I'd say the best is to separate your terrain into small chunks. For each chunk, you can only have 3-4 splat textures. It's highly unlikely you'd have more than 4 different terrain types in one chunk.

Thanks for the reply. I didn't make the terrain or the blend maps and they *DO* in fact have some situations where they layer up to 12 different "ground types" on a patch (with 4 RGB blend maps). Plus I'd like to keep the whole thing in PS_2_0 and any kind of branching in a pixel shader is SLOOOOW (try it yourself - it kills parallelism).

Main question is just this:

Is there any hidden penalty in creating an array of pixel shaders and selecting one for a batch of triangles via a uniform extern? I know the worst case (12 textures, 4 blend maps) is seldom used (though it is used) and it would be better to use the number of blends and splats that the patch demands, but am I going to kill performance by selecting it from a pixel shader array?

My current understanding is that the hardware can do the 16 texture samples in parallel (and does so anyway whether you ask to sample or not).

If you want to stay with ps_2_0 then using a few different shaders is the only way. Note that you do not need a different shader for every possible combination. An unused texture lookup here and there can be OK if it helps to reduce the number of different shaders used. And of course group by shader when drawing.

The problem with ps_2_0 is that you can't have conditional texture lookups. So the 16 lookups will happen for every pixel in your frame, and many will just drop the lookup results. But the performance cost of the 16 lookups is always there. So either you have to use a higher shader model (where lookups can be conditional) or you need to split into different shaders.

And of course making 16 texture lookups takes a lot longer than doing just 3.
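
The suggestion above to tolerate a few unused lookups could be sketched as one variant per blend-map count (3, 6, 9 or 12 layers) instead of one per layer count. This is only an illustration: `SplatBucketPS` and `psSplatBuckets20` are hypothetical names, the `float` type of `gTexScale` is assumed, and the `g_hasSplat` flags are dropped on the assumption that unused blend-map channels are stored as zero.

```hlsl
// Sketch only: 4 variants instead of 12, bucketed by blend-map count.
// SplatBucketPS and psSplatBuckets20 are hypothetical names.
sampler SplatTex0S;
sampler SplatTex1S;
sampler SplatTex2S;
sampler SplatTex3S;
sampler SplatTex4S;
sampler SplatTex5S;
sampler SplatTex6S;
sampler SplatTex7S;
sampler SplatTex8S;
sampler SplatTex9S;
sampler SplatTex10S;
sampler SplatTex11S;
sampler BlendMap0S;
sampler BlendMap1S;
sampler BlendMap2S;
sampler BlendMap3S;

float  gTexScale[12];   // per-layer tiling factors (type assumed)
float4 gFogColor;

float4 SplatBucketPS(float2 tiledTexC    : TEXCOORD0,
                     float2 nonTiledTexC : TEXCOORD1,
                     float  shade        : TEXCOORD2,
                     float  fogLerpParam : TEXCOORD3,
                     uniform int numBlendMaps) : COLOR
{
    // Blend map 0: base layer plus two splats, as in the original shader.
    float3 B0    = tex2D(BlendMap0S, nonTiledTexC).rgb;
    float3 color = tex2D(SplatTex0S, tiledTexC * gTexScale[0]).rgb * shade;
    color = lerp(color, tex2D(SplatTex1S, tiledTexC * gTexScale[1]).rgb, B0.g);
    color = lerp(color, tex2D(SplatTex2S, tiledTexC * gTexScale[2]).rgb, B0.b);

    // numBlendMaps is a literal per variant, so each test folds at compile time.
    if (numBlendMaps > 1)
    {
        float3 B1 = tex2D(BlendMap1S, nonTiledTexC).rgb;
        color = lerp(color, tex2D(SplatTex3S, tiledTexC * gTexScale[3]).rgb, B1.r);
        color = lerp(color, tex2D(SplatTex4S, tiledTexC * gTexScale[4]).rgb, B1.g);
        color = lerp(color, tex2D(SplatTex5S, tiledTexC * gTexScale[5]).rgb, B1.b);
    }
    if (numBlendMaps > 2)
    {
        float3 B2 = tex2D(BlendMap2S, nonTiledTexC).rgb;
        color = lerp(color, tex2D(SplatTex6S, tiledTexC * gTexScale[6]).rgb, B2.r);
        color = lerp(color, tex2D(SplatTex7S, tiledTexC * gTexScale[7]).rgb, B2.g);
        color = lerp(color, tex2D(SplatTex8S, tiledTexC * gTexScale[8]).rgb, B2.b);
    }
    if (numBlendMaps > 3)
    {
        float3 B3 = tex2D(BlendMap3S, nonTiledTexC).rgb;
        color = lerp(color, tex2D(SplatTex9S,  tiledTexC * gTexScale[9]).rgb,  B3.r);
        color = lerp(color, tex2D(SplatTex10S, tiledTexC * gTexScale[10]).rgb, B3.g);
        color = lerp(color, tex2D(SplatTex11S, tiledTexC * gTexScale[11]).rgb, B3.b);
    }
    return lerp(float4(color, 1.0f), gFogColor, fogLerpParam);
}

PixelShader psSplatBuckets20[4] = { compile ps_2_0 SplatBucketPS(1),
                                    compile ps_2_0 SplatBucketPS(2),
                                    compile ps_2_0 SplatBucketPS(3),
                                    compile ps_2_0 SplatBucketPS(4) };
```

With this bucketing, a patch that uses 4 layers would run the two-blend-map variant and pay for two lookups whose blend weights are zero. The trade is fewer shader switches when patches are drawn grouped by variant.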

Thanks for the replies.

Still confused about the texture samplers. If I recall correctly, the DirectX fixed-function pipeline would sample from 16 samplers and combine them for each pixel regardless of how many textures you had set, and from reading some docs it seemed to do that in parallel.

Is it not likely that the same thing is happening here? Maybe there is no cost for sampling from 16 textures in a single pass, but selecting different pixel shaders from an array may invoke some hidden overhead?

Any clarification would be appreciated.
