My terrain's multi-texture pixel shader is killing me! (FPS)

17 comments, last by mikeman 19 years, 3 months ago
You know, you're currently burning 7 texture coordinate sets to do that. You can take advantage of the fact that texcoords are float4. You can sample the first texture with (tex0.xy), the second with (tex0.zw), and so on. And you can store all the blending factors into just one texcoord. Not sure if this will increase performance, but it might. And try to bind to 4 different stages. It may enable better pipelining.
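The packing described above can be sketched on the CPU side as follows. This is only an illustration under assumptions: the struct names, the choice of four UV sets, and the mapping to three texcoords are hypothetical and not taken from the original shader.

```cpp
#include <array>
#include <cassert>

struct Float2 { float x, y; };
struct Float4 { float x, y, z, w; };

// Pack two 2D UV sets into one four-component texcoord:
// the first pair goes into .xy, the second into .zw.
Float4 packUVs(Float2 a, Float2 b) {
    return Float4{a.x, a.y, b.x, b.y};
}

// Hypothetical per-vertex texcoord block: four UV sets plus four blend
// factors fit into three float4 texcoords instead of one set per value.
struct PackedTexcoords {
    Float4 uv01;     // layer 0 UV in .xy, layer 1 UV in .zw
    Float4 uv23;     // layer 2 UV in .xy, layer 3 UV in .zw
    Float4 weights;  // one blend factor per layer in x, y, z, w
};

PackedTexcoords packTerrainTexcoords(const std::array<Float2, 4>& uvs,
                                     Float4 weights) {
    return PackedTexcoords{packUVs(uvs[0], uvs[1]),
                           packUVs(uvs[2], uvs[3]),
                           weights};
}
```

In the pixel shader the layers would then be sampled with `.xy` and `.zw` of each packed texcoord. Whether this actually improves performance depends on the hardware, as noted above.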
Mikeman:

Should I bind the same texture to four different stages? Or just use 4 textures on 4 stages instead of a texture map?
Quote:Original post by superpig
Oh, whoops, I missed that in the original HLSL code.


Hey no probs, I did myself when I originally looked at it - it was only when I copy & pasted it into a file and ran fxc and I was like "why is this generating mads?" that I went back and looked at it properly.

Quote:Original post by superpig
I don't think there's anything here that actually requires ps_2_0 though.


I think I'd agree, apart from possibly running out of texture stages (because you'd need to use the interpolators for those texcoords which tell you whether or not the texture is active on that vertex).

I still can't think why this shader would be particularly slow. You can try binding the texture to different stages; it might help it go through some other silicon. You could also try interleaving the texture/arithmetic instructions to hide latency, but I'd have thought the internal compiler would've done that for you *shrug*.

I can't really think of much else. If you're on NV hardware, you could try NVPerfHUD as superpig suggested.

-Mezz
If you keep the texture sampling but remove the math part of the shader, does performance increase? In that case you're probably tex bandwidth bound rather than actually fill rate bound. If that is the case, make sure mip mapping is on and try texture compression and lowering texture resolution to improve performance.
Couldn't he test that by temporarily switching to 1x1-sized textures too? I think I heard that's how NVPerfHUD measures things...

Richard "Superpig" Fine - saving pigs from untimely fates - Microsoft DirectX MVP 2006/2007/2008/2009
"Shaders are not meant to do everything. Of course you can try to use it for everything, but it's like playing football using cabbage." - MickeyMouse

Sure, that would work. Except if he's bound by the actual lookup instructions and not bandwidth of course.
First of all don't use PS_2_0 for such things. PS_1_1/PS_1_3 is enough.
Second, you can put all the multiplier constants into ONE texcoord.

float4 texColor0 = tex2D( S0, Tex1 );
float4 texColor1 = tex2D( S0, Tex2 );
float4 texColor2 = tex2D( S0, Tex3 );
float4 texColor3 = tex2D( S0, Tex4 );

texColor0 *= Tex5.x;
texColor1 *= Tex5.y;
texColor2 *= Tex5.z;
texColor3 *= Tex5.w;

.. Etc.

Last but not least, you CAN write this pixel shader with the fixed function pipeline using some smart evaluation.
Since the multipliers are monochrome floats, you can store them per vertex in the diffuse/specular color components.

However, IMO, if the PS_1_1/PS_1_3 version is still slow, the bottleneck is simply sampling the textures. If reducing the texture size does not help, and removing the blending multipliers (Tex5) does not help either, your card may just be too slow. Also try updating your drivers.
Basically this sounds like the Far Cry method of texturing terrain. My renderer divides the terrain into chunks for LOD reasons. What I did to optimize it was: check whether a chunk really needs all 4 texture units, and if it doesn't, use a simpler pixel shader that only blends 2 texture units, for example (Far Cry also does that).
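A minimal sketch of that per-chunk check, assuming each vertex stores four blend weights and that a layer counts as unused when its weight never exceeds a small threshold. The `Weights` layout, the function name, and the threshold are hypothetical, not taken from any engine mentioned here.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Per-vertex blend weights for up to four detail layers (assumed layout).
using Weights = std::array<float, 4>;

// Returns how many layers have a weight above epsilon on at least one
// vertex of the chunk. The result can be used to pick a cheaper shader
// variant that only samples the layers the chunk actually uses.
int countActiveLayers(const std::vector<Weights>& chunkVertices,
                      float epsilon = 1.0f / 255.0f) {
    std::array<bool, 4> used{};  // value-initialized: all false
    for (const Weights& w : chunkVertices) {
        for (std::size_t i = 0; i < w.size(); ++i) {
            if (w[i] > epsilon) {
                used[i] = true;
            }
        }
    }
    int count = 0;
    for (bool u : used) {
        count += u ? 1 : 0;
    }
    return count;
}
```

The count would typically be computed once when the chunk is built, not every frame. A real implementation would also need to remember which layers are used, so the right textures get bound to the reduced shader.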
Another thing that is a potential performance burner is the "intuitive" rendering order that would be like:

for each chunk
{
    Render base texture
    Enable Shader
    Render multitextured detail
    Disable Shader
}

This kind of shader change is extremely expensive. Try to render all base textures first, then (if you use the method I described above) render all chunks that use 4 units, then 3, 2, 1... you get the idea.
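The reordering can be sketched like this, assuming every chunk already knows how many texture units it needs and that this number selects the shader variant. The `Chunk` struct and both function names are made up for illustration.

```cpp
#include <algorithm>
#include <vector>

struct Chunk {
    int id;          // hypothetical chunk identifier
    int layerCount;  // number of texture units needed; selects the shader
};

// Counts how often the shader has to change when drawing in this order.
int countShaderSwitches(const std::vector<Chunk>& order) {
    int switches = 0;
    int current = -1;  // -1 means no shader bound yet
    for (const Chunk& c : order) {
        if (c.layerCount != current) {
            ++switches;
            current = c.layerCount;
        }
    }
    return switches;
}

// Groups chunks so that all 4-unit chunks come first, then 3, 2, 1.
// stable_sort keeps the existing order inside each group.
std::vector<Chunk> sortByShader(std::vector<Chunk> chunks) {
    std::stable_sort(chunks.begin(), chunks.end(),
                     [](const Chunk& a, const Chunk& b) {
                         return a.layerCount > b.layerCount;
                     });
    return chunks;
}
```

For five chunks needing 4, 2, 4, 2 and 3 units, drawing in the given order changes the shader five times, while the sorted order needs only three changes. The base texture pass would still be drawn for all chunks before any of this.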

Hope it helps

Edit: Oh, by the way, what I forgot: you should also restrict the detail texturing distance in some way. It's usually not worth detail texturing everything, because it will eat lots of fill rate without much of a visual change.

Another thing that may help you is to keep the size of your textures constant while enabling DXT compression. Alternatively, you could play around with the filtering mode (although this one definitely worsens your visual quality, while the quality loss through compression is pretty negligible for detail textures).
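One way to restrict the distance is a fade factor based on squared distance to the eye, similar in spirit to the fade in the GLSL vertex program further down. The sketch below is a CPU-side version under assumptions: `Vec3`, the function names, and the `fadeDistanceSquared` parameter are hypothetical.

```cpp
#include <algorithm>

struct Vec3 { float x, y, z; };

// Squared distance between two points (avoids a square root).
float distanceSquared(Vec3 a, Vec3 b) {
    const float dx = a.x - b.x;
    const float dy = a.y - b.y;
    const float dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Detail intensity: 1 at the eye, falling to 0 once the squared distance
// reaches fadeDistanceSquared (which must be greater than zero).
float detailIntensity(Vec3 eye, Vec3 point, float fadeDistanceSquared) {
    const float t = distanceSquared(eye, point) / fadeDistanceSquared;
    return 1.0f - std::min(std::max(t, 0.0f), 1.0f);
}

// A point only needs the detail pass while its intensity is above zero.
bool needsDetailPass(Vec3 eye, Vec3 point, float fadeDistanceSquared) {
    return detailIntensity(eye, point, fadeDistanceSquared) > 0.0f;
}
```

For a whole chunk, the test would have to use the chunk's nearest point to the eye, so that a chunk is only skipped when every vertex in it has faded out.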

This is my code if you're curious... it's GLSL (it uses the vertex color's RGBA as intensities for the different textures).

[VP]
varying vec2 texCoord0;
uniform vec3 eyePos;
uniform float lodBias;

void main()
{
    texCoord0 = gl_MultiTexCoord0.xy * 128.0;

    vec3 diff = eyePos - gl_Vertex.xyz;
    float inten1 = 1.0 - clamp(dot(diff, diff) / (15000.0 * lodBias), 0.0, 1.0);
    // this serves to create a fade out effect at the lod edges
    gl_FrontColor.rgba = gl_Color.rgba * inten1;
    gl_Position = ftransform();
}

[FP]
uniform sampler2D detMap0;
uniform sampler2D detMap1;
uniform sampler2D detMap2;
uniform sampler2D detMap3;
varying vec2 texCoord0;

void main()
{
    vec4 tempcolor = vec4(0.5) - 0.5 * gl_Color;

    vec3 detColor0 = texture2D(detMap0, texCoord0).xyz;
    detColor0 = detColor0 * gl_Color.r + tempcolor.r;

    vec3 detColor1 = texture2D(detMap1, texCoord0).xyz;
    detColor1 = detColor1 * gl_Color.g + tempcolor.g;

    vec3 detColor2 = texture2D(detMap2, texCoord0).xyz;
    detColor2 = detColor2 * gl_Color.b + tempcolor.b;

    vec3 detColor3 = texture2D(detMap3, texCoord0).xyz;
    detColor3 = detColor3 * gl_Color.a + tempcolor.a;

    gl_FragColor = vec4(detColor0 * detColor1 * detColor2 * detColor3 * 8.0, 1.0);
}


[Edited by - Dtag on January 10, 2005 9:23:03 AM]
Something that hasn't been mentioned yet, are you taking any measures to reduce overdraw? Since the shader is expensive anyway, it is essential. Maybe you should work on this area too. I mean, it's not only how long the shader is, it's also how many times it gets executed. I suppose you have at least backface culling on, right?

This topic is closed to new replies.
