High performance texture splatting?

6 comments, last by sth 11 years, 10 months ago
Is there a way to improve the performance of texture reads in OpenGL (ES)?
I'd like to do texture splatting but I'm running into severe performance problems when using multiple texture reads in my fragment shader. With a single texture, I get a constant 60fps (vsync limited, with room to spare). Adding a second texture read, my performance drops down to 40fps and with the shader provided below, performance is down to 15fps.

Any ideas?

[source lang="plain"]// Simplified fragment shader for demonstration purposes

varying lowp vec2 v_texcoord;
varying lowp vec4 v_color;

uniform sampler2D texture0;
uniform sampler2D texture1;
uniform sampler2D texture2;
uniform sampler2D texture3;
uniform sampler2D texture4;


void main()
{
    lowp vec4 alpha = texture2D(texture4, v_texcoord);

    lowp vec4 color = texture2D(texture0, v_texcoord);
    color = mix(color, texture2D(texture1, v_texcoord), alpha[0]);
    color = mix(color, texture2D(texture2, v_texcoord), alpha[1]);
    color = mix(color, texture2D(texture3, v_texcoord), alpha[2]);

    gl_FragColor = v_color * color;
}[/source]
If you throw out the vsync sample, your frame times are 40Hz/25ms for 2 samples, and 15Hz/66.6..ms for 5 samples. Divide the frame times by the number of samples and you get 12.5ms and 13.3..ms, which are pretty similar.
From that, it looks like you're just using more texture bandwidth than your GPU can cope with... draw fewer pixels using these shaders. Maybe use distance-based material LOD?
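Spelling out the arithmetic above (rounded, and assuming the frame time scales roughly linearly with the number of texture samples):

```latex
\begin{aligned}
t_{2} &= \frac{1000\ \text{ms}}{40} = 25\ \text{ms}, & \frac{t_{2}}{2} &= 12.5\ \text{ms per sample},\\
t_{5} &= \frac{1000\ \text{ms}}{15} \approx 66.7\ \text{ms}, & \frac{t_{5}}{5} &\approx 13.3\ \text{ms per sample}.
\end{aligned}
```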

You can also reduce data in many ways, such as packing 4 different monochrome textures into the channels of a single RGBA texture and tinting them with 4 uniform colours.
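A rough sketch of what the packed variant might look like, assuming the four detail layers can be stored as greyscale in the R, G, B and A channels of one texture. The names packedDetail, splatWeights and tint0..tint3 are made up for illustration, and the weights are assumed to sum to 1. This is untested on real hardware:

```glsl
// Sketch only: packedDetail, splatWeights and tint0..tint3 are hypothetical names.
varying lowp vec2 v_texcoord;
varying lowp vec4 v_color;

uniform sampler2D packedDetail; // R, G, B, A each hold one monochrome layer
uniform sampler2D splatWeights; // per-layer blend weights, assumed to sum to 1.0

uniform lowp vec3 tint0; // uniform colour applied to the layer stored in R
uniform lowp vec3 tint1; // ... in G
uniform lowp vec3 tint2; // ... in B
uniform lowp vec3 tint3; // ... in A

void main()
{
    // Two texture reads in total instead of five.
    lowp vec4 detail = texture2D(packedDetail, v_texcoord);
    lowp vec4 weight = texture2D(splatWeights, v_texcoord);

    lowp vec3 rgb = weight.r * detail.r * tint0
                  + weight.g * detail.g * tint1
                  + weight.b * detail.b * tint2
                  + weight.a * detail.a * tint3;

    gl_FragColor = v_color * vec4(rgb, 1.0);
}
```

The trade-off is that each layer loses its own colour variation and only keeps brightness detail, so whether this looks acceptable depends on the art.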
Perhaps try avoiding the mix and optimize the code manually:


void main()
{
    lowp vec4 alpha = texture2D(texture4, v_texcoord);

    lowp vec4 color0 = texture2D(texture0, v_texcoord);
    lowp vec4 color1 = texture2D(texture1, v_texcoord);
    lowp vec4 color2 = texture2D(texture2, v_texcoord);
    lowp vec4 color3 = texture2D(texture3, v_texcoord);

    gl_FragColor = v_color * (alpha[0] * color0 + alpha[1] * color1 + alpha[2] * color2 + alpha[3] * color3);
}


(I reindexed how the components of the alpha vector map to the colour textures, to keep things straightforward.) The idea is that alpha[0] is already precomputed in the texture as 1.0 - alpha[1] - alpha[2] - alpha[3], so the shader doesn't need to compute it. I feel this would be faster than using mix(), but I can't be sure without profiling. Let me know how it compares.

Another possible optimization is to drop one or two of the splat texture channels and split your mesh up according to which splat textures each triangle actually uses. Also, if the splat texture is low-frequency, try storing the splat weights as vertex attributes and passing them through to the pixel shader, which saves you one texture read.
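A minimal sketch of the vertex-attribute idea, assuming the vertex shader copies a per-vertex weight attribute into a varying (something like v_weights = a_weights;). Both names are hypothetical, and the weights are assumed to be normalized so that they sum to 1:

```glsl
// Fragment shader sketch: v_weights is a hypothetical varying filled by the
// vertex shader from a per-vertex attribute, replacing the alpha texture read.
varying lowp vec2 v_texcoord;
varying lowp vec4 v_color;
varying lowp vec4 v_weights; // interpolated splat weights, assumed to sum to 1.0

uniform sampler2D texture0;
uniform sampler2D texture1;
uniform sampler2D texture2;
uniform sampler2D texture3;

void main()
{
    // Four texture reads instead of five: the weights come from the rasterizer.
    lowp vec4 color = v_weights[0] * texture2D(texture0, v_texcoord)
                    + v_weights[1] * texture2D(texture1, v_texcoord)
                    + v_weights[2] * texture2D(texture2, v_texcoord)
                    + v_weights[3] * texture2D(texture3, v_texcoord);

    gl_FragColor = v_color * color;
}
```

The weights can then only change as fast as the mesh tessellation allows, which is why this only makes sense for a low-frequency splat map.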

Finally, if the splat texture is very low frequency, you can try just decaling the contents, i.e. manually generate geometry planes that you alphablend on top of the terrain.
Also, why not try just 3 splat channels instead of 4? 3 should be sufficient, especially if it is for a small-screen embedded device.

NBA2K, Madden, Maneater, Killing Floor, Sims http://www.pawlowskipinball.com/pinballeternal


If you throw out the vsync sample, your frame times are 40Hz/25ms for 2 samples, and 15Hz/66.6..ms for 5 samples. Divide the frame times by the number of samples and you get 12.5ms and 13.3..ms, which are pretty similar.
From that, it looks like you're just using more texture bandwidth than your GPU can cope with...

I think you're right, it looks pretty much bandwidth-limited right now.
Today I realized that I still had trilinear filtering enabled. Disabling it cut the frame times in half. I can't recall trilinear filtering ever having such a profound effect in any of my projects, but I guess that's just what happens when you're running at the limit.

Anyway, at least I'm now back to playable framerates.


Perhaps try avoiding the mix and optimize the code manually:

Thanks for the suggestion. I tried it but it didn't have any effect on the performance.


Also, why not try just 3 splat channels instead of 4? 3 should be sufficient, especially if it is for a small-screen embedded device.

I'm considering it, but I'm still not sure if I really want it.
The thing is that these days it's not just phone screens, it's also tablet screens and (wireless) TV output.
The thing is that these days it's not just phone screens, it's also tablet screens and (wireless) TV output.
Well, the thing really is that current GLES hardware doesn't have enough power for that. If you want 20 different types of splats, you can still do that: just divide your terrain into sections and allow each section only 3 splats (RGB). Each section can use any 3 of the 20 splats.


I think you're right, it looks pretty much bandwidth-limited right now. Today I realized that I still had trilinear filtering enabled. Disabling it cut the frame times in half. I can't recall trilinear filtering ever having such a profound effect in any of my projects, but I guess that's just what happens when you're running at the limit.
Yeah, trilinear filtering samples two mip levels per lookup, so it roughly doubles the amount of data that each pixel has to pull into the texture cache. If you're already texture-bound, it's going to make things a lot worse.

Well, the thing really is that current GLES hardware doesn't have enough power for that. If you want 20 different types of splats, you can still do that: just divide your terrain into sections and allow each section only 3 splats (RGB). Each section can use any 3 of the 20 splats.
Yep, I've worked on a few games where each triangle could only have 2 layers on it, but the entire terrain could have many, many layers. We'd use a tool to split the terrain up into sections, where each section only had 2 texture layers. As long as you never needed 3 or more materials present at a single vertex, it worked ok.
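For illustration, a two-layer section shader along those lines could be as small as the sketch below. The names layerA, layerB and v_blend are hypothetical, and it assumes the application binds a different pair of textures for each terrain section and supplies the blend factor per vertex:

```glsl
// Sketch only: layerA, layerB and v_blend are hypothetical names.
// The application binds a different pair of textures for each terrain section.
varying lowp vec2 v_texcoord;
varying lowp vec4 v_color;
varying lowp float v_blend; // 0.0 = only layerA, 1.0 = only layerB

uniform sampler2D layerA;
uniform sampler2D layerB;

void main()
{
    // Two texture reads per pixel, no matter how many layers the whole terrain uses.
    lowp vec4 color = mix(texture2D(layerA, v_texcoord),
                          texture2D(layerB, v_texcoord),
                          v_blend);

    gl_FragColor = v_color * color;
}
```

The cost moves to the tool side: the terrain has to be split so that no section needs more layers than the shader supports.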

The thing is that these days it's not just phone screens, it's also tablet screens and (wireless) TV output.

Well, the thing really is that current GLES hardware doesn't have enough power for that. If you want 20 different types of splats, you can still do that: just divide your terrain into sections and allow each section only 3 splats (RGB). Each section can use any 3 of the 20 splats.
I'll have to see. My current implementation already allows every chunk of the terrain to have its own set of textures, although it would be nicer not to be bound to the fixed chunk structure.

This topic is closed to new replies.
