sth

OpenGL High performance texture splatting?


sth    235
Is there a way to improve the performance of texture reads in OpenGL (ES)?
I'd like to do texture splatting, but I'm running into severe performance problems when using multiple texture reads in my fragment shader. With a single texture I get a constant 60fps (vsync-limited, with room to spare). Adding a second texture read drops performance to 40fps, and with the shader below it's down to 15fps.

Any ideas?

[source lang="plain"]// Simplified fragment shader for demonstration purposes.
// texture0..texture3 are the terrain layers; texture4 is the splat map,
// whose r, g and b channels hold the blend weights for layers 1..3.

varying lowp vec2 v_texcoord;
varying lowp vec4 v_color;

uniform sampler2D texture0;
uniform sampler2D texture1;
uniform sampler2D texture2;
uniform sampler2D texture3;
uniform sampler2D texture4;

void main()
{
    // Splat weights
    lowp vec4 alpha = texture2D(texture4, v_texcoord);

    // Start with the base layer and blend each further layer over it in turn
    lowp vec4 color = texture2D(texture0, v_texcoord);
    color = mix(color, texture2D(texture1, v_texcoord), alpha[0]);
    color = mix(color, texture2D(texture2, v_texcoord), alpha[1]);
    color = mix(color, texture2D(texture3, v_texcoord), alpha[2]);

    gl_FragColor = v_color * color;
}[/source]

Hodgman    51334
If you throw out the vsync-limited sample, your frame times are 25ms (40fps) for 2 samples and 66.7ms (15fps) for 5 samples. Divide the frame times by the number of samples and you get 12.5ms and 13.3ms per sample, which are pretty similar.
From that, it looks like you're simply using more texture bandwidth than your GPU can cope with... draw fewer pixels with these shaders; maybe use distance-based material LOD?
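To redo the per-sample arithmetic above with your own measurements, here is a small C sketch (the function names are illustrative, not from any library). It converts a frame rate to a frame time and divides by the number of texture reads per fragment; roughly constant results suggest the cost scales with texture fetches.

```c
#include <assert.h>
#include <stdio.h>

/* Convert a frame rate into a frame time in milliseconds. */
double frame_time_ms(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

/* Frame time divided by the number of texture samples per fragment.
   Similar values for shaders with different sample counts hint that the
   shader is texture/bandwidth bound rather than ALU bound. */
double ms_per_sample(double fps, int samples)
{
    assert(samples > 0);
    return frame_time_ms(fps) / (double)samples;
}

/* Print the two data points from this thread (the 60fps case is vsync
   limited and therefore left out). */
void print_thread_numbers(void)
{
    printf("2 samples @ 40fps: %.1f ms per sample\n", ms_per_sample(40.0, 2));
    printf("5 samples @ 15fps: %.1f ms per sample\n", ms_per_sample(15.0, 5));
}
```

This is only a rough heuristic: it treats the whole frame time as shader cost, which is a reasonable approximation when the splatted terrain dominates the frame.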

You can also reduce the amount of data in many ways, such as packing 4 different monochrome textures into the channels of a single texture and tinting them with 4 uniform colours.
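A CPU-side sketch of that packing idea in C, under a few assumptions: each detail layer is an 8-bit greyscale image of the same size, and the shader combines the packed channels with per-layer uniform tint colours and the splat weights. The names (`pack_mono4_rgba8`, `shade_packed`, `rgb`) are hypothetical; `shade_packed` is just a reference for what one fetch of the packed texture would feed into the shader.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { float r, g, b; } rgb;

/* Interleave four 8-bit monochrome textures (texel_count texels each) into
   one RGBA8 buffer: mono[0] -> R, mono[1] -> G, mono[2] -> B, mono[3] -> A.
   dst must have room for 4 * texel_count bytes. */
void pack_mono4_rgba8(const uint8_t *mono[4], size_t texel_count, uint8_t *dst)
{
    assert(mono != NULL && dst != NULL);
    for (size_t i = 0; i < texel_count; ++i)
        for (int c = 0; c < 4; ++c)
            dst[4 * i + (size_t)c] = mono[c][i];
}

/* Reference for the per-fragment work after a single fetch of the packed
   texture: each channel is a brightness value that tints one of four
   uniform colours, and the splat weights choose how much of each shows. */
rgb shade_packed(const uint8_t texel[4], const rgb tint[4], const float weight[4])
{
    rgb out = {0.0f, 0.0f, 0.0f};
    for (int c = 0; c < 4; ++c) {
        float lum = (float)texel[c] / 255.0f;
        out.r += weight[c] * lum * tint[c].r;
        out.g += weight[c] * lum * tint[c].g;
        out.b += weight[c] * lum * tint[c].b;
    }
    return out;
}
```

The trade-off is colour detail: each layer keeps only its brightness variation, but four layer reads collapse into one.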

clb    2147
Perhaps try avoiding mix() and optimizing the code manually:

[code]
void main()
{
    // Splat weights; alpha[0] is precomputed so that all four sum to 1
    lowp vec4 alpha = texture2D(texture4, v_texcoord);

    lowp vec4 color0 = texture2D(texture0, v_texcoord);
    lowp vec4 color1 = texture2D(texture1, v_texcoord);
    lowp vec4 color2 = texture2D(texture2, v_texcoord);
    lowp vec4 color3 = texture2D(texture3, v_texcoord);

    // Weighted sum of the four layers instead of a chain of mix() calls
    gl_FragColor = v_color * (alpha[0] * color0 + alpha[1] * color1 +
                              alpha[2] * color2 + alpha[3] * color3);
}[/code]

(For simplicity, I re-indexed which alpha component weights which colour texture.) The idea is that alpha[0] is already precomputed in the texture as 1.0 - alpha[1] - alpha[2] - alpha[3], so the shader doesn't need to compute it. I suspect this is faster than using mix(), but I can't be sure without profiling. Let me know how it compares.
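Note that the mix() chain in the original shader and a plain weighted sum only agree if the weights are converted offline. Expanding the three mix() calls gives four linear weights that always sum to 1. A C sketch of that conversion (hypothetical helper names; `mixf` mirrors GLSL's mix(x, y, t) = x * (1 - t) + y * t), plus both blends for one colour channel so they can be compared:

```c
#include <assert.h>

/* GLSL-style mix(): x * (1 - t) + y * t. */
static float mixf(float x, float y, float t)
{
    return x * (1.0f - t) + y * t;
}

/* Turn the three sequential mix() alphas of the original shader into four
   linear weights (w[0] for layer 0 ... w[3] for layer 3). The weights always
   sum to 1, so w[0] equals 1 - w[1] - w[2] - w[3]. */
void mix_chain_to_weights(const float a[3], float w[4])
{
    for (int i = 0; i < 3; ++i)
        assert(a[i] >= 0.0f && a[i] <= 1.0f);
    w[3] = a[2];
    w[2] = a[1] * (1.0f - a[2]);
    w[1] = a[0] * (1.0f - a[1]) * (1.0f - a[2]);
    w[0] = (1.0f - a[0]) * (1.0f - a[1]) * (1.0f - a[2]);
}

/* One colour channel blended the way the original shader does it. */
float blend_mix_chain(const float c[4], const float a[3])
{
    float color = c[0];
    color = mixf(color, c[1], a[0]);
    color = mixf(color, c[2], a[1]);
    color = mixf(color, c[3], a[2]);
    return color;
}

/* The same channel blended as a weighted sum, as in the mix()-free shader. */
float blend_weighted(const float c[4], const float w[4])
{
    return w[0] * c[0] + w[1] * c[1] + w[2] * c[2] + w[3] * c[3];
}
```

When the weights are stored in an 8-bit RGBA texture, rounding can make them sum to slightly more or less than 255, so computing the first channel as 255 minus the other three after quantizing is one way to keep the total exact.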

Another potential optimization is to splat with one or two fewer texture channels and subdivide your mesh according to which splat textures each triangle uses. Also, if the splat texture is low-frequency, try storing the splat weights as vertex attributes and passing them through to the pixel shader, which saves you one texture read.
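One way to do the subdivision, sketched in C under assumptions: the splat weights are already available per vertex (for example sampled from the splat map at each vertex position, which is also what you would upload as a vertex attribute), and the mesh is an indexed triangle list. Each triangle gets a bitmask of the layers it actually uses; sorting triangles by mask gives one draw call per mask, each drawn with a shader variant that samples only those layers. All names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SPLAT_LAYERS 4

/* Per-vertex splat weights (hypothetical layout). */
typedef struct { float w[SPLAT_LAYERS]; } vertex_weights;

/* Bit i is set if layer i has weight above eps at any of the triangle's
   three vertices. */
unsigned triangle_layer_mask(const vertex_weights *verts, const uint32_t tri[3], float eps)
{
    assert(verts != NULL && tri != NULL);
    unsigned mask = 0u;
    for (int k = 0; k < 3; ++k)
        for (int i = 0; i < SPLAT_LAYERS; ++i)
            if (verts[tri[k]].w[i] > eps)
                mask |= 1u << i;
    return mask;
}

/* Number of layers in a mask, i.e. how many textures the shader variant for
   that bucket has to sample. */
int mask_layer_count(unsigned mask)
{
    int n = 0;
    for (; mask != 0u; mask >>= 1)
        n += (int)(mask & 1u);
    return n;
}

/* Classify every triangle of an indexed mesh; triangles can then be sorted
   into one bucket (draw call) per mask. */
void classify_triangles(const vertex_weights *verts, const uint32_t *indices,
                        size_t tri_count, float eps, unsigned *out_masks)
{
    for (size_t t = 0; t < tri_count; ++t)
        out_masks[t] = triangle_layer_mask(verts, &indices[3 * t], eps);
}
```

The eps threshold decides how small a weight may be before a layer is considered absent; a larger value produces cheaper buckets at the cost of slightly harder transitions.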

Finally, if the splat texture is very low-frequency, you can try simply decaling the contents, i.e. manually generating geometry planes that you alpha-blend on top of the terrain.

sth    235
[quote name='Hodgman' timestamp='1340028032' post='4950245']
If you throw out the vsync sample, your frame times are 40Hz/25ms for 2 samples, and 15Hz/66.6..ms for 5 samples. Divide the frame times by the number of samples and you get 12.5ms and 13.3..ms, which are pretty similar.
From that, it looks like you're just using more texture bandwidth than your GPU can cope with...[/quote]
I think you're right, it looks pretty much bandwidth-limited right now.
Today I realized that I still had trilinear filtering enabled. Disabling it cut the frame times [i]in half[/i]. I can't recall trilinear filtering ever having such a profound effect in any of my projects, but I guess that's just what happens when you're running at the limit.
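A rough C sketch of why the filter mode matters so much here: bilinear filtering reads a 2x2 texel footprint from one mip level, while trilinear reads 2x2 from each of two adjacent mip levels. The enum and function names are made up for illustration, and real hardware caches and compresses texels, so treat these counts as an upper-bound illustration rather than a measurement.

```c
#include <assert.h>

enum filter_mode { FILTER_NEAREST, FILTER_BILINEAR, FILTER_TRILINEAR };

/* Texels touched by one texture2D() call, ignoring cache reuse:
   nearest = 1, bilinear = 2x2 from one mip level,
   trilinear = 2x2 from each of two adjacent mip levels. */
int texels_per_sample(enum filter_mode mode)
{
    switch (mode) {
    case FILTER_NEAREST:   return 1;
    case FILTER_BILINEAR:  return 4;
    case FILTER_TRILINEAR: return 8;
    }
    assert(0 && "unknown filter mode");
    return 0;
}

/* Texels touched per fragment for a shader with `samples` texture reads. */
int texels_per_fragment(int samples, enum filter_mode mode)
{
    return samples * texels_per_sample(mode);
}
```

For the five-read splatting shader that is 40 texels per fragment with trilinear versus 20 with bilinear. On the GL side, switching a mipmapped texture's GL_TEXTURE_MIN_FILTER from GL_LINEAR_MIPMAP_LINEAR to GL_LINEAR_MIPMAP_NEAREST via glTexParameteri keeps mipmapping but samples a single mip level.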

Anyway, at least I'm now back to playable framerates.

[quote name='clb' timestamp='1340028652' post='4950252']
Perhaps try avoiding the mix and optimize the code manually:
[/quote]
Thanks for the suggestion. I tried it but it didn't have any effect on the performance.

[quote name='dpadam450' timestamp='1340044660' post='4950331']
Also, Why not try just 3 splat channels instead of 4. 3 should be sufficient especially if it is for a small screen embedded device.
[/quote]
I'm considering it, but I'm still not sure if I really want it.
The thing is that these days it's not just phone screens, it's also tablet screens and (wireless) TV output.

dpadam450    2357
[quote]The thing is that these days it's not just phone screens, it's also tablet screens and (wireless) TV output.[/quote]
The thing really is, GLES hardware currently doesn't have enough power for that. If you want 20 different types of splats, you can still have them: divide your terrain into sections and allow each section only 3 splats (RGB). Each section can use any 3 of the 20 splats.
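A possible authoring-time step for that scheme, sketched in C under assumptions: full weights for all 20 layers are available per texel of a section (a hypothetical layout), the 3 layers with the largest total weight in the section are kept, and each texel's weights for those 3 are renormalized into the section's RGB splat map. Function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOTAL_LAYERS 20       /* material palette for the whole terrain */
#define LAYERS_PER_SECTION 3  /* one per RGB channel of a section's splat map */

/* Pick the LAYERS_PER_SECTION layers with the largest total weight in one
   section. `weights` holds texel_count rows of TOTAL_LAYERS floats.
   Ties go to the lower layer index. */
void pick_section_layers(const float *weights, size_t texel_count,
                         int chosen[LAYERS_PER_SECTION])
{
    float total[TOTAL_LAYERS] = {0.0f};
    for (size_t t = 0; t < texel_count; ++t)
        for (int l = 0; l < TOTAL_LAYERS; ++l)
            total[l] += weights[t * TOTAL_LAYERS + (size_t)l];

    int used[TOTAL_LAYERS] = {0};
    for (int k = 0; k < LAYERS_PER_SECTION; ++k) {
        int best = -1;
        for (int l = 0; l < TOTAL_LAYERS; ++l)
            if (!used[l] && (best < 0 || total[l] > total[best]))
                best = l;
        assert(best >= 0);
        used[best] = 1;
        chosen[k] = best;
    }
}

/* Per-texel RGB splat weights for the chosen layers, renormalized so they
   still sum to 1 after the other layers were dropped. */
void section_texel_weights(const float texel[TOTAL_LAYERS],
                           const int chosen[LAYERS_PER_SECTION],
                           float out[LAYERS_PER_SECTION])
{
    float sum = 0.0f;
    for (int k = 0; k < LAYERS_PER_SECTION; ++k) {
        out[k] = texel[chosen[k]];
        sum += out[k];
    }
    if (sum > 0.0f)
        for (int k = 0; k < LAYERS_PER_SECTION; ++k)
            out[k] /= sum;
}
```

At runtime each section then binds its own 3 layer textures, so the shader always samples the splat map plus 3 layers no matter how many materials the whole terrain uses.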

Hodgman    51334
[quote name='sth' timestamp='1340061761' post='4950422']I think you're right, it looks pretty much bandwidth-limited right now. Today I realized that I still had trilinear filtering enabled. Disabling it cut the frame times in half. I can't recall trilinear filtering ever having such a profound effect in any of my projects, but I guess that's just what happens when you're running at the limit.[/quote]
Yeah, trilinear filtering will double the amount of data that each pixel has to pull into the texture cache, so if you're already texture-bound, it's going to make things a lot worse.
[quote name='dpadam450' timestamp='1340064858' post='4950429']
Well the thing really is, you don't have enough power currently on GLES. If you want 20 different types of splats, you could do that as well, just divide your terrain into sections and only allow 1 section 3 splats (rgb). Each section can have any 3 of the 20 splats.[/quote]
Yep, I've worked on a few games where each triangle could only have 2 layers on it, but the entire terrain could have many, many layers. We'd use a tool to split the terrain into sections, where each section only had 2 texture layers. As long as you never needed 3 or more materials present at a single vertex, it worked OK.
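The constraint described above (never more than 2 materials at a single vertex) is easy to check before splitting. A minimal C sketch of such a validation pass, assuming the tool has per-vertex weights for every material in a flat array; the function name and layout are hypothetical, not taken from any actual tool:

```c
#include <assert.h>
#include <stddef.h>

/* `weights` holds vertex_count rows of layer_count material weights.
   Returns the index of the first vertex that needs more than max_layers
   materials (weight above eps), or -1 if every vertex fits the limit. */
long first_overfull_vertex(const float *weights, size_t vertex_count,
                           int layer_count, int max_layers, float eps)
{
    assert(layer_count > 0 && max_layers > 0);
    for (size_t v = 0; v < vertex_count; ++v) {
        int used = 0;
        for (int l = 0; l < layer_count; ++l)
            if (weights[v * (size_t)layer_count + (size_t)l] > eps)
                ++used;
        if (used > max_layers)
            return (long)v;
    }
    return -1;
}
```

A tool could run this with max_layers = 2 and report the offending vertex to the artist, or clamp the weakest weight to zero and renormalize.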

sth    235
[quote name='dpadam450' timestamp='1340064858' post='4950429']
[quote]The thing is that these days it's not just phone screens, it's also tablet screens and (wireless) TV output.[/quote]
Well the thing really is, you don't have enough power currently on GLES. If you want 20 different types of splats, you could do that as well, just divide your terrain into sections and only allow 1 section 3 splats (rgb). Each section can have any 3 of the 20 splats.
[/quote]
I'll have to see. My current implementation already allows every chunk of the terrain to have its own set of textures, although it would be nicer not to be bound to the fixed chunk structure.

