TMII

OpenGL Text rendering really slow

I was trying to implement text rendering and thought I was really smart using instancing in order to improve performance. Jesus, I couldn't have been more wrong than that. The question is: What is the best approach now? Do I really have to go the "OpenGL 1.1" route and upload every character model repeatedly to a VBO?

The initial idea was that every character is basically a "quad" rendered at a specific "position", with character-specific "scale" and "textureCoordinates". The character-specific data is stored in uniform vec2 arrays on the shader side and accessed with a single "characterIndex" that is sent per instance, in order to save performance by not sending tons of duplicate data. It works, but it is really slow, and I think the problem is accessing the arrays:

#version 310

uniform mat4 projectionMatrix;
uniform mat4 worldMatrix;
uniform mat4 modelMatrix;

in vec4 in_Position;
in vec2 in_TextureCoord;
in float characterIndex;
in float advance;

uniform vec2[256] texturePosData;
uniform vec2[256] scaleData;

out vec2 pass_TextureCoord;

void main(void) {
    int index = int(characterIndex);
    vec2 texturePos = texturePosData[index];
    vec2 scale = scaleData[index];

    gl_Position = projectionMatrix * worldMatrix * modelMatrix * vec4(in_Position.x * scale.x + advance, in_Position.y * scale.y, in_Position.zw);

    pass_TextureCoord = in_TextureCoord * scale + texturePos;
}

By just rendering a few strings, the FPS went from 60 to 5. The question is: What do I do now? Is it a better approach to pack and send the arrays (2*vec2) in a VBO to the shader?

I tried to avoid just that, because a string such as "AAAAAAAA" sends a bunch of duplicate data, whereas before only the index referenced a specific character. So there is not only much more to upload for every rendered string, there is also four times as much data to prepare on the CPU side.
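To put a number on the "duplicate data" worry, here is a minimal CPU-side sketch in Python. It assumes one interleaved record per character holding the advance plus the two vec2 values from the shader above. The `GLYPHS` table, its values, `pack_instances` and `advance_step` are all hypothetical and only there for illustration; the resulting bytes are what one would hand to a buffer upload call.

```python
import struct

# Hypothetical per-glyph table: character -> (tex_x, tex_y, scale_x, scale_y).
# The numbers are made up for illustration only.
GLYPHS = {
    "A": (0.00, 0.0, 0.05, 0.10),
    "B": (0.05, 0.0, 0.05, 0.10),
}


def pack_instances(text, glyphs, advance_step=0.06):
    """Pack one interleaved record per character: advance, texPos.xy, scale.xy."""
    records = []
    advance = 0.0
    for ch in text:
        tex_x, tex_y, scale_x, scale_y = glyphs[ch]
        # Five little-endian 32-bit floats = 20 bytes per character.
        records.append(struct.pack("<5f", advance, tex_x, tex_y, scale_x, scale_y))
        advance += advance_step
    return b"".join(records)


buffer_bytes = pack_instances("AAAAAAAA", GLYPHS)

# For comparison: the index-based variant sends characterIndex + advance,
# i.e. two floats per character.
index_only_bytes = len("AAAAAAAA") * 2 * 4
```

For the eight-character string this comes to 160 bytes against 64 bytes for the index-only variant, so a thousand characters would be about 20 KB. Whether that matters in practice is something only a profiler can tell.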

 

Thank you very much


Yes, thank you, although I believe both methods are exactly the same, with the second rendering the fonts dynamically at program startup. What I do not understand about his approach (I am not a C++ guy): is he uploading a new quad for every character to be rendered?


Phew... I am sorry. Though I am a (small-time) C++ guy, I actually use ImGui for all the UI stuff and text rendering, which relies on TrueType. I haven't played with these tutorials (yet). ImGui, if it can be seen as a benchmark for that type of rendering, is very fast; I don't notice much of a difference in framerate.

If I understand it right from a quick skim, yes: in a loop they update and draw a dynamic vertex buffer per character. One can, of course, prepare a larger buffer and draw that in one go.

 


Use separate textures or one texture.

Define a character table of drawable characters. If there is a character you can't render (because it is not in the table), render a fallback instead, say the first character.

You either create a separate file for each character, or use one bigger texture that holds all the characters you will render.

Then you create a buffer that holds the text. Say it has 200 characters per line; to simplify, just draw 200 characters on one line and don't worry about additional lines for now (you handle those by moving to the 'next line').

The best way is to use one texture, because switching through, say, 256 textures to render a simple phrase kills performance.

Anyway, you need to create a buffer that holds the on-screen vertices, 4 for each character. The best you can do is create one big buffer (max characters per line), then use glBufferSubData for that line to change the texture coordinates and vertex positions.

It isn't as slow as you think, but yes, you will have to do it manually. Just remember that screen coordinates run from -1 to 1.
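The steps above can be sketched on the CPU side roughly like this, in Python for brevity. It is only an illustration under several assumptions: a fixed-width font, a single atlas texture, pixel coordinates with the origin at the top left, and texture v increasing downwards. The `ATLAS` table, `to_ndc` and `build_line` are hypothetical names, and the winding order is arbitrary. The returned lists are what would be uploaded into the vertex and index buffers.

```python
# Hypothetical atlas table: character -> (u0, v0, u1, v1) in texture space.
ATLAS = {
    "?": (0.0, 0.0, 0.1, 0.1),  # fallback glyph for characters not in the table
    "H": (0.1, 0.0, 0.2, 0.1),
    "i": (0.2, 0.0, 0.3, 0.1),
}


def to_ndc(x_px, y_px, width_px, height_px):
    """Map pixel coordinates (origin top-left) to the -1..1 range."""
    return (2.0 * x_px / width_px - 1.0, 1.0 - 2.0 * y_px / height_px)


def build_line(text, atlas, x_px, y_px, glyph_w, glyph_h, screen_w, screen_h,
               fallback="?"):
    """Return (vertices, indices) for one line of fixed-width text.

    vertices is a flat list of x, y, u, v per vertex, 4 vertices per character;
    indices describes 2 triangles per character.
    """
    vertices = []
    indices = []
    for i, ch in enumerate(text):
        # Unknown characters fall back to a glyph that is known to exist.
        u0, v0, u1, v1 = atlas.get(ch, atlas[fallback])
        left, top = to_ndc(x_px + i * glyph_w, y_px, screen_w, screen_h)
        right, bottom = to_ndc(x_px + (i + 1) * glyph_w, y_px + glyph_h,
                               screen_w, screen_h)
        vertices += [left, top, u0, v0,
                     right, top, u1, v0,
                     right, bottom, u1, v1,
                     left, bottom, u0, v1]
        base = 4 * i
        indices += [base, base + 1, base + 2, base, base + 2, base + 3]
    return vertices, indices


vertices, indices = build_line("Hi!", ATLAS, 0, 0, 16, 32, 800, 600)
```

The whole line ends up in one vertex list and one index list, so it can be drawn with a single call instead of one upload and draw per character.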

 


To OP: Your approach is fine, but using instancing for small meshes is not recommended, unfortunately. There are other approaches; see Vertex Shader Tricks, page 15 ff. (implemented in HLSL/DX11, but I guess the principle should work for OpenGL, too).

More importantly: You need/want to pinpoint the bottleneck first. Never hesitate to use GPU debugging tools ;)


In my experience you can get away with creating a grayscale texture (8 bit per pixel) with all the text for each independent UI element and then just rendering the texture with a shader that you could use to colorize the text.

You could render the text with the freetype library if you wanted to.


Thanks for the answers. I took the hint to use a profiler and found it stalling for nearly a second, mostly here (~600 ms):

glUniform3f(location, value0, value1, value2);

and here (~300 ms):

glUniformMatrix4fv(location, 1, false, dataArray, 0);

This is enormous and I have absolutely no idea why it happens. It gets even weirder: glUniformMatrix4fv is called another time right before the stalling call, and that call has no visible performance impact in the profiler (literally).

I noticed that updating the VBOs with glBufferData has a relatively tiny overhead, so I changed the uniforms in the shader to attributes and the problems are gone.

Does someone know why this is? Do uniforms cause some synchronization issue?

 

 

*Edit

And as if it weren't weird enough already, the stalling gets noticeably worse the longer the program runs.

Edited by TMII


You are using OpenGL in a ... creative ... way 😉 by copying in large amounts of position and scale data as uniforms. That should not take 0.3 s (unless the data is copied in multiple times per frame; you haven't posted that code), but it does not speed things up either. Also, the world matrix (which to my knowledge is just another word for model matrix) seems to be used as the view matrix. And since this is static 2D, a model matrix isn't needed, only position data and an orthographic projection (the viewport size would suffice).

I strongly recommend working through the tutorials and, if you are interested, reading the shader uniform and uniform buffer chapters. If large amounts of data must be passed in as uniforms, one would create a uniform buffer and fill it with data, but that requires some more gymnastics (create a dynamic buffer, possibly map it, copy the data into it, and hand it back to the driver). Keep in mind that the vertex shader is executed once per vertex, every frame, so communication with the host program should be kept low. In the case of character rendering, there is no need for uniforms at all, just position data from a vertex array object. Maybe text colour.
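As a rough illustration of the "fill it with data" step, here is a Python sketch of how the two vec2[256] arrays from the original shader could be laid out for a uniform block, assuming the block is declared with `layout(std140)`. Under std140 the stride of array elements is rounded up to the size of a vec4, so every vec2 takes 16 bytes. The function name and the placeholder data are hypothetical; the resulting bytes are what would be copied into the buffer.

```python
import struct

VEC4_BYTES = 16  # std140 rounds the stride of array elements up to a vec4


def pack_vec2_array_std140(values):
    """Pack (x, y) pairs as a std140 vec2[N]: 8 data bytes + 8 padding bytes each."""
    out = bytearray()
    for x, y in values:
        out += struct.pack("<2f", x, y)
        out += b"\x00" * (VEC4_BYTES - 8)
    return bytes(out)


# Placeholder data standing in for the 256-entry texturePosData and scaleData arrays.
texture_pos = [(i / 256.0, 0.0) for i in range(256)]
scale = [(0.05, 0.1)] * 256

# texturePosData starts at offset 0, scaleData at offset 4096.
block = pack_vec2_array_std140(texture_pos) + pack_vec2_array_std140(scale)
```

Both arrays together occupy 8192 bytes, half of which is padding; declaring the data as vec4 arrays (texture position and scale in one element) would avoid that waste.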

Hope that wasn't too impertinent; it surely wasn't meant to be. A look at the vertex shader in the example linked above should clear up some things. On my GTX 970, text rendering with Dear ImGui, in a debug build with full debug information, runs at around 4,000-8,000 frames per second.

 

Edit: you wrote "do I have to create a VAO for every character". Nope, not necessarily, as the examples show. One could create a VAO of any size. But reading data from a VAO is much faster than from a uniform object and happens in parallel. One could maybe see a uniform more like an annoying interruption for the pipeline 🙂

Edited by Green_Baron


Thank you. I am going to avoid uniforms then. It's the same with glBufferSubData it seems.

16 hours ago, unbird said:

To OP: Your approach is fine, but using instancing for small meshes is not recommended, unfortunately. There are other approaches; see Vertex Shader Tricks, page 15 ff. (implemented in HLSL/DX11, but I guess the principle should work for OpenGL, too).

More importantly: You need/want to pinpoint the bottleneck first. Never hesitate to use GPU debugging tools ;)

I have read that many times and wondered about it, because it sounds like instancing has disadvantages for small meshes, but as far as I can tell, drawing one quad versus drawing a thousand instanced quads does not make much of a difference. The point sprite version might be a tad faster (probably important for a AAA title, but a waste of time for me at the moment), but it has the disadvantage that the sprites disappear quite visibly at the edge of the screen.
