GUI: immediate mode to buffer based


Started by Ashaman73 December 01, 2014 10:43 AM

3 comments, last by Ashaman73 9 years, 4 months ago

13,718

Author

December 01, 2014 10:43 AM

My engine is still based on OpenGL 2.1, and I would like to migrate it to a higher OpenGL version in the (far) future. To prepare the engine, and to optimize it along the way, I would like to get rid of the (deprecated) immediate mode GUI rendering. Nevertheless, I would like to keep most of my GUI framework and its features, which are:
- all gui elements (including text letters) are quads,
- all gui elements could have a shader,
- all gui elements could have a texture (already on a handful of atlases),
- gui is redrawn each frame,
- gui is drawn in strict order (a lot of alpha blending).

Though I'm not able to use OpenGL > 2.1 yet, I would like to use an architecture that comes close to OpenGL 4, that is, I plan to make it completely buffer based (vertex buffers for now). The rest should try to mimic newer features (shader buffers, bindless etc.) to make a later migration easier.

My basic idea is
1. map a vertex buffer object (double buffered)
2. while drawing the widgets, fill up the vertex buffer object
3. fill an additional buffer with meta information about the shader and texture of each quad
4. unmap the buffer
5. bind the buffer
6. draw quads from the buffer in order, switch texture/shader if necessary, and try to batch as many quads as possible by grouping them by texture/shader
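
The CPU side of steps 2, 3 and 6 can be sketched roughly as follows. This is only an illustration under assumptions: the `Vertex` layout, the `Batch` struct and the `QuadBatcher` class are hypothetical names, not part of the engine described above, and no OpenGL calls are made. Quads are appended in submission order, and a neighbouring quad with the same shader and texture extends the current batch instead of starting a new one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical vertex layout: position, texture coordinate, packed RGBA color.
struct Vertex {
    float x, y;
    float u, v;
    std::uint32_t rgba;
};

// One contiguous run of quads that share the same shader and texture.
struct Batch {
    int shader;
    int texture;
    std::size_t firstQuad;
    std::size_t quadCount;
};

// Collects quads in submission order and merges neighbours with identical
// state into batches. The vertex vector stands in for the mapped VBO.
class QuadBatcher {
public:
    void addQuad(float x, float y, float w, float h,
                 int shader, int texture, std::uint32_t rgba) {
        assert(w >= 0.0f && h >= 0.0f);
        // Four corners of the rectangle, covering the whole texture.
        vertices_.push_back({x,     y,     0.0f, 0.0f, rgba});
        vertices_.push_back({x + w, y,     1.0f, 0.0f, rgba});
        vertices_.push_back({x + w, y + h, 1.0f, 1.0f, rgba});
        vertices_.push_back({x,     y + h, 0.0f, 1.0f, rgba});

        const std::size_t quadIndex = vertices_.size() / 4 - 1;
        if (!batches_.empty() &&
            batches_.back().shader == shader &&
            batches_.back().texture == texture) {
            ++batches_.back().quadCount;  // same state: extend the current batch
        } else {
            batches_.push_back({shader, texture, quadIndex, 1});
        }
    }

    const std::vector<Vertex>& vertices() const { return vertices_; }
    const std::vector<Batch>& batches() const { return batches_; }

    void clear() {
        vertices_.clear();
        batches_.clear();
    }

private:
    std::vector<Vertex> vertices_;
    std::vector<Batch> batches_;
};
```

Each entry in `batches()` would then map to one state switch plus one draw call over `quadCount` quads starting at `firstQuad`. Because nothing is reordered, the strict draw order needed for alpha blending is preserved.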

Well, this could result in a lot of texture/shader switches and draw calls. I'm now looking for hints/ideas to optimize it.
These ideas are coming to mind:

I. Assign a layer to each quad, quads on the same layer can be rendered out of order.
II. Use a texture array (same texture size required, utilize atlases) and use an index (3rd tex-coord) into the texture layer, avoiding texture switches.
III. Group multiple shaders into fewer uber-shaders.
IV. Sort quads by layer and shader.
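
Ideas I and IV could look roughly like the sketch below. The `QuadMeta` struct and both function names are made up for illustration. A stable sort by (layer, shader, texture) keeps the order between layers intact while letting quads inside one layer move next to others with the same state. With a texture array (idea II), the texture key would mostly drop out of the sort, because the array layer travels with each vertex.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical per-quad meta information.
struct QuadMeta {
    int layer;    // idea I: blend order only matters between layers
    int shader;   // idea IV: secondary sort key
    int texture;  // tertiary key; less relevant once a texture array is used
    int id;       // submission index, only here to make the result visible
};

// Reorders quads so that equal state ends up adjacent within each layer.
// stable_sort keeps submission order for quads with identical keys.
inline void sortForBatching(std::vector<QuadMeta>& quads) {
    const auto byState = [](const QuadMeta& a, const QuadMeta& b) {
        return std::tie(a.layer, a.shader, a.texture) <
               std::tie(b.layer, b.shader, b.texture);
    };
    std::stable_sort(quads.begin(), quads.end(), byState);
    assert(std::is_sorted(quads.begin(), quads.end(), byState));
}

// Counts how many times shader or texture would have to be (re)bound when
// drawing the quads in the given order, including the initial bind.
inline int countStateChanges(const std::vector<QuadMeta>& quads) {
    int changes = 0;
    for (std::size_t i = 0; i < quads.size(); ++i) {
        if (i == 0 ||
            quads[i].shader != quads[i - 1].shader ||
            quads[i].texture != quads[i - 1].texture) {
            ++changes;
        }
    }
    return changes;
}
```

Comparing `countStateChanges` before and after `sortForBatching` on a real frame would show how much the layer assignment actually buys.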

I would like to hear some comments and criticism on my idea. Is this the wrong way to approach the next generation of APIs (Mantle/OpenGL Next)?

Are there other known alternatives that work well?

Ashaman

Gnoblins: Website - Facebook - Twitter - Youtube - Steam Greenlit - IndieDB - Gamedev Log

wintertime

4,154

December 01, 2014 11:58 AM

That reads as if your GUI code is completely intermingled with the rendering code and you have to change a large amount of code.

I would use that opportunity to first create a rendering function or class method that takes a rectangle, shader id and texture info and does the draw calls in the old-fashioned way, then change all GUI code to use it.

When that code is in a single place it will get much easier to switch to using a different renderer that uses modern methods and does batching.
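
A minimal sketch of such a single entry point, assuming hypothetical names throughout (`GuiRenderer`, `RecordingRenderer`, `drawButton` and the shader/texture ids are invented for illustration). The GUI code only sees the interface, so an immediate mode backend can later be swapped for a batching one without touching the widgets. The backend shown here merely records the calls.

```cpp
#include <cassert>
#include <vector>

struct Rect {
    float x, y, w, h;
};

// The only rendering interface the GUI code talks to.
class GuiRenderer {
public:
    virtual ~GuiRenderer() = default;
    virtual void drawRect(const Rect& r, int shaderId, int textureId) = 0;
    virtual void flush() = 0;  // called once per frame after all widgets
};

// Stand-in backend that only records what it was asked to draw. A real
// backend would issue glBegin/glEnd calls or fill a vertex buffer instead.
class RecordingRenderer : public GuiRenderer {
public:
    struct Call {
        Rect rect;
        int shaderId;
        int textureId;
    };

    void drawRect(const Rect& r, int shaderId, int textureId) override {
        assert(r.w >= 0.0f && r.h >= 0.0f);
        calls.push_back({r, shaderId, textureId});
    }

    void flush() override { ++flushCount; }

    std::vector<Call> calls;
    int flushCount = 0;
};

// Example widget code: it depends on the interface, not on a backend.
inline void drawButton(GuiRenderer& renderer, const Rect& bounds) {
    renderer.drawRect(bounds, /*shaderId=*/1, /*textureId=*/3);  // background
    renderer.drawRect({bounds.x + 4, bounds.y + 4, 16, 16}, 1, 4);  // icon
}
```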

Ashaman73

13,718

Author

December 01, 2014 12:12 PM

create a rendering function or class method that takes a rectangle, shader id and texture info and does the draw calls in the old-fashioned way, then change all GUI code to use it.

This is already the case (MVC pattern), I just render it the very old-fashioned way (glBegin...glVertex...glColor...glEnd => next rect). Replacing this 1:1 with buffered rendering would not be a problem either, but I want to target modern APIs, and switching textures/shaders/uniforms all the time doesn't sound modern.

Ashaman


Hodgman

52,717

December 01, 2014 01:05 PM

It's quite common from what I've seen, for engines to use a large VBO to emulate the old immediate drawing themselves.

The biggest pitfall is that you can introduce a CPU-GPU sync point by trying to map a buffer that the GPU is using (with the wrong flags/hints set), which instantly halves your frame-rate. The links below should help to avoid that situation:

GL-specific details:

http://www.seas.upenn.edu/~pcozzi/OpenGLInsights/OpenGLInsights-AsynchronousBufferTransfers.pdf

Theory - they call this situation a "Transient Buffer":

https://developer.nvidia.com/sites/default/files/akamai/gamedev/files/gdc12/Efficient_Buffer_Management_McDonald.pdf

[edit] To translate the ideas between the two: where the theory PDF mentions "discard" and "no-overwrite", the GL PDF mentions "orphaning" and "unsynchronized".

When using map-discard/orphaning, the driver internally allocates a new buffer under the same VBO ID, and garbage-collects the previous allocation once the GPU has actually finished consuming it.

When using no-overwrite/unsynchronized, the driver puts you in charge, trusting that you'll be very careful not to overwrite any data that the GPU hasn't yet consumed. To do this you generally insert a marker per frame, called events/fences, letting you know how many frames behind the CPU the GPU currently is.

If you're not using one of those strategies, then you're asking for the driver to stall the CPU and make sure the GPU isn't touching your buffer at all before returning from the Map call.
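
To make the no-overwrite/unsynchronized bookkeeping concrete, here is a CPU-only model of it. It is a sketch under assumptions: `TransientRing` is a made-up name, frames are numbered from 1, and a plain completed-frame counter stands in for real fences. In actual GL code, the region would be mapped with `glMapBufferRange` and `GL_MAP_UNSYNCHRONIZED_BIT`, and the counter would be derived from sync objects (`glFenceSync` / `glClientWaitSync`), all of which need a newer context than 2.1 or the corresponding extensions. Orphaning needs none of this tracking, since re-specifying the store with `glBufferData` and a null pointer leaves it to the driver.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// One buffer split into RegionCount regions, one region per frame in flight.
// Frame numbers start at 1; gpuCompletedFrame == 0 means nothing finished yet.
template <std::size_t RegionCount>
class TransientRing {
public:
    explicit TransientRing(std::size_t regionBytes)
        : regionBytes_(regionBytes), lastUse_{} {}

    // True if the GPU has finished the frame that last used this frame's
    // region, so writing into it cannot race with pending draws.
    bool canWrite(std::uint64_t frame, std::uint64_t gpuCompletedFrame) const {
        return lastUse_[frame % RegionCount] <= gpuCompletedFrame;
    }

    // Returns the byte offset of the region to fill for this frame and marks
    // it as in use. Marking stands in for inserting a fence after the draws.
    std::size_t acquire(std::uint64_t frame, std::uint64_t gpuCompletedFrame) {
        assert(canWrite(frame, gpuCompletedFrame) &&
               "would overwrite data the GPU has not consumed yet");
        lastUse_[frame % RegionCount] = frame;
        return static_cast<std::size_t>(frame % RegionCount) * regionBytes_;
    }

private:
    std::size_t regionBytes_;
    std::array<std::uint64_t, RegionCount> lastUse_;  // 0 = never used
};
```

When `canWrite` returns false, the application has to wait on the fence, use more regions, or fall back to orphaning. That case is exactly the stall the driver would otherwise impose silently.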

Is this the wrong way to approach the next generation of APIs (Mantle/OpenGL Next)?

Nextgen APIs reduce the CPU-side overhead of state-changes and draw-calls.

Current hardware has already reduced the GPU-side overheads, and older hardware only suffers from state-change induced stalls if you render less than a few hundred pixels per batch (where a batch is a series of draw-calls using the same state).

So - I/II/III/IV are more useful now than they are in the future

. 22 Racing Series .

Ashaman73

13,718

Author

December 01, 2014 02:07 PM

It's quite common from what I've seen, for engines to use a large VBO to emulate the old immediate drawing themselves.

This gives hope

The biggest pitfall is that you can introduce a CPU-GPU sync point by trying to map a buffer that the GPU is using (with the wrong flags/hints set), which instantly halves your frame-rate. The links below should help to avoid that situation:

...

If you're not using one of those strategies, then you're asking for the driver to stall the CPU and make sure the GPU isn't touching your buffer at all before returning from the Map call.

Thanks for the links, I will take a look at both papers. I already have some experience with a double-buffered PBO for creating a dynamic texture, and I would approach this in a similar way.

Nextgen APIs reduce the CPU-side overhead of state-changes and draw-calls.

Current hardware has already reduced the GPU-side overheads, and older hardware only suffers from state-change induced stalls if you render less than a few hundred pixels per batch (where a batch is a series of draw-calls using the same state).

So - I/II/III/IV are more useful now than they are in the future

Hmm... maybe I can save myself some of these optimizations then.