Most efficient way to batch drawings

12 comments, last by Xeeynamo 11 years, 3 months ago

A VBO should be performant enough for the vast majority of cases. If you need better performance you can pass points and expand them to triangles in a geometry shader.
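To illustrate the point-expansion idea, here is a CPU-side sketch in C of the work a geometry shader would do per input point: take a sprite center and half-extents and emit the four corners of a quad (the names `Vertex` and `expand_point` are illustrative, not from any real API).

```c
/* One output corner of an expanded point sprite (hypothetical layout). */
typedef struct { float x, y, u, v; } Vertex;

/* Expand a point (cx, cy) with half-extents (hw, hh) into the four
 * corners of a quad, in triangle-strip order: bottom-left, bottom-right,
 * top-left, top-right. A geometry shader would do this per input point. */
static void expand_point(float cx, float cy, float hw, float hh, Vertex out[4])
{
    out[0] = (Vertex){ cx - hw, cy - hh, 0.0f, 0.0f };
    out[1] = (Vertex){ cx + hw, cy - hh, 1.0f, 0.0f };
    out[2] = (Vertex){ cx - hw, cy + hh, 0.0f, 1.0f };
    out[3] = (Vertex){ cx + hw, cy + hh, 1.0f, 1.0f };
}
```

Doing this on the GPU means the application only uploads one point per sprite instead of four vertices, which quarters the per-frame upload traffic.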

Read up here on some techniques for VBO optimization with sprite batching:

http://www.java-gaming.org/topics/opengl-lightning-fast-managed-vbo-mapping/28209/view.html

Since ultimately the performance may vary depending on the driver, the absolute "fastest" solution is to use whatever works best for the driver. For example, in the intro cutscene of your game you might benchmark a few different rendering techniques, and pick whichever runs the fastest.

measuring it with gDEBugger, setting SwapBuffers as the end-of-frame marker. With the Visual Studio profiler, I can see clearly that SwapBuffers takes 50% of the CPU time in a single frame.

Well, I don't know how gDEBugger measures execution time, but I have to draw your attention to the following facts:

1. All issued GL commands execute on both CPU and GPU.

2. CPU execution time is usually very short (it depends on the command, of course): commands are placed in a command queue and control returns to the CPU immediately.

3. SwapBuffers, as its name implies, exchanges the front and back buffers. In order to do that, it flushes the command queue and waits until the drawing is finished. This is probably implementation-dependent, but on my laptop with Windows Vista it is a blocking function. Take a look at the attached picture.

[attached image: GLProfiler_SB.JPG]

Blue lines represent CPU time, while red ones represent GPU time. Although you could say SwapBuffers consumes 78% of the frame time, that is simply not true. The answer is in the blue line in the "Frame" window: the GPU takes about 13 ms to render the frame, while the CPU is busy for only 0.67 ms. That's what I was talking about.

4. With vsync on, the frame rate can only be 120 (rarely), 60, 30, 15, etc. What you have posted is an effective frame rate, so it is better to measure execution time instead of FPS.

5. An effective frame rate greater than 120 induces performance-state changes, since the GPU is not utilized enough. It is very hard to profile an application in such circumstances. That's why I proposed performance-state tracking alongside profiling (take a look at OpenGL Insights, pp. 527-534).

If you're running slower with a VBO then you're doing something wrong. The most likely cause is that the code you're using to update the VBO is creating sync points, which are killing your framerate. This is a common enough failing: you just can't treat a VBO as if it were another block of memory that you can freely write to and read from through a regular pointer. Post the code you're using to update your VBO and it will be possible to comment further; to get you started, a read of this article is recommended: http://www.opengl.org/wiki/Buffer_Object_Streaming
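To make the sync-point issue concrete, here is a minimal CPU-side sketch of the streaming pattern that article describes: keep appending new sprite data at an advancing offset within the VBO, and "orphan" the buffer (re-specify it with `glBufferData(..., NULL, GL_STREAM_DRAW)`) only when it fills up, so the driver never stalls waiting on data the GPU is still reading. The `StreamBuffer` type and `stream_reserve` helper are hypothetical names for illustration; the GL calls appear only in comments.

```c
#include <stddef.h>

typedef struct {
    size_t capacity; /* total VBO size in bytes   */
    size_t offset;   /* next free byte in the VBO */
} StreamBuffer;

/* Reserve `bytes` in the stream. Returns the byte offset to write at
 * (e.g. via glBufferSubData or glMapBufferRange with
 * GL_MAP_UNSYNCHRONIZED_BIT) and sets *must_orphan when the caller
 * should first orphan the buffer:
 *   glBufferData(GL_ARRAY_BUFFER, capacity, NULL, GL_STREAM_DRAW); */
static size_t stream_reserve(StreamBuffer *sb, size_t bytes, int *must_orphan)
{
    *must_orphan = 0;
    if (sb->offset + bytes > sb->capacity) {
        /* Buffer full: orphan it and start writing from the beginning.
         * The driver keeps the old storage alive for in-flight draws. */
        *must_orphan = 1;
        sb->offset = 0;
    }
    size_t at = sb->offset;
    sb->offset += bytes;
    return at;
}
```

Because each frame writes to a fresh or unused region, the GPU never has to finish rendering before the CPU can touch the buffer, which is exactly the sync point the naive "map and overwrite" approach creates.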

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

I should avoid the all-in-one structures (I read in the OpenGL documentation that it's implemented for D3D compatibility)

Not sure where in the OpenGL docs you read that, but you should be aware that the ability to use interleaved attribs has been part of OpenGL since the original GL_EXT_vertex_array in 1995. In the common case interleaving should in fact be the faster option; there are certainly cases where it may be slower (such as software T&L, or a separate shadow pass), but if none of those apply and it still runs slower for you then, again, you've got something else wrong.
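For reference, an interleaved layout just means one struct per vertex and one VBO holding everything, with each attribute pointer using `sizeof` the struct as its stride and `offsetof` as its offset. A sketch under an assumed layout (the `SpriteVertex` name and field choice are illustrative):

```c
#include <stddef.h>

/* A hypothetical interleaved sprite vertex: position, texcoord, and
 * color packed together, so one buffer and one stride serve all three
 * attributes. */
typedef struct {
    float         x, y;       /* position */
    float         u, v;       /* texcoord */
    unsigned char r, g, b, a; /* color    */
} SpriteVertex;

/* With the GL 2.1 fixed-function pointers this would be set up as:
 *   glVertexPointer(2, GL_FLOAT, sizeof(SpriteVertex),
 *                   (void *)offsetof(SpriteVertex, x));
 *   glTexCoordPointer(2, GL_FLOAT, sizeof(SpriteVertex),
 *                     (void *)offsetof(SpriteVertex, u));
 *   glColorPointer(4, GL_UNSIGNED_BYTE, sizeof(SpriteVertex),
 *                  (void *)offsetof(SpriteVertex, r));
 */
```

Interleaving tends to help because all the data for one vertex sits in one cache line when the GPU fetches it, instead of being gathered from three separate arrays.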


Okay, these days I rewrote the entire sprite system. I'm using a pre-calculated unsigned short array for the vertex indices, and I'm copying the four vertices of each sprite into an array that is used as a cache. Now the framework reaches 997 fps with 20,000 triangles. I'm currently using glVertexPointer and glDrawElements for OpenGL 2.1 compatibility, and I'm binding only one texture per frame. I discovered that the rendering isn't really CPU-limited: in fact I overclocked my video card (it was underclocked to save power) and the framework reaches 1800 fps.

For VBOs, I don't understand exactly how to initialize and use them properly. Isn't it the same thing to cache all the vertices in main memory and then send them all together before calling SwapBuffers? I also forgot to mention that roughly 90% of the vertices change every frame, so caching them in video memory doesn't have a great effect... I also don't understand how to buffer the uniforms and how to use them. Is it possible to avoid glBindTexture? I know that I can upload the textures into a single big texture, but I'm asking if there is another way to batch/buffer the texture switching.
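The pre-calculated index array described above can be sketched like this: every sprite contributes 4 vertices and 6 indices (two triangles), so the whole array can be filled once at startup and passed to glDrawElements every frame. The `build_quad_indices` helper is a hypothetical name; note that GL_UNSIGNED_SHORT indices cap a single batch at 65536 / 4 = 16384 sprites.

```c
/* Fill `indices` (which must hold sprite_count * 6 entries) with the
 * two triangles of each sprite quad. Vertex i*4 .. i*4+3 are the four
 * corners of sprite i, in the same order every frame. */
static void build_quad_indices(unsigned short *indices, int sprite_count)
{
    for (int i = 0; i < sprite_count; ++i) {
        unsigned short base = (unsigned short)(i * 4);
        indices[i * 6 + 0] = base + 0;
        indices[i * 6 + 1] = base + 1;
        indices[i * 6 + 2] = base + 2;
        indices[i * 6 + 3] = base + 2;
        indices[i * 6 + 4] = base + 1;
        indices[i * 6 + 5] = base + 3;
    }
}
```

Since the index data never changes, it is a good candidate for a GL_STATIC_DRAW element buffer even while the vertex data stays dynamic.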

This topic is closed to new replies.
