# OpenGL fastest way to render lots of small changing objects

This topic is 3470 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


I am developing a 2D-like graphics system for games, using lots of different algorithms for creating nice dynamic graphics. Up until now I have always used glBegin and so forth for the rendering, and I learned fairly recently that this is not the way to go for optimal performance. So I learned about VBOs and just now got them working, but it was way slower, probably because I am not doing it right. My current approach is this:

* Creation: I create a VBO for each object (there may be thousands). I use the GL_STREAM_DRAW_ARB setting because I will update the data each frame. No matrix transformations are used. I put the data for vertices, colors and texture coords next to each other in the VBO for each object.
* Update: I use glBindBufferARB and glMapBufferARB to get the GPU pointer, then iterate over the data values for vertices, colors and texture coordinates and write them through the pointer, then call glUnmapBufferARB.
* Draw: I call glEnableClientState for GL_VERTEX_ARRAY, GL_COLOR_ARRAY and GL_TEXTURE_COORD_ARRAY, specify pointers for them with the correct offsets, and then call glDrawArrays. Afterwards I call glDisableClientState and bind buffer id 0.

Now, this was very slow compared to my usual approach using glBegin/glEnd. What am I doing wrong?

Since I have separate OpenGL drawing/blending settings for each gfx-object, I can't just put out all the VBOs at the same time using one large VBO (can I?). Does the GPU choke because of my large number of VBOs?

I have another idea (based on how I THINK things work). I create _one_ VBO sized for the largest amount of data I think I will use for a single gfx-object (usually about 20 vertices). Then I reuse that same VBO every time I insert data and render, instead of using thousands of small ones, because I will not render them all at once anyway.

Is the VBO approach suitable for loads of small objects with different OpenGL settings? Thanks for any help and suggestions!

##### Share on other sites
Quote:
 Original post by Rockard: Since I have separate OpenGL drawing/blending settings for each gfx-object...
Herein lies your problem. The main benefit of VBO (or vertex arrays) is to allow you to render as much geometry as possible with a single draw call. If your objects really do need different drawing/blending settings (and they probably shouldn't), you should group them by matching settings, and collect each group in a VBO, and then render each group with a single call.

Unless you are doing something really strange though, you shouldn't need to set the blend equations more than a couple of times per frame - in a typical game, you disable blending, render all opaque objects, and then re-enable blending to render all the transparent objects.

Also, how many objects are we talking about, and what target hardware? I render 200-300 objects per frame, each with their own VBO, containing around 100 vertices, and my rendering is so far overshadowed by AI and physics, that it doesn't provide noticeable overhead.

##### Share on other sites
First, glBegin is quite fast compared to what people usually say.
But if you want to be cross-API, it's not a good idea to use it anyway :)

VBO is the standard way of rendering. The solution I would suggest for you is to create one big VBO, say 400 vertices.

Then render your sprites by changing values in it incrementally.

BatchVBO[offset + 0].x = ...
... offset + 1
etc

Then, when everything is ready to render, draw BatchVBO with the number of vertices you actually filled in. You don't have to render all of it :)

This way, you only send data to the video card once, which is very fast.

If you run out of vertices (400, or whatever maximum you chose), just render it, and continue by starting at 0 again.

With that technique you might want to sort by texture, though. Or, if order is important, render the BatchVBO whenever you need to switch texture and start again at 0.
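
The fill-and-flush scheme described above might look roughly like the following C sketch. The names `BatchVertex`, `Batch`, `batch_add_quad` and `batch_flush` are made up for illustration, and the GL upload and draw call are only indicated in a comment, so this shows the bookkeeping rather than a working renderer.

```c
#include <stddef.h>

#define BATCH_CAPACITY 400   /* max vertices per batch, as suggested above */

/* Hypothetical vertex layout: position plus texture coordinate. */
typedef struct { float x, y, u, v; } BatchVertex;

typedef struct {
    BatchVertex verts[BATCH_CAPACITY];
    size_t count;     /* vertices currently queued */
    size_t flushes;   /* number of draw calls issued so far */
    size_t drawn;     /* total vertices submitted so far */
} Batch;

/* Submit whatever is queued, then start again at offset 0.
   In a real renderer this is where the buffer upload and a call such as
   glDrawArrays(GL_QUADS, 0, count) would go. */
static void batch_flush(Batch *b)
{
    if (b->count == 0)
        return;
    b->drawn += b->count;
    b->flushes++;
    b->count = 0;
}

/* Queue one quad (4 vertices); flush first if it would not fit. */
static void batch_add_quad(Batch *b, const BatchVertex quad[4])
{
    if (b->count + 4 > BATCH_CAPACITY)
        batch_flush(b);
    for (size_t i = 0; i < 4; i++)
        b->verts[b->count++] = quad[i];
}
```

A caller would zero-initialise a `Batch`, call `batch_add_quad` once per sprite, and call `batch_flush` whenever the texture changes and once more at the end of the frame.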

##### Share on other sites
This is something that's come up a few times before, and no one has mentioned this: when you have a large VBO shared by many objects (sprites), you can no longer use GL's rotate/translate -- you have to transform all the verts on the CPU.

Doesn't this somewhat undermine the speed gained by using VBOs? Especially since simpler 2D games are most likely not GPU-bound..
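
For reference, the CPU-side transform in question is small. Here is a minimal C sketch, assuming each sprite keeps the cosine and sine of its rotation (computed once with `cosf`/`sinf` whenever the angle changes) so the per-vertex work is a few multiplies and adds. `Vec2`, `Sprite2D`, `transform_point` and `sprite_corners` are hypothetical names, not part of any library.

```c
typedef struct { float x, y; } Vec2;

/* Hypothetical per-sprite state. */
typedef struct {
    Vec2  pos;            /* world position of the sprite centre */
    float cos_a, sin_a;   /* cosine and sine of the rotation angle */
    float w, h;           /* sprite size */
} Sprite2D;

/* Rotate a local-space point by the sprite's angle, then translate it. */
static Vec2 transform_point(const Sprite2D *s, Vec2 p)
{
    Vec2 out;
    out.x = p.x * s->cos_a - p.y * s->sin_a + s->pos.x;
    out.y = p.x * s->sin_a + p.y * s->cos_a + s->pos.y;
    return out;
}

/* Write the four world-space corners of the sprite into `out`,
   ready to be appended to a shared vertex array. */
static void sprite_corners(const Sprite2D *s, Vec2 out[4])
{
    float hw = s->w * 0.5f, hh = s->h * 0.5f;
    const Vec2 local[4] = { {-hw, -hh}, { hw, -hh}, { hw,  hh}, {-hw,  hh} };
    for (int i = 0; i < 4; i++)
        out[i] = transform_point(s, local[i]);
}
```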

[Edited by - raigan on July 13, 2008 11:09:06 AM]

##### Share on other sites
Quote:
 Original post by raigan: This is something that's come up a few times before, and no one has mentioned this: when you have a large VBO shared by many objects (sprites), you can no longer use GL's rotate/translate -- you have to transform all the verts on the CPU.

Yes, you can use GL's rotate/translate and glLoadMatrixf, etc even when you have multiple objects in 1 VBO. That's what I do and have been doing for years.

##### Share on other sites
Quote:
 Original post by V-man: Yes, you can use GL's rotate/translate and glLoadMatrixf, etc even when you have multiple objects in 1 VBO. That's what I do and have been doing for years.

Could you explain how?

I can understand how this would work if you only had a few complex objects sharing a buffer -- as long as you have few objects, you have few draw calls and this makes sense.

But in the context of drawing a lot of 20-vertex objects (as the OP is doing) or Daivuk's suggestion, I'm confused about how this would work -- wouldn't you have to issue a separate DrawElements call for each unique transform? And doesn't this undermine the whole reason for using VBOs?

Each sprite in a 2D engine will have a unique transform, so that's one draw call per quad -- just as bad as immediate mode.

If you're using VBOs to get proper large-batch 2D drawing as swiftcoder described ("The main benefit of VBO (or vertex arrays) is to allow you to render as much geometry as possible with a single draw call") then I don't see how you can avoid transforming the geometry on the CPU. But I'm hoping that I'm missing something obvious..

##### Share on other sites
1. I've found that if you've got completely dynamic objects (like sprites) then there's no performance difference between VBOs and regular vertex arrays. (Unless you're re-rendering the same sprites multiple times a frame, like for some kind of post process).

2. Yes, that means you have to do the transformations yourself. This ends up being trivial in the large scale of things though.

3. I find the following works very well:
- Find all visible sprites/objects
- Sort by depth/texture/gl state as appropriate.
- Go through the sorted list, adding to a single big vertex array. Keep adding sprites that have the same GL state. When the state changes, flush the array (draw it) then continue adding sprites to it. Repeat until out of sprites.

This means you build up batches on the fly for each frame. You'll usually only use a few different blending modes, so that helps keep batches large. Also use sprite sheets/texture atlases so you can draw lots of different sprites without having to flush the list to change texture.
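
A minimal C sketch of the sort-then-flush loop from step 3, assuming the render state can be reduced to a texture id and a blend mode id. `Sprite`, `by_state` and `draw_sorted` are hypothetical names; the actual vertex appending and drawing are left as comments, and the function just reports how many flushes (draw calls) a frame would need.

```c
#include <stdlib.h>

/* Hypothetical sprite record: only the state that forces a flush. */
typedef struct {
    int texture;   /* texture (or sprite sheet) id */
    int blend;     /* blend mode id */
} Sprite;

static int same_state(const Sprite *a, const Sprite *b)
{
    return a->texture == b->texture && a->blend == b->blend;
}

/* qsort comparator: group by blend mode first, then by texture. */
static int by_state(const void *pa, const void *pb)
{
    const Sprite *a = pa, *b = pb;
    if (a->blend != b->blend)
        return (a->blend > b->blend) - (a->blend < b->blend);
    return (a->texture > b->texture) - (a->texture < b->texture);
}

/* Sort the visible sprites, then walk the list and flush whenever the
   next sprite needs different state. Returns the number of flushes. */
static size_t draw_sorted(Sprite *sprites, size_t n)
{
    size_t draws = 0;
    qsort(sprites, n, sizeof *sprites, by_state);
    for (size_t i = 0; i < n; i++) {
        /* a real renderer would append sprite i's vertices to the array here */
        if (i + 1 == n || !same_state(&sprites[i], &sprites[i + 1]))
            draws++;   /* flush: set state, draw the array, reset it */
    }
    return draws;
}
```

If draw order matters (for example with alpha blending), the comparator would have to sort by depth first, which generally means smaller batches.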

##### Share on other sites
Quote:
 Original post by raigan: Could you explain how? I can understand how this would work if you only had a few complex objects sharing a buffer -- as long as you have few objects, you have few draw calls and this makes sense. But in the context of drawing a lot of 20-vertex objects (as the OP is doing) or Daivuk's suggestion, I'm confused about how this would work -- wouldn't you have to issue a separate DrawElements call for each unique transform? And doesn't this undermine the whole reason for using VBOs? Each sprite in a 2D engine will have a unique transform, so that's one draw call per quad -- just as bad as immediate mode. If you're using VBOs to get proper large-batch 2D drawing as swiftcoder described ("The main benefit of VBO (or vertex arrays) is to allow you to render as much geometry as possible with a single draw call") then I don't see how you can avoid transforming the geometry on the CPU. But I'm hoping that I'm missing something obvious..

Yes, you would issue a DrawElements call per object.

Quote:
 And doesn't this undermine the whole reason for using VBOs?

The reason to use a VBO is to keep data on the GPU. If you have dynamic data, the reason is that we assume that's what the driver prefers.

Why store multiple objects in 1 VBO?
To reduce GL state changes.
You would have to call glBindBuffer less often. You would call gl***Pointer less often as well.

In terms of performance gain for a simple sprite rendering engine, I have no idea if it will improve performance since I'm not working on one.

Immediate mode, vertex arrays, compiled vertex arrays and display lists are other ways. As you can see, GL has many ways to send data.
What is specific to VBOs is that they are for storing vertex and index data only.
The driver decides whether a buffer should be placed in VRAM or elsewhere.

##### Share on other sites
Another reason to batch small objects into VBOs is caching. You have to hit the sweet spot between having enough geometry in the VBO that the GPU doesn't have to keep going back for more, and having so much geometry in the VBO that the whole thing can't fit into the cache. Right now, from what I've read on these boards, VBOs should be in the neighborhood of 1MB to 4MB, but as technology changes, so too will these numbers.

##### Share on other sites

Wow!
Incredibly great responses and lots of valuable discussions!

I will try out the approach of first making a big VBO, then inserting vertices until some of the GL options differ; at that point I draw and start over again. This approach will work great for my current problem with loads of bullets of the same type. Right now I can put out about 3000 bullets with 4 vertices each at 60 fps with vsync; with more bullets the fps drops. I think it is the enormous number of OpenGL calls that kills the performance.

This approach of collecting vertices dynamically will probably also work wonders for a tile system I will add later. I was first thinking about a stupid solution that sorted stuff manually into groups using bit settings... man, that would have been a waste of time!

Hohooo I'm so excited! I will begin coding my new system right away.
I'm really hoping I will break that 3000 bullets barrier.
I'll report my results.

##### Share on other sites
Vertex Arrays and VBO's are very similar in how you use them, so if you're going to try one method you can try the other with just a few lines of code changed. For fully dynamic data, the performance will probably be identical (VBO could be slower and use more memory if the driver is stupid).

While developing Soup du Jour I ended up using Vertex Arrays. The renderer builds a triangle list for each render state (texture + blending mode) in use each frame... Basically when "drawing" an object its triangles are added to a list; and at the end of the frame each list is rendered with one glDrawElements call. The performance gain from immediate mode was roughly 3X (went from 20K triangles to 60K triangles max per frame on the target machine). Dynamic VBO had no advantage over VA.
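
The post does not say how the triangle lists were laid out. As one common arrangement for sprite quads, here is a C sketch that builds the index list so each 4-vertex quad becomes two triangles and a whole list can go out in one glDrawElements call. The function name `build_quad_indices` and the corner order are assumptions for illustration.

```c
#include <stddef.h>

/* Fill `indices` with 6 indices per quad (two triangles), assuming each
   quad's 4 vertices are stored consecutively in the vertex array in the
   order bottom-left, bottom-right, top-right, top-left.
   16-bit indices limit this to 16384 quads per list. */
static void build_quad_indices(unsigned short *indices, size_t quad_count)
{
    for (size_t q = 0; q < quad_count; q++) {
        unsigned short base = (unsigned short)(q * 4);
        unsigned short *out = indices + q * 6;
        out[0] = base;
        out[1] = (unsigned short)(base + 1);
        out[2] = (unsigned short)(base + 2);
        out[3] = base;
        out[4] = (unsigned short)(base + 2);
        out[5] = (unsigned short)(base + 3);
    }
}
```

Because the pattern depends only on the quad count, such an index list can be built once at startup and reused every frame; only the vertex data changes.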

##### Share on other sites
I'd put all texture-data in a single texture (sprite-sheet)
Then, keep a big vertex-array (ie 1MB), allocated with malloc() or HeapAlloc, or similar.
Each vertex will have its data in a structure like this:
typedef struct{
    float x,y; // non-transformed position
    float texX,texY; // texture coordinate
    //--[ duplicated data for all vertices in an instance of a 2D object ]------[
    int colorParameter1;
    int colorParameter2;
    float Z_rotation; // rotation of sprite
    float transX,transY; // position of sprite
    float effectType;
    //--------------------------------------------------------------------------/
}VtxDecl;

Well, the above will be better packed like this:
typedef struct{
vec4 pos_uv;
int colorParameter1;
int colorParameter2;
vec4 transform_and_effectType;
} VtxDecl;
This would be done to more effectively send data to your vertex shader.

In your vertex-shader, use sin() and cos() to rotate, then translate the vertex, and finally transform it into clip-space. Send the color-parameters, UVs and effect-type to your fragment shader. There, fetch a texel, and do whatever color effects you want.
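
Written out, the rotate-then-translate step would be the usual 2D rotation followed by an offset. Here $(x, y)$ is the non-transformed position, $\theta$ is `Z_rotation`, and $(t_x, t_y)$ is `(transX, transY)` from the struct above; the final clip-space mapping depends on your projection and is not shown.

$$
\begin{aligned}
x' &= x\cos\theta - y\sin\theta + t_x \\
y' &= x\sin\theta + y\cos\theta + t_y
\end{aligned}
$$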

Simply put, you don't need matrices for 2D stuff, and you avoid their overhead, especially when each batch is tiny. Sin/cos on CPUs is very precise but slow, so we can offload it to the vertex shader.
Changing textures is expensive, so we optimize that with a single sprite sheet.
Changing effects usually doesn't require drastic changes to what we do (we'll always be fetching a texel and then playing with its colors).
The only problem can be alpha/additive blending, but it can be made a non-problem by improving your engine with a couple of easy tricks.

##### Share on other sites
Quote:
 Original post by Rockard: * update: I use glBindBufferARB and glMapBufferARB to get the gpu pointer, and then iterate the datavalues for vectors, colors and texturecoordinates and insert them into the pointer, then glUnmapBufferARB.

Do you really need the old values for the update? If not, you could also try using glBufferData (or glBufferSubData) to update the geometry. That way the data doesn't have to be mapped into the CPU's address space. Especially if you only update a small fraction of the buffer, glBufferSubData might be faster (less data to transfer).
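
For a partial update, the only bookkeeping needed is the byte offset and size of the changed range. A small C sketch, assuming tightly packed sprites of 4 interleaved vertices each; `SpriteVertex`, `ByteRange` and `sprite_byte_range` are hypothetical names, and the GL calls appear only in comments because they need a live context.

```c
#include <stddef.h>

/* Hypothetical interleaved layout: position, texcoord, colour. */
typedef struct {
    float x, y;
    float u, v;
    unsigned char rgba[4];
} SpriteVertex;

#define VERTS_PER_SPRITE 4

typedef struct { size_t offset; size_t size; } ByteRange;

/* Byte range covering sprites [first, first + count) in the buffer. */
static ByteRange sprite_byte_range(size_t first, size_t count)
{
    ByteRange r;
    r.offset = first * VERTS_PER_SPRITE * sizeof(SpriteVertex);
    r.size   = count * VERTS_PER_SPRITE * sizeof(SpriteVertex);
    return r;
}

/* With the buffer bound, a partial update would then be:
     glBufferSubData(GL_ARRAY_BUFFER, (GLintptr)r.offset, (GLsizeiptr)r.size,
                     (const char *)cpu_copy + r.offset);
   and replacing the whole buffer would be:
     glBufferData(GL_ARRAY_BUFFER, total_bytes, cpu_copy, GL_STREAM_DRAW);
   where cpu_copy and total_bytes are the application's own copy and size. */
```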
