fastest way to render lots of small changing objects
I am developing a 2D-like graphics system for games, using lots of different algorithms to create nice dynamic graphics. Until now, I have always used glBegin/glEnd for rendering, and I learned fairly recently that this is not the way to get optimal performance.
So I learned about VBOs, and I just got them working,
but it was way slower, probably because I am not doing it right.
My current approach is this:
* creation
Create a VBO for every object (there may be thousands).
I use the GL_STREAM_DRAW_ARB usage hint because I update the data every frame.
No matrix transformations are used.
For each object, I put the vertex, color and texture coordinate data next to each other (interleaved) in its VBO.
* update
I use glBindBufferARB and glMapBufferARB to get a pointer to the buffer, iterate over the vertex, color and texture coordinate values, write them through the pointer, and then call glUnmapBufferARB.
* draw
I call glEnableClientState for GL_VERTEX_ARRAY, GL_COLOR_ARRAY and GL_TEXTURE_COORD_ARRAY, set the pointer for each with the correct offset,
and then call glDrawArrays. Afterwards I call glDisableClientState and bind buffer 0 with glBindBufferARB.
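For reference, here is a minimal C sketch of the interleaved layout described above, assuming 2D float positions, 8-bit RGBA colors and float texture coordinates. The struct, constant and function names are hypothetical. It shows where the stride and "correct offset" values for the gl*Pointer calls come from; the GL calls appear only in comments so the sketch compiles without GL headers.

```c
#include <assert.h>
#include <stddef.h> /* offsetof */

/* Hypothetical interleaved 2D vertex: position, RGBA color, texcoord. */
typedef struct {
    float x, y;               /* position */
    unsigned char r, g, b, a; /* color, one byte per channel */
    float u, v;               /* texture coordinates */
} Vertex;

/* With a VBO bound, the last argument of each gl*Pointer call is a byte
   offset into the buffer rather than a real pointer:
     glVertexPointer  (2, GL_FLOAT,         VERTEX_STRIDE, (const void *)POS_OFFSET);
     glColorPointer   (4, GL_UNSIGNED_BYTE, VERTEX_STRIDE, (const void *)COLOR_OFFSET);
     glTexCoordPointer(2, GL_FLOAT,         VERTEX_STRIDE, (const void *)TEX_OFFSET);
*/
enum {
    VERTEX_STRIDE = (int)sizeof(Vertex),
    POS_OFFSET    = (int)offsetof(Vertex, x),
    COLOR_OFFSET  = (int)offsetof(Vertex, r),
    TEX_OFFSET    = (int)offsetof(Vertex, u)
};

/* Write one vertex through a pointer, e.g. the one returned by
   glMapBufferARB(GL_ARRAY_BUFFER_ARB, GL_WRITE_ONLY_ARB). */
void put_vertex(Vertex *dst, float x, float y,
                unsigned char r, unsigned char g, unsigned char b, unsigned char a,
                float u, float v) {
    assert(dst != NULL);
    dst->x = x; dst->y = y;
    dst->r = r; dst->g = g; dst->b = b; dst->a = a;
    dst->u = u; dst->v = v;
}
```

On common ABIs this gives offsets 0, 8 and 12 with a 20-byte stride; letting the compiler compute them with sizeof/offsetof avoids hard-coding those numbers.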
---
Now, this was very slow compared to my usual glBegin/glEnd approach.
What am I doing wrong?
Since I have separate OpenGL drawing/blending settings for each gfx object, I can't just draw everything at the same time from one large VBO (can I?).
Does the GPU choke on my large number of VBOs?
I have another idea (based on how I THINK things work).
I create _one_ VBO sized for the largest amount of data I expect any single gfx object to need (usually about 20 vertices). Then I reuse that same VBO for every object when inserting data and rendering, instead of thousands of small ones, since I will not render them all at once anyway.
Is the VBO approach suitable for lots of small objects with different OpenGL settings?
Thanks for any help and suggestions!
Quote:Original post by Rockard
Since I have separate OpenGL drawing/blending settings for each gfx object...
Herein lies your problem. The main benefit of VBOs (or vertex arrays) is that they let you render as much geometry as possible with a single draw call. If your objects really do need different drawing/blending settings (and they probably shouldn't), you should group them by matching settings, collect each group in a VBO, and then render each group with a single call.
Unless you are doing something really strange though, you shouldn't need to set the blend equations more than a couple of times per frame - in a typical game, you disable blending, render all opaque objects, and then re-enable blending to render all the transparent objects.
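A minimal C sketch of that grouping, assuming each sprite's state boils down to a blending flag, a texture name and a depth (SpriteState and both functions are hypothetical names). Sorting puts opaque sprites first, grouped by texture, then blended sprites back to front; count_batches reports how many draw calls remain when consecutive sprites with identical state share one batch.

```c
#include <assert.h>
#include <stdlib.h> /* qsort, size_t */

/* Hypothetical per-sprite state: only what forces a GL state change. */
typedef struct {
    int blended;      /* 0 = opaque, 1 = needs blending */
    unsigned texture; /* GL texture object name */
    float depth;      /* assumption: larger = farther from the camera */
} SpriteState;

/* qsort comparator: opaque before blended; blended sprites back to front;
   texture as the grouping key (and tie-breaker). */
int compare_state(const void *pa, const void *pb) {
    const SpriteState *a = pa, *b = pb;
    if (a->blended != b->blended) return a->blended - b->blended;
    if (a->blended && a->depth != b->depth)
        return (a->depth < b->depth) ? 1 : -1; /* farther first */
    if (a->texture != b->texture) return (a->texture < b->texture) ? -1 : 1;
    return 0;
}

/* Draw calls needed if consecutive sprites with equal state share a batch. */
size_t count_batches(const SpriteState *s, size_t n) {
    size_t batches = 0;
    for (size_t i = 0; i < n; ++i) {
        assert(s[i].blended == 0 || s[i].blended == 1);
        if (i == 0 || s[i].blended != s[i - 1].blended ||
            s[i].texture != s[i - 1].texture)
            ++batches;
    }
    return batches;
}
```

Usage: call qsort(sprites, n, sizeof sprites[0], compare_state) once per frame, then walk the sorted array. Back-to-front order for blended sprites is a common convention assumed here; it can split texture groups, which is the price of correct transparency.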
Also, how many objects are we talking about, and what target hardware? I render 200-300 objects per frame, each with its own VBO containing around 100 vertices, and my rendering cost is so far overshadowed by AI and physics that it adds no noticeable overhead.
First, glBegin/glEnd is faster than people usually make it out to be.
But if you want to be cross-API it's not a good idea to use them anyway :)
VBOs are the standard way of rendering. The solution I would suggest is to create one big VBO, say 400 vertices.
Then "render" your sprites by writing their values into it incrementally:
BatchVBO[offset + 0].x = ...
... offset + 1
etc
Then, when everything is ready to render, draw BatchVBO with the number of vertices you actually filled. You don't have to render all of it :)
This way, you upload to the video card only once, which is very fast.
If you run out of vertices (400, or whatever maximum you chose), just render what you have and continue by starting at 0 again.
With that technique you might want to sort by texture, though. Or, if order is important, render BatchVBO whenever you need to switch textures and start again at 0.
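A minimal C sketch of this batching scheme, assuming textured quads drawn with GL_QUADS (4 vertices each, so 100 sprites per 400-vertex batch). The Batch type, its functions and the draw callback are hypothetical: the callback stands in for the single upload (for example glBufferSubData) plus the one glDrawArrays you would issue per flush, so the logic can be exercised without a GL context.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_MAX_VERTS 400 /* "like 400 vertices" */
#define VERTS_PER_QUAD  4   /* drawn as GL_QUADS */

typedef struct { float x, y, u, v; } BatchVertex;

typedef struct {
    BatchVertex verts[BATCH_MAX_VERTS]; /* CPU-side staging copy of the big VBO */
    size_t count;                       /* vertices queued since the last flush */
    unsigned texture;                   /* texture used by the queued vertices */
    size_t draw_calls;                  /* flushes issued so far */
    /* Hypothetical hook: upload verts[0..count) once, bind `texture`,
       then glDrawArrays(GL_QUADS, 0, count). May be NULL. */
    void (*draw)(const BatchVertex *verts, size_t count, unsigned texture);
} Batch;

/* Draw whatever is queued, then start again at 0. */
void batch_flush(Batch *b) {
    if (b->count == 0) return;
    if (b->draw) b->draw(b->verts, b->count, b->texture);
    b->draw_calls++;
    b->count = 0;
}

/* Queue one axis-aligned sprite covering (x, y)-(x + w, y + h). */
void batch_add_sprite(Batch *b, unsigned texture, float x, float y, float w, float h) {
    if (b->count > 0 && texture != b->texture) batch_flush(b);       /* texture switch */
    if (b->count + VERTS_PER_QUAD > BATCH_MAX_VERTS) batch_flush(b); /* out of room */
    assert(b->count + VERTS_PER_QUAD <= BATCH_MAX_VERTS);
    b->texture = texture;
    BatchVertex *q = &b->verts[b->count];
    q[0] = (BatchVertex){ x,     y,     0.0f, 0.0f };
    q[1] = (BatchVertex){ x + w, y,     1.0f, 0.0f };
    q[2] = (BatchVertex){ x + w, y + h, 1.0f, 1.0f };
    q[3] = (BatchVertex){ x,     y + h, 0.0f, 1.0f };
    b->count += VERTS_PER_QUAD;
}
```

Call batch_flush once more at the end of the frame so the last partial batch is drawn. With 250 same-texture sprites this issues 3 draw calls instead of 250.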
This has come up a few times before, and no one has mentioned it: when you have a large VBO shared by many objects (sprites), you can no longer use GL's rotate/translate; you have to transform all the vertices on the CPU.
Doesn't this somewhat undermine the speed gained by using VBOs? Especially since simpler 2D games are most likely not GPU-bound?
[Edited by - raigan on July 13, 2008 11:09:06 AM]
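To make the CPU-side transform concrete, here is a minimal C sketch of what glTranslatef/glRotatef would otherwise do for one sprite: rotate its four corners about its center and translate them into place. Vec2 and transform_quad are hypothetical names, and the caller is assumed to pass the cosine and sine of the angle (computed once per sprite, e.g. with cosf/sinf).

```c
#include <assert.h>

typedef struct { float x, y; } Vec2;

/* Corners of a w x h sprite centred on (px, py), rotated counter-clockwise
   by the angle whose cosine/sine are cos_a/sin_a. Corner order before
   rotation: bottom-left, bottom-right, top-right, top-left (y up). */
void transform_quad(Vec2 out[4], float px, float py, float w, float h,
                    float cos_a, float sin_a) {
    assert(w >= 0.0f && h >= 0.0f);
    const float hw = 0.5f * w, hh = 0.5f * h;
    const Vec2 local[4] = { { -hw, -hh }, { hw, -hh }, { hw, hh }, { -hw, hh } };
    for (int i = 0; i < 4; ++i) {
        /* 2D rotation, then translation to the sprite's position. */
        out[i].x = px + cos_a * local[i].x - sin_a * local[i].y;
        out[i].y = py + sin_a * local[i].x + cos_a * local[i].y;
    }
}
```

This is a handful of multiplies and adds per vertex, written straight into the batch buffer, so many differently transformed sprites can still share one draw call.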
Quote:Original post by raigan
This is something that's come up a few times before, and no one has mentioned this: when you have a large VBO shared by many objects (sprites), you can no longer use GL's rotate/translate -- you have to transform all the verts on the CPU.
Yes, you can use GL's rotate/translate and glLoadMatrixf, etc., even when you have multiple objects in 1 VBO. That's what I do and have been doing for years.
Quote:Original post by V-man
Yes, you can use GL's rotate/translate and glLoadMatrixf, etc., even when you have multiple objects in 1 VBO. That's what I do and have been doing for years.
Could you explain how?
I can understand how this would work if you only had a few complex objects sharing a buffer -- as long as you have few objects, you have few draw calls and this makes sense.
But in the context of drawing a lot of 20-vertex objects (as the OP is doing) or Daivuk's suggestion, I'm confused about how this would work -- wouldn't you have to issue a separate DrawElements call for each unique transform? And doesn't this undermine the whole reason for using VBOs?
Each sprite in a 2D engine will have a unique transform, so that's one draw call per quad -- just as bad as immediate mode.
If you're using VBOs to get proper large-batch 2D drawing as swiftcoder described ("The main benefit of VBO (or vertex arrays) is to allow you to render as much geometry as possible with a single draw call"), then I don't see how you can avoid transforming the geometry on the CPU. But I'm hoping that I'm missing something obvious.
1. I've found that if you've got completely dynamic objects (like sprites), there's no performance difference between VBOs and regular vertex arrays (unless you're re-rendering the same sprites multiple times a frame, e.g. for some kind of post-process).
2. Yes, that means you have to do the transformations yourself. This ends up being trivial in the large scale of things though.
3. I find the following works very well:
- Find all visible sprites/objects
- Sort by depth/texture/gl state as appropriate.
- Go through the sorted list, adding to a single big vertex array. Keep adding sprites that have the same GL state. When the state changes, flush the array (draw it) then continue adding sprites to it. Repeat until out of sprites.
This means you build up batches on the fly for each frame. You'll usually only use a few different blending modes, so that helps keep batches large. Also use sprite sheets/texture atlases so you can draw lots of different sprites without having to flush the list to change texture.
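A minimal sketch of the texture-atlas idea, assuming the simplest kind of sprite sheet: a uniform grid of equally sized cells (real atlases are often packed irregularly and store a UV rectangle per sprite instead). atlas_cell_uv is a hypothetical helper returning the texture-coordinate rectangle of one cell, so sprites from different cells can go into the same batch without a texture switch.

```c
#include <assert.h>

typedef struct { float u0, v0, u1, v1; } UVRect;

/* Texture coordinates of cell `index` in an atlas laid out as a uniform
   grid of cols x rows cells, numbered row by row starting at (u, v) = (0, 0). */
UVRect atlas_cell_uv(int index, int cols, int rows) {
    assert(cols > 0 && rows > 0 && index >= 0 && index < cols * rows);
    const int cx = index % cols, cy = index / cols;
    const float du = 1.0f / (float)cols, dv = 1.0f / (float)rows;
    UVRect r = { cx * du, cy * dv, (cx + 1) * du, (cy + 1) * dv };
    return r;
}
```

With bilinear filtering, neighbouring cells can bleed into each other at the edges; insetting the rectangle by half a texel or padding the cells is a common fix.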
Quote:Original post by raigan
Could you explain how?
I can understand how this would work if you only had a few complex objects sharing a buffer -- as long as you have few objects, you have few draw calls and this makes sense.
But in the context of drawing a lot of 20-vertex objects (as the OP is doing) or Daivuk's suggestion, I'm confused about how this would work -- wouldn't you have to issue a separate DrawElements call for each unique transform? And doesn't this undermine the whole reason for using VBOs?
Each sprite in a 2D engine will have a unique transform, so that's one draw call per quad -- just as bad as immediate mode.
If you're using VBOs to get proper large-batch 2D drawing as swiftcoder described ("The main benefit of VBO (or vertex arrays) is to allow you to render as much geometry as possible with a single draw call"), then I don't see how you can avoid transforming the geometry on the CPU. But I'm hoping that I'm missing something obvious.
Yes, you would issue a DrawElements call per object.
Quote:And doesn't this undermine the whole reason for using VBOs?
The reason to use VBOs is to keep data on the GPU. If you have dynamic data, the reason is that we assume that's what the driver prefers.
Why store multiple objects in 1 VBO?
To reduce GL state changes.
You would have to call glBindBuffer less often. You would call gl***Pointer less often as well.
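A minimal sketch of what V-man describes, assuming each object's vertices occupy a fixed, contiguous range of one shared VBO. SharedBuffer, VertexRange and shared_alloc are hypothetical names. The draw loop in the comment uses glDrawArrays with a `first` offset (V-man mentions DrawElements, which would use an index offset instead): the buffer is bound and the gl*Pointer calls are made once, but each object still gets its own transform and its own draw call.

```c
#include <assert.h>
#include <stddef.h>

/* Vertices owned by one object inside the shared VBO. */
typedef struct { size_t first, count; } VertexRange;

/* Bump allocator over a VBO that holds `capacity` vertices. */
typedef struct { size_t capacity, used; } SharedBuffer;

/* Reserve `count` consecutive vertices. Returns 0 on success, -1 if full. */
int shared_alloc(SharedBuffer *buf, size_t count, VertexRange *out) {
    assert(buf->used <= buf->capacity);
    if (count > buf->capacity - buf->used) return -1;
    out->first = buf->used;
    out->count = count;
    buf->used += count;
    return 0;
}

/* Per frame, with the shared VBO bound and gl*Pointer set up once
 * (obj and its fields are hypothetical):
 *
 *   for each object:
 *       glPushMatrix();
 *       glTranslatef(obj.x, obj.y, 0.0f);
 *       glRotatef(obj.degrees, 0.0f, 0.0f, 1.0f);
 *       glDrawArrays(GL_TRIANGLE_FAN, (GLint)obj.range.first,
 *                    (GLsizei)obj.range.count);
 *       glPopMatrix();
 */
```

This saves binds and pointer setup, not draw calls, which is why it matters less for thousands of tiny sprites than the CPU-side batching discussed earlier.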
As for a simple sprite-rendering engine, I have no idea whether it would improve performance, since I'm not working on one.
Immediate mode, vertex arrays, compiled vertex arrays and display lists are other ways. As you can see, GL has many ways to send data.
What is specific about VBOs is that they store only vertex and index data.
The driver decides whether to place them in VRAM or elsewhere.
Another reason to batch small objects into VBOs is caching. You have to hit the sweet spot between too little geometry in the VBO, so that it keeps having to go back for more, and too much geometry, so that the whole thing can't fit in the cache. From what I've read on these boards, VBOs should currently be in the neighborhood of 1 MB to 4 MB, but as technology changes, so will these numbers.
Wow!
Incredibly great responses and lots of valuable discussions!
I will try the approach of first making a big VBO, then inserting
vertices until some of the GL options differ; at that point I draw and start over. This approach should work great for my current problem with loads of bullets of the same type. Right now I can draw about 3000 bullets with 4 vertices each at 60 fps with vsync; any more bullets and the frame rate drops. I think it is the enormous number of OpenGL calls that kills performance.
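For scale, a back-of-the-envelope calculation for the bullet case, assuming 20 bytes per vertex (two float position components, four bytes of color, two float texture coordinates) and one texture for all bullets; the real vertex format may differ:

$$
\begin{aligned}
3000 \times 4 &= 12\,000 \text{ vertices per frame} \\
12\,000 \times 20\,\text{B} &= 240\,000\,\text{B} \approx 234\,\text{KiB per frame} \\
240\,000\,\text{B} \times 60\,\text{frames/s} &= 14.4 \times 10^{6}\,\text{B/s} \\
12\,000 / 400 &= 30 \text{ draw calls per frame with 400-vertex batches}
\end{aligned}
$$

So a whole frame of bullets fits in one buffer well under the 1 MB to 4 MB range mentioned above, and batching cuts 3000 per-bullet draw calls down to 30, a factor of 100.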
This approach of collecting vertices dynamically will probably also work wonders for a tile system I will add later. I was first thinking about this stupid solution of sorting stuff manually into groups using bit settings... man, that would have been a waste of time!
Hohooo I'm so excited! I will begin coding my new system right away.
I'm really hoping I will break that 3000 bullets barrier.
I'll report my results.