Darragh

OpenGL Faster VBO rendering? Your thoughts..


Hello,

Well, here I am again, back on the subject of model rendering with VBOs. I'm currently trying to optimise my VBO rendering further because it is still not in any way satisfactory in terms of speed.

To stress test my model rendering I made a simple map and added in loads of models. The objects I added are simple little ammo boxes which only have 12 tris each, and the original models I used were in .MD2 format. So I added around 600 or so of these little ammo crates and ran a quick profile on their rendering speed... The result was around 25,000,000 nanoseconds per draw, a speed of around 40 FPS. Not good, considering the actual number of tris being thrown around (only 7,200!). This result is on a P4 2.6GHz and an X800XT graphics card.

Examining the bottleneck further, I can say that I am not fill-rate limited: neither lowering the screen resolution nor moving further away from the objects increased rendering speed. Nor am I geometry limited; rendering the same number of more detailed meshes makes little difference to the frame rate. Not CPU limited either; I try to avoid as much CPU work as possible (I'm using Java).

My conclusion then is that there must be some bottleneck in how I am setting up and using VBOs. I have already done the following things to try and increase rendering speed:

- Indexing geometry as much as possible and using glDrawElements to render
- Buffering triangle indices into AGP RAM
- Adding extra data padding to my vertices/texcoords to bring the number of components per vertex to 4

All these optimisations have helped a little, but not enough. I have also tried using display lists for the boxes rather than VBOs, since I am aware that small VBOs are not good for performance. This was in fact a complete waste of time and yielded much worse performance than the VBO method.

Concerning the VBO bottleneck, my only guess is that the problem is in how I am binding my buffers...

See, I bind the appropriate vertex/texcoord buffer for each model every time it is rendered, regardless of whether the same buffer is already bound anyway. It is a simple method of doing things and ensures that each model always renders the right data, but I am wondering how inefficient this is. Is the cost of constantly re-binding VBOs very high, so much so that it results in the above rendering speed?

What I am thinking about doing now is implementing some kind of rendering queue system for models. This system should firstly ensure that vertex buffers are not needlessly bound and re-bound if the current model being rendered uses the same buffer as the last. Secondly (I don't know if this would be beneficial or not), I am thinking of doing a sort on the models to be rendered, based on their buffer number. The models could then be rendered sequentially by buffer number. My reasoning is that my VBO data is most likely stored in contiguous blocks in AGP RAM (I load all my models into AGP RAM at the start of my game and keep them there until the end), and the GPU would then be able to sweep across the memory in a linear fashion, rendering all of the data. Do you think this would be a waste of time? Would it have any benefits?

So if you have any advice on my problem and my proposed solutions, I would be glad to hear it. If you have links to any articles on optimising VBO rendering or general OpenGL optimisations, they would be welcome too, since I am always on the lookout for better ways of doing things.

Thanks for your help,
Darragh.

Guest Anonymous Poster
I would advise you to go to the NVIDIA developer site and download all the papers and articles on vertex buffer optimization. ATI's developer site has them as well, but NVIDIA's is fairly comprehensive, and if you're keen you can optimize for the target platform to get the best possible performance.

Quote:
Original post by Darragh
See, I bind the appropriate vertex/texcoord buffer for each model every time it is rendered, regardless of whether the same buffer is already bound anyway. It is a simple method of doing things and ensures that each model always renders the right data, but I am wondering how inefficient this is. Is the cost of constantly re-binding VBOs very high, so much so that it results in the above rendering speed?


This could well be the source of your problem; try going for a single bind per object instance and see if that speeds things up.

The buffering system and sorting by model seem like a good idea. Just out of interest, do all the models in your test use the same texture?

Try the batching system first; that's the most likely place to have an impact.

Quote:
Original post by _the_phantom_
Quote:
Original post by Darragh
See, I bind the appropriate vertex/texcoord buffer for each model every time it is rendered, regardless of whether the same buffer is already bound anyway. It is a simple method of doing things and ensures that each model always renders the right data, but I am wondering how inefficient this is. Is the cost of constantly re-binding VBOs very high, so much so that it results in the above rendering speed?


This could well be the source of your problem; try going for a single bind per object instance and see if that speeds things up.

The buffering system and sorting by model seem like a good idea. Just out of interest, do all the models in your test use the same texture?

Try the batching system first; that's the most likely place to have an impact.


Ah, my suspicions have been confirmed then... Yeah, I will definitely give the batching method a shot, seeing as binds appear to be so expensive. And yes, my little ammo boxes are all using the same textures too.

The NVIDIA articles that the AP mentioned also have some other interesting points that are worth considering.

Thanks guys, I think I should be able to improve the performance greatly now, taking these things into consideration.

I would say that batching is your problem. A modern CPU can only dispatch a few tens of thousands of batches a second, regardless of whether each batch is a single vertex, a thousand triangles, or 12 triangles (as in your case). This number also holds whether you're using glDrawElements, standard vertex arrays, display lists, VBOs, etc. There is simply a minimum amount of overhead required to package up a command and send it to the GPU, regardless of how small the command is. (Indeed, your 25,000,000 ns divided by 600 draws is about 42 µs per draw, i.e. roughly 24,000 batches per second, which is right in that range.)

Ideally, you would cull based on the user's view and only have to render a small number of ammo boxes each frame, not all 600. However, 600 × 12 isn't a lot of triangles, so you can brute-force it by just putting a hundred ammo boxes into the same VBO (it won't take too much VRAM), and performance will be excellent.
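As a rough sketch of what "putting them into the same VBO" means in practice (the class and method names here are made up for illustration; in a real engine you'd upload the resulting arrays with glBufferDataARB afterwards), you just bake each instance's position into the vertices and rebase the indices:

```java
// Hypothetical helper that merges many copies of a small mesh into one
// vertex/index array pair, so the whole batch is a single draw call.
public class MeshBatcher {

    // Combine one instance of `verts` (tightly packed x,y,z triples) per
    // entry in `positions`, translating each copy by its instance position.
    public static float[] batchVertices(float[] verts, float[][] positions) {
        float[] out = new float[verts.length * positions.length];
        for (int i = 0; i < positions.length; i++) {
            for (int v = 0; v < verts.length; v += 3) {
                out[i * verts.length + v]     = verts[v]     + positions[i][0];
                out[i * verts.length + v + 1] = verts[v + 1] + positions[i][1];
                out[i * verts.length + v + 2] = verts[v + 2] + positions[i][2];
            }
        }
        return out;
    }

    // Repeat the index list `count` times, offsetting each copy so it
    // references its own block of vertices in the combined buffer.
    public static int[] batchIndices(int[] indices, int vertsPerMesh, int count) {
        int[] out = new int[indices.length * count];
        for (int i = 0; i < count; i++)
            for (int j = 0; j < indices.length; j++)
                out[i * indices.length + j] = indices[j] + i * vertsPerMesh;
        return out;
    }
}
```

The trade-off is that the boxes can no longer move independently without re-uploading, but for static props like ammo crates that's usually fine.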

One nice thing about DirectX is that it has an 'instancing' system designed to do this kind of thing for you: define one model and a whole bunch of modelview matrices, and the model will be rendered once with each matrix, with only a single batched draw call.

The overhead of making a draw call in OpenGL is much, much, much, much, much, much, much, much, much, much, much less than it is in DirectX. I've done extensive testing on this, and have found that batch size in OpenGL isn't really something you need to be concerned about. Unlike DirectX, when you make a draw call in OpenGL, the driver immediately dumps some native hardware commands into a buffer and returns. When the buffer gets full, the commands are executed, and the hardware can eat through drawing commands extremely quickly.

It is for this reason that the instancing hardware in NV4x was never exposed through OpenGL -- there's really no need for it.

Is that you, Eric? The one who came up with the math for defining the bounding scissor rectangle of a given point-light source for optimal shadow volume rendering?

Quote:
Original post by RichardS
Ideally, you would cull based on the user's view and only have to render a small number of ammo boxes each frame, not all 600. However, 600 × 12 isn't a lot of triangles, so you can brute-force it by just putting a hundred ammo boxes into the same VBO (it won't take too much VRAM), and performance will be excellent.


Nope, this doesn't apply to the situation here. To stress test the model rendering I created a scene which was one room with lots of models, all of which were visible. This was simply to push my engine to its limits and see how far it would go. I already have a simple portalling system working, so something like this should never occur in real game terms. Heck, I'm never gonna have that many ammo crates on a whole level anyway.

Good news, however... The rendering-queue system is now fully operational, and I've seen a massive increase in rendering speed as a result: somewhere in the region of 10-fold!
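For the record, the core of the queue boils down to something like this (the names are just illustrative, and the actual GL calls are stubbed out here so the sorting and bind-skipping logic can stand on its own):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of a render queue that sorts submitted models by VBO id and only
// rebinds a buffer when the id actually changes between draws.
public class RenderQueue {

    // Stand-in for a model instance; a real engine would also carry a
    // modelview matrix, texture id, index count, and so on.
    public static class Model {
        final int vboId;
        Model(int vboId) { this.vboId = vboId; }
    }

    private final List<Model> queue = new ArrayList<>();
    private int lastBoundVbo = -1;
    public int bindCalls = 0;  // counts how often we actually (re)bind

    public void submit(Model m) { queue.add(m); }

    // Flush the queue: sort by buffer id so models sharing a VBO are
    // adjacent, then bind each buffer at most once per run.
    public void flush() {
        queue.sort(Comparator.comparingInt(m -> m.vboId));
        for (Model m : queue) {
            if (m.vboId != lastBoundVbo) {
                bindBuffer(m.vboId);  // would be glBindBufferARB(...) in GL
                lastBoundVbo = m.vboId;
            }
            draw(m);                  // would be glDrawElements(...)
        }
        queue.clear();
    }

    private void bindBuffer(int id) { bindCalls++; }
    private void draw(Model m) { /* issue the draw call here */ }
}
```

With this in place, 600 crates sharing one VBO cost a single bind per frame instead of 600.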

I reckon there is still room for even more speed, however. Going by an NVIDIA article on GPU cache optimisation which the AP posted, I should be ordering my indices in such a way that many vertex transformations can be avoided, just by making use of the cache.
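To compare different index orderings, I suppose I could simulate the cache myself. Something like this little FIFO simulator (the FIFO replacement policy and the cache size are both assumptions on my part; real hardware may differ):

```java
import java.util.ArrayDeque;

// Simulates a FIFO post-transform vertex cache: counts how many vertex
// transforms (cache misses) a given index ordering would incur.
public class CacheSim {
    public static int countMisses(int[] indices, int cacheSize) {
        ArrayDeque<Integer> cache = new ArrayDeque<>();
        int misses = 0;
        for (int idx : indices) {
            if (!cache.contains(idx)) {
                misses++;                 // vertex must be transformed
                cache.addLast(idx);
                if (cache.size() > cacheSize) cache.removeFirst();
            }
        }
        return misses;
    }
}
```

Dividing the miss count by the triangle count gives a figure of merit for an ordering (the lower, the better), which would let my optimiser compare candidate index layouts offline.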

Just two other small questions then (I'll leave you alone after this, I promise! [smile])

1- What is the vertex cache size for most modern GPUs? The NVIDIA article mentioned that the size was 16 vertices, though this was in '99, so I would imagine that things have improved since then. Is there some way to query OpenGL for this information? This is important because it will ultimately affect how my optimisation algorithms work.

2- The NVIDIA article also mentioned a Direct3D function called optimize() which optimises vertex buffer contents for greater speed. Just wondering, is there a similar function in OpenGL?

Thanks again,
Darragh
1 - probably in the mid-20s by now, but it's not something which is exposed directly to us. Just try to keep references to the same vertices close together and things should work out fine.

2 - no such functionality exists; you'd have to do the work yourself.

That's great, thanks.

I'll go ahead and try the cache optimisation then. This should help increase the rendering speed further.
