What if I have more models per level than available VBO memory?



#1 Labrasones   Members   -  Reputation: 337


Posted 05 July 2013 - 07:39 PM

I'm trying to understand more about how Vertex Buffer Objects work and how best to use them. I've run into a hypothetical situation that I can't find an answer to. Most tutorials are simple and don't cover this kind of complexity. A lot of these questions probably stem from misunderstandings about VBOs.

 

If the current scene/level requires more models than I can create VBO data for, but not all of them are visible at the same time, how do I resolve the issue?

 

From my understanding of VBOs, they are expensive to build every frame and are typically built when model data is loaded from the hard drive. From what I've seen in tutorials, every object needed in the scene/level is built into a VBO at level initialization. While this works fine for simple tutorials, what if I have a huge level, or a world that needs to stream in real time from the hard drive? (E.g. Skyrim, Burnout Paradise, Just Cause 2, Far Cry 2 & 3.)

 

Obviously, I will need to create new VBOs at some point. But when? I can't change the VBOs every frame without significant performance loss, can I?

 

One solution I can think of is a grid system. Moving beyond the edge of a grid cell causes a rebuild of the current VBOs to match the new player area. While this handles the worst case, general usage won't actually produce many worst-case scenarios, so the constant rebuilding would cause unnecessary performance loss.

 

Would a dynamic VBO be part of a solution? If so, wouldn't that mean I would either have to combine all the objects into one VBO, or carefully keep track of which VBOs I can reuse?




#2 MarkS   Prime Members   -  Reputation: 878


Posted 05 July 2013 - 10:33 PM

For a large world, like Skyrim's, the world is diced up into sections. Only sections that are within the view frustum are sent to the GPU. Moreover, games with large worlds tend to use an LOD system so that the vertex count remains more or less constant when viewing objects/terrain at large distances.
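As a rough sketch of the "only send what's in the frustum" idea (Section, AABB, Frustum and Intersects here are made-up helper types, not OpenGL API):

struct Section {
    GLuint  vao;         // vertex array object holding this section's buffers
    GLsizei indexCount;  // number of indices to draw for the section
    AABB    bounds;      // world-space bounding box of the section
};

void DrawWorld(const std::vector<Section>& sections, const Frustum& frustum)
{
    for (const Section& s : sections) {
        if (!Intersects(frustum, s.bounds))  // skip sections outside the view frustum
            continue;
        glBindVertexArray(s.vao);
        glDrawElements(GL_TRIANGLES, s.indexCount, GL_UNSIGNED_INT, nullptr);
    }
}

The buffers for each section are built at load time; the per-frame work is just the visibility test and the draw calls.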

 

Essentially, you'll need to implement some sort of scene management system. You should only be processing what can potentially be seen. The tutorials you have found (and will find) are geared towards showing you how to set up and process a single VBO. Basically, such a tutorial sets up a scene of one object, and that object exists for the entire duration of the app. There is no scene management in this case, so the techniques presented will differ from those of a full production game.

 

I have yet to see a tutorial on scene management. It is a complicated subject, it depends heavily on the scene structure of your particular game, and it doesn't generalize well.



#3 Labrasones   Members   -  Reputation: 337


Posted 05 July 2013 - 10:52 PM

You said that sections within the view frustum are sent to the GPU. Binding a VBO and adding data to it is the process through which data is sent to the GPU in OpenGL, right? (I want to make sure I know what a VBO is before I try to design a system to use them effectively. I have to leave the tutorials behind at some point.)

 

When it comes to actually changing VBO data, is there any reason to try to maintain existing VBOs? Or should I just tell OpenGL I don't need them anymore and create new ones?

 

I'm also curious to know whether the combined-VBO method has any merit. I've seen a lot of posts suggesting that as many models as possible should be joined into a single VBO. I feel that this would be more effective if I weren't doing matrix transformations in the vertex shader (which I plan to do). I also think it would remove the possibility of instancing any of the models within that VBO.



#4 Sik_the_hedgehog   Crossbones+   -  Reputation: 1536


Posted 06 July 2013 - 08:11 AM

Matrix transformations with VBOs like that shouldn't matter at all, because you just tell OpenGL which range of vertices to use when rendering (it simply ignores the ones not used by the mesh in question). I don't remember how instancing works in that sense, but I believe it's the same.
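For reference, drawing a sub-range of a shared buffer looks something like this (the offsets are whatever you recorded when packing the meshes; the names here are placeholders):

// Non-indexed: start at a given vertex and draw vertexCount vertices from the bound VBO.
glDrawArrays(GL_TRIANGLES, firstVertex, vertexCount);

// Indexed: a byte offset into the bound index buffer, plus (GL 3.2+) a base vertex
// that is added to every index, so several meshes can share one vertex buffer.
glDrawElementsBaseVertex(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT,
                         (const void*)(firstIndex * sizeof(GLuint)), baseVertex);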



#5 Labrasones   Members   -  Reputation: 337


Posted 06 July 2013 - 08:45 AM

The reason I'm not sure about the matrix transformations is that I thought the whole idea of combining models was so you could make one draw call, which reduces API call overhead. I'm not sure how I would send a matrix or a position for each of the objects along with the vertex data.

 

How I imagine using a VBO: when I load model data from the hard drive, I create a VAO, then create and bind a VBO for the data (as most tutorials do). According to my understanding, this allows me to bind the VAO instead of the VBO when drawing, which gives better performance(?).
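For what it's worth, the usual load-time / draw-time split looks roughly like this (a minimal sketch with a position-only vertex format):

// Load time: create a VAO, create a VBO, upload the data, describe the layout.
GLuint vao, vbo;
glGenVertexArrays(1, &vao);
glBindVertexArray(vao);

glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, vertexCount * 3 * sizeof(GLfloat), vertexData, GL_STATIC_DRAW);

glEnableVertexAttribArray(0);                                 // attribute 0 = position
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, nullptr);  // 3 floats per vertex, tightly packed

glBindVertexArray(0);

// Draw time: the VAO remembers the buffer binding and the attribute setup.
glBindVertexArray(vao);
glDrawArrays(GL_TRIANGLES, 0, vertexCount);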



#6 Chananya Freiman   Members   -  Reputation: 140


Posted 06 July 2013 - 12:11 PM

There are versions of the instanced drawing functions that take a range of vertices.

 

The general idea (I believe) is to batch static objects together, ones that you know will never move anyway. But this also makes it harder to cull them. I assume Google can give more information; I've never had to handle big scenes yet.

 

Sending positions, transformations, etc. to batches (and instanced draws) is really an application-specific thing, but you'll usually use something to identify each mesh (done for you in instanced rendering) and, based on that, select the correct data from a uniform buffer / texture buffer / whatever.
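One way that looks with instanced rendering (a sketch, assuming the per-instance matrices fit in a plain uniform array; a uniform buffer or texture buffer is the same idea with more room):

// Vertex shader: gl_InstanceID selects this instance's model matrix.
const char* vsSource = R"GLSL(
#version 330 core
layout(location = 0) in vec3 position;
uniform mat4 viewProj;
uniform mat4 modelMatrices[64];   // one matrix per instance
void main()
{
    gl_Position = viewProj * modelMatrices[gl_InstanceID] * vec4(position, 1.0);
}
)GLSL";

// C++ side: upload the matrices (matrixData = instanceCount column-major 4x4 matrices),
// then draw all copies of the mesh in a single call.
glUniformMatrix4fv(modelMatricesLoc, instanceCount, GL_FALSE, matrixData);
glDrawElementsInstanced(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, nullptr, instanceCount);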

 

VAOs are mainly for convenience; I am not sure if they actually improve performance, and if so, it's probably only by a little. They are used to store the current rendering-related state of your context (vertex and element buffer bindings, shader attribute bindings, etc.), so that when you want to draw something you only need to bind the VAO and it restores that state for you.



#7 Labrasones   Members   -  Reputation: 337


Posted 06 July 2013 - 12:47 PM

VAOs are mainly for convenience; I am not sure if they actually improve performance, and if so, it's probably only by a little. They are used to store the current rendering-related state of your context (vertex and element buffer bindings, shader attribute bindings, etc.), so that when you want to draw something you only need to bind the VAO and it restores that state for you.

 

So, if I have the same shader and camera matrix for all objects, could I use a VAO to bind those two things and then manually bind each of my VBOs afterwards? It would make things much simpler. But wouldn't that then bind all those VBOs into the VAO's state?



#8 Chananya Freiman   Members   -  Reputation: 140


Posted 06 July 2013 - 01:21 PM

Shaders and uniforms are not bound by VAOs.

I suggest you read this page; it will explain the subject much better than I could.



#9 Vilem Otte   Crossbones+   -  Reputation: 1366


Posted 06 July 2013 - 03:15 PM

Regarding scene management and related topics...

You're mixing two different things together. The first is data management, i.e. deciding which data you need resident and which you don't. All textures, meshes, etc. that you intend to render should be in VRAM ... sometimes you don't use VRAM at all (e.g. a software renderer) ... sometimes you need them in both VRAM and RAM (rendering on the GPU while doing some other magic on the CPU) - so let's generally refer to this as "data in memory". The second is culling in general (deciding which objects to draw and which not to).

Let's start with culling - because that one is simple and well defined.

Let's have a scene consisting of objects. I'll discuss just static scenes (i.e. scenes containing only static objects), but it's easy to extend this to dynamic objects too. There is a bunch of known algorithms to determine which objects are visible and must be rendered; we basically use two types of culling - frustum culling (detecting whether an object intersects the camera's view volume) and occlusion culling (detecting whether an object is hidden by objects standing in front of it).

For ease of scene management we use some kind of hierarchy in which the objects are stored. The most common are either spatial hierarchies (octrees, kd-trees, grids, nested grids, ...) or object hierarchies (bounding volume hierarchies, bounding interval hierarchies, ...). Both have advantages and disadvantages, but using any hierarchy is basically always better than brute force. In the worst case you can also hand-place portals into your map, but that's bad - artists have to do this work and they will, really, do it wrong.

Frustum culling is a simple test of whether an object's bounding volume intersects or is contained by the camera's view volume (i.e. the frustum). This is where the hierarchy comes in - if some hierarchy node doesn't intersect and isn't contained, none of its children are visible to the camera. Otherwise we check that node's children.
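A sketch of that traversal (Node, Object, Frustum and the Classify test are placeholders, not a specific library):

// Recursively collect visible objects from a hierarchy.
void CollectVisible(const Node& node, const Frustum& frustum, std::vector<const Object*>& out)
{
    Containment c = Classify(frustum, node.bounds);
    if (c == OUTSIDE)
        return;                        // whole subtree is invisible: skip all children
    if (c == INSIDE) {
        node.AppendAllObjects(out);    // whole subtree is visible: no further tests needed
        return;
    }
    // Partially visible: descend and test children and objects individually.
    for (const Node& child : node.children)
        CollectVisible(child, frustum, out);
    for (const Object* obj : node.objects)
        if (Classify(frustum, obj->bounds) != OUTSIDE)
            out.push_back(obj);
}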

Occlusion culling is hellishly complicated. It also isn't worth it in a lot of cases (especially if you do a very good level of detail). One recent technique that is useful for dynamic and large outdoor scenes is hierarchical Z-buffer occlusion culling, because you don't need to precompute any data for it; everything can be computed on the fly. For static areas, you probably want a Potential Visibility Set or PVS (used in games with BSP maps): for each node in the map you compute a set of other visible nodes. This computation is done numerically using ray casting and it's not 100% precise (for very small nodes next to very large nodes the results might be incorrect ... so you have to use more rays = more samples, and the PVS computation takes longer), but it clearly works (e.g. Half-Life 2 uses it).

Basically you want to put this into a nice Scene class and just call a GetListOfVisibleObjects method with a camera parameter (or the equivalent in other programming paradigms).

I mentioned level of detail, and with it we can slowly move towards data management, because it sits between data management and rendering optimizations. Say you have an object, let's imagine a stone consisting of 10 000 triangles (i.e. it's a good stone). You also want versions with 1 000 and 100 triangles, so you create them in your favourite modelling application (or, if you're too much of a programmer, you generate the lower-detail versions). Now let's say this object sits in the middle of our world. If we're at the edge of the world, we don't need to render the 10 000-triangle version; we just render the 100-triangle version (we don't even need the higher-detail versions in memory at all). As we get closer to the stone, we load the better-detailed version and throw away the low-detail one ... and so on (if we move further away we only need a lower-detail version; if we get closer we need the higher or highest-detail version).
This saves us both memory and computing power. Note that in reality we will most likely need to keep the currently needed level of detail and the lower levels as well, because the object (like the stone) will probably appear in several locations in the world, and the lower-detail versions will also be visible.
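The selection itself can be as simple as distance thresholds (a sketch; the numbers are made up):

enum class Lod { High, Medium, Low };   // the 10 000 / 1 000 / 100 triangle versions

Lod SelectLod(float distanceToCamera)
{
    if (distanceToCamera < 50.0f)  return Lod::High;    // close: full-detail mesh
    if (distanceToCamera < 200.0f) return Lod::Medium;  // mid range
    return Lod::Low;                                    // far away
}
// The streaming code then makes sure the chosen level (and whatever lower levels
// are still needed elsewhere) is resident, and drops the rest.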

With just level of detail you can achieve pretty large worlds; basically any smarter data management of the scene is a sort of level-of-detail technique (where, e.g., at some point you throw some data out of memory and don't render the object at all).

Now let's jump ahead to full data management. Imagine a HUGE world like, e.g., Skyrim has. Let's make our world 10x10 kilometers. With 1 heightmap pixel per meter, 10x10 kilometers gives us 100 000 000 pixels of data in the height map alone; that's approx. 381.5 MB (if a single height value is a 32-bit float).
Even if we could fit 381.5 MB of data in our VRAM, we don't want to spend that much just on terrain. And even if we're in the center of the map, the edges are 7.071 kilometers away and we don't need 1-vertex-per-meter precision at that distance; it would eat more memory than needed, not to mention other troubles (antialiasing?).
So we divide our world into, let's say, 100x100 meter (that's 0.1x0.1 km) squares; we now have 100x100 squares. For the NxN nearest squares we need high quality, i.e. 1 pixel per meter (where N is, e.g., 5 - some reasonably small number - which needs 0.953 MB, which is more than acceptable); for the rest of the tiles we can live with 1 pixel per 10 meters (for the whole world this gives us 3.82 MB, which is acceptable). So now we can fit our world's terrain height map into some 4.77 MB, which is a lot better.
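For anyone who wants to check the numbers, the arithmetic is simply:

constexpr double MB = 1024.0 * 1024.0;
constexpr double fullWorld = 10000.0 * 10000.0 * 4 / MB;  // ~381.5 MB: whole world, 1 sample/m, 4-byte floats
constexpr double nearTiles =   500.0 *   500.0 * 4 / MB;  // ~0.95 MB: 5x5 tiles of 100 m at 1 sample/m
constexpr double farWorld  =  1000.0 *  1000.0 * 4 / MB;  // ~3.81 MB: whole world at 1 sample per 10 m
// nearTiles + farWorld ≈ 4.77 MB resident instead of 381.5 MB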
Second optimization: for distant terrain we don't need high-quality models of trees, castles, bridges, etc. - we can use just low-detail versions and impostors, i.e. a single billboarded quad (these work very well for trees - Oblivion used them, and Skyrim does too). So we don't need to keep the geometry or textures of the high-quality objects in memory. Of course we must load them when we load a high-quality square.

If there's enough interest I might even put together an article (or articles, as this is quite a big topic) on this subject (especially as the first part has quite a lot in common with my work).

 

EDIT: In the end I decided to put a little effort into this and write an actual article (or maybe articles) on optimizing 3D rendering. It might take a while, but I think I'll manage to put out a few useful articles.


Edited by Vilem Otte, 07 July 2013 - 08:01 PM.

My current blog on programming, linux and stuff - http://gameprogrammerdiary.blogspot.com


#10 marcClintDion   Members   -  Reputation: 431


Posted 06 July 2013 - 05:10 PM

It sounds like you might want to implement a distance check to delete and create VBOs and texture data as you move around the scene.  If a model will soon be visible, then run an initialization function for that model.  If it's not going to be visible anymore, then delete it.  You might have some hiccups when you move from one area to another.

I certainly would not want to do this every frame for every component.
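A sketch of that idea (the radii and the Model type are assumptions; using two radii avoids thrashing when you hover near a boundary):

const float loadRadius   = 150.0f;   // create GPU data when closer than this
const float unloadRadius = 200.0f;   // destroy it again only when further than this

void UpdateStreaming(std::vector<Model>& models, const Vec3& playerPos)
{
    for (Model& m : models) {
        float d = Distance(playerPos, m.position);
        if (d < loadRadius && !m.hasGpuData)
            m.CreateBuffers();        // glGenBuffers / glBufferData etc., done once
        else if (d > unloadRadius && m.hasGpuData)
            m.DestroyBuffers();       // glDeleteBuffers, frees the VBO storage
    }
}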

 

As far as reusing VBOs that have already been created, I'm not sure this would offer you any benefit.  The VBO being reused would have to be resized and rewritten, so it seems to me that just creating a new one is the way to go.  I don't imagine that reformatting an existing VBO offers any performance advantage over simply creating one; you'd have roughly the same overhead either way.  Then again, I've never tried this, so maybe it would.




#11 marcClintDion   Members   -  Reputation: 431


Posted 07 July 2013 - 01:22 AM

I'm trying to understand more about how Vertex Buffer Objects work

 

I thought this little part needed to be addressed as well, since you mentioned it.  If you were to initialize a vertex as follows, with normals and texCoords, then this "model" would definitely be stored in main system RAM, not in GPU RAM.

 

GLfloat modelName[] = {1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0};  // 3 position + 3 normal + 2 texcoord floats for one vertex

 

Now, if you were to assign this vertex data to a VBO, the model would be sent to GPU RAM, if the GPU supports this.  Otherwise, if the GPU does not support VBOs, it will still go to system RAM.  These days it's safe to say that it will go to the GPU and, as a consequence, can be processed much faster.
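The upload itself is just a few calls (a sketch using the array above):

GLuint vbo;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, sizeof(modelName), modelName, GL_STATIC_DRAW);
// After glBufferData the driver owns its own copy; the array in system RAM is no longer needed for drawing.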




#12 Labrasones   Members   -  Reputation: 337


Posted 11 July 2013 - 11:40 PM

Now, if you were to assign this vertex data to a VBO, the model would be sent to GPU RAM, if the GPU supports this.  Otherwise, if the GPU does not support VBOs, it will still go to system RAM.  These days it's safe to say that it will go to the GPU and, as a consequence, can be processed much faster.

 

 

Does this mean that if there is not enough VRAM, the VBO will be stored in RAM instead? Or will it just throw a memory error and not store it?



#13 marcClintDion   Members   -  Reputation: 431


Posted 12 July 2013 - 05:07 AM

Does this mean that if there is not enough VRAM, the VBO will be stored in RAM instead? Or will it just throw a memory error and not store it?

As far as any guess I have about a split memory scheme goes, I would say 'yes', most likely. This would be entirely up to the GPU driver's capabilities.  If the driver is written to do so, and if the driver is stable on that machine, why not?  And if the driver can do one or the other, then why not both at the same time?

 

I can foresee some issues with this.  How does the driver know what to put where?  What if all the irrelevant models are in GPU RAM and all the currently displayed models are in main system memory?  That would not be optimal.

 

I don't think that a memory error would be thrown unless you decide to implement this yourself, or if a library that you are using does this to you. 

You might want to do this for mobile devices, which may have memory restrictions, but on conventional machines you will likely just end up forcing the OS to swap RAM out to virtual memory on the hard drive.

 

For desktops and laptops, you not only have GPU RAM (500 MB?) and system RAM (2 GB?), but virtual memory on the hard drive as well (2 GB?).

You can get away with loading a whole heck of a lot of stuff, but the problem is not so much running out of memory as all the swapping that will have to take place behind the scenes.  Swapping between CPU and GPU isn't so bad; swapping from CPU to HDD is going to lock up your OS for some time.  Hopefully not a long time.

 

It may be best, as a start, to script what is supposed to be shown when the character is at a specific point.  This requires the least memory but will require hard-drive access during gameplay.  Hard-drive access is very slow; for a long time the number commonly passed around was that the hard drive is about 200x slower than the 'system-bus-thing-a-ma-bobby'.

 

If I were to put any logic into loading/unloading I would start with the following. 

I would only load models from the HDD when the frame rate is high, and I'd leave things as they are when the frame time drops for other reasons.  Instead of loading everything for the next scene all at once, I'd load pieces of the new scene at times when the system is running quickly.  This has the potential to minimize hiccups in gameplay due to asset loading.
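As a sketch of that idea (the budget numbers, PendingLoad and LoadOneAsset are assumptions):

const double targetFrameMs = 16.6;   // ~60 fps
const double loadBudgetMs  = 4.0;    // never spend more than this on loading per frame

void StreamAssets(std::queue<PendingLoad>& pending, double lastFrameMs)
{
    if (lastFrameMs > targetFrameMs)         // the frame was already slow: don't make it worse
        return;
    double spentMs = 0.0;
    while (!pending.empty() && spentMs < loadBudgetMs) {
        spentMs += LoadOneAsset(pending.front());   // returns how long this single load took, in ms
        pending.pop();
    }
}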

 

Dropping assets from memory should be quicker than loading, unless you happen to be saving them for some reason; you could do this almost anytime.




#14 Chananya Freiman   Members   -  Reputation: 140


Posted 12 July 2013 - 06:55 AM

Regarding the above comment.

 

Drivers are free to swap your buffers from RAM to video RAM (VRAM has two meanings in this context) and the other way around at any point in time.

While giving the driver hints about the usage of a buffer might (and, again, might not) make it initialize the buffer where you want, drivers are free to do real-time analysis of how your buffers are actually used and to swap them if they want; in fact, I read years ago that they indeed do this.


Edited by Chananya Freiman, 12 July 2013 - 07:03 AM.


#15 Sik_the_hedgehog   Crossbones+   -  Reputation: 1536


Posted 12 July 2013 - 01:55 PM

OpenGL never specifies where buffers are stored; this is to let the drivers decide what's best. The upside is that if the hardware changes, the drivers will adapt accordingly without the program having to do anything; the downside is that many programmers don't like losing control over that (even though the driver should know better than them!). As said in the post above, you can give hints that tell the driver how you plan to use a buffer, and the driver then decides where to store it.

 

So, back to your question (whether it'd throw an error or not): that depends on whether the driver finds it acceptable to leave the VBO in RAM or not. OpenGL allows drivers to do this, but they don't have to. I don't know how current drivers handle this.



#16 Labrasones   Members   -  Reputation: 337


Posted 12 July 2013 - 07:25 PM

OpenGL never specifies where buffers are stored; this is to let the drivers decide what's best. The upside is that if the hardware changes, the drivers will adapt accordingly without the program having to do anything.


So, I could create a VBO for every model upon game load (assuming it totals less than system RAM size)?

The nature of my original concern was doubling the memory requirements of the game, since I'd otherwise be storing the model data outside the VBO as well as within one during rendering.

Ideally, I'd like to load the model data for all the objects within the player's area, directly creating VBOs for those objects whether or not they are visible.

However....
 

As said in the post above, you can give hints that tell the driver how you plan to use a buffer, and the driver then decides where to store it.


How does one go about giving hints to the driver? I feel like that would be vital for good performance.



#17 Radikalizm   Crossbones+   -  Reputation: 2792


Posted 12 July 2013 - 07:58 PM

OpenGL never specifies where buffers are stored; this is to let the drivers decide what's best. The upside is that if the hardware changes, the drivers will adapt accordingly without the program having to do anything.


So, I could create a VBO for every model upon game load (assuming it totals less than system RAM size)?


Even if you allocate more than the available RAM, the operating system can handle those kinds of cases. It's going to take a while before you actually fill up your entire physical memory, though.

How does one go about giving hints to the driver? I feel like that would be vital for good performance.


I believe OpenGL provides a usage flag in the glBufferData function which gives the driver hints about how the buffer is going to be used.
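For example (the flag is only a hint; the driver may still place the buffer wherever it likes):

glBufferData(GL_ARRAY_BUFFER, size, data, GL_STATIC_DRAW);   // written once, drawn many times
glBufferData(GL_ARRAY_BUFFER, size, data, GL_DYNAMIC_DRAW);  // rewritten now and then, drawn many times
glBufferData(GL_ARRAY_BUFFER, size, data, GL_STREAM_DRAW);   // rewritten roughly every frame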



#18 dpadam450   Members   -  Reputation: 856


Posted 12 July 2013 - 11:22 PM

Replying to the original post: you will never fill up a graphics card with just vertex data in a typical game. Take any game scene and there really isn't much geometry; it's a lot of the same VBOs being used over and over again - trees, enemy models, gun models. Even terrain can be based on a single VBO quad that is split into a triangle grid; you then instance that VBO all over your terrain, with the patches laid out next to each other in a grid and a heightmap displacing each individual patch.
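A sketch of that terrain idea (GLSL 3.30; the patchOffsets array and the scale uniforms are made up for illustration):

// Vertex shader: one flat grid patch, instanced across the terrain; each instance is
// offset to its grid cell and displaced vertically by a heightmap lookup.
const char* terrainVs = R"GLSL(
#version 330 core
layout(location = 0) in vec2 gridPos;     // vertex position within the flat patch
uniform sampler2D heightMap;
uniform mat4  viewProj;
uniform vec2  patchOffsets[256];          // world-space XZ origin of each patch instance
uniform float worldSize;                  // extent of the whole terrain, for the UV lookup
uniform float heightScale;
void main()
{
    vec2  worldXZ = gridPos + patchOffsets[gl_InstanceID];
    float h = textureLod(heightMap, worldXZ / worldSize, 0.0).r * heightScale;
    gl_Position = viewProj * vec4(worldXZ.x, h, worldXZ.y, 1.0);
}
)GLSL";

// C++ side: one shared patch VBO/VAO, all visible patches drawn in a single call.
glBindVertexArray(patchVao);
glDrawArraysInstanced(GL_TRIANGLES, 0, patchVertexCount, visiblePatchCount);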

If you are this early on, optimization should be the last thing to worry about, especially since graphics cards are in the 1 to 2 GB range. Try to look at some wireframes of games if you can find them.



#19 Labrasones   Members   -  Reputation: 337


Posted 13 July 2013 - 09:49 AM

Replying to the original post: you will never fill up a graphics card with just vertex data in a typical game.

It was always more of a hypothetical question. Questions like this that cover extreme situations help me understand what is going on behind the scenes. It's not just about optimization; it's about understanding how it works. If I can understand how it works, I'm less likely to use it wrong.





