[quote name='dpadam450' timestamp='1339870022' post='4949847']
That makes sense for say in BF/MW you have a bunch of houses/huts and you want to just put all the chairs/tables/cups/lamps into 1 vbo and just call it "Room1 VBO" but I wouldn't dare stick 2 full houses into a VBO when I can only go into 1 house at a time, and view 1 room in that house at a time.
yes, every house could be in a VBO... unless it had lots of details inside (vertices), in which case it would be distributed over multiple VBOs
You're both a bit off track with VBO allocation. Creating a VBO is equivalent to calling [font=courier new,courier,monospace]malloc[/font] in CPU land - it just gives you an array of bytes. Depending on your flags/hints, it will either [font=courier new,courier,monospace]malloc[/font] some VRAM, some main-RAM, or both (and as before, it may have to use a lot of RAM in the dynamic case). The GPU may be able to read from both VRAM and main-RAM (remember, in order to cover the latency caused by slow memory-read times, you need more GPU registers to hold more threads), but the driver hides the details.
I can [font=courier new,courier,monospace]malloc[/font] enough memory to store two "Rooms" worth of vertices, copy both those data sets into different parts of this same allocation, and then either render the two rooms in one go, or draw them as two separate rooms. It's exactly the same as if I called [font=courier new,courier,monospace]malloc[/font] twice - once for each Room. The only difference is that with the single-malloc version, I've got the option of making draw-calls that use verts from both rooms at once.
When you load a graphics asset, the file usually contains a blob of bytes that need to end up in main-RAM, and a blob of bytes that need to end up in VRAM. The simple solution is to create a new VRAM buffer for each asset (e.g. new model = new VBO), but this is basically just a detail of your engine's VRAM management system. You could instead make a few big VRAM allocations, and dish out different regions/offsets to different assets. There's a lot of ways to build this part of an engine.
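To make the "few big allocations, dish out offsets" idea concrete, here is a minimal sketch of a bump-style sub-allocator over one large buffer. The names, the alignment policy, and the lack of any free/compact step are my simplifications for illustration - a real engine's VRAM manager would be more involved:

```c
#include <assert.h>
#include <stddef.h>

/* One big GPU allocation (e.g. a single large VBO), carved into
   regions for different assets by a trivial bump allocator. */
typedef struct {
    size_t capacity;  /* total size of the GPU buffer, in bytes */
    size_t head;      /* next free byte */
} BufferArena;

/* Returns the byte offset of the new region, or (size_t)-1 on failure.
   'align' would typically match the vertex stride or API alignment rules. */
size_t arena_alloc(BufferArena *a, size_t size, size_t align)
{
    size_t start = (a->head + align - 1) / align * align;  /* round up */
    if (start + size > a->capacity)
        return (size_t)-1;  /* out of space in this arena */
    a->head = start + size;
    return start;
}
```

Each asset would then upload its bytes into the big buffer at its assigned offset (e.g. via a buffer-update call), and its draw-calls would reference that offset rather than a buffer of its own.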
If you're following the "typical" every-object-is-a-draw-call technique, then it can be even more important to use a smaller number of (shared, large) buffers, because then you won't have to issue buffer-binding state-change commands in-between each draw-call! Remember that draw-calls specify an offset into the VBO (or IBO), so many different bits of data can be grouped.
Also, your vertex attribute bindings (aka vertex declaration - the thing that plugs your VBOs into your vertex shader) are largely based on offsets into your VBOs, allowing a lot of flexibility (exact details depend on the API).
E.g. sometimes you want to separate out your positions and normals into 2 separate (non-interleaved) streams -- they can still actually be stored within the same VBO if you like (not saying you should), with all positions first, followed by all normals after. Just use the right buffer-offset when binding the normal attribute.
Or, you could have a VBO that contains interleaved data, but bind it twice in such a way that the GPU still reads two separate streams, just as it would with 2 VBOs (the 2nd stream is simply offset sizeof(position) from the 1st stream).
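The two layouts above are just offset arithmetic. A small sketch, assuming 12-byte positions and normals (the sizes and names are illustrative, not from the original post):

```c
#include <assert.h>
#include <stddef.h>

/* Two ways to lay out position (12 bytes) + normal (12 bytes) for N
   vertices inside ONE buffer. */
enum { POS_SIZE = 12, NRM_SIZE = 12 };

/* Blocked ("SoA") layout: all positions first, then all normals.
   The normal stream starts right after the last position. */
size_t blocked_normal_offset(size_t vertex_count)
{
    return vertex_count * POS_SIZE;
}

/* Interleaved ("AoS") layout: pos,nrm,pos,nrm,...  Both attribute
   bindings share one stride; the normal stream just starts
   sizeof(position) bytes into the buffer. */
size_t interleaved_stride(void)        { return POS_SIZE + NRM_SIZE; }
size_t interleaved_normal_offset(void) { return POS_SIZE; }
```

When binding the attributes, you'd pass these offsets (and, for the interleaved case, the shared stride) to your API's attribute-pointer call.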
To go off on a bit of a tangent -- when different models share vertex data, such as a low-LOD model and its original high-poly model, then they can share buffered vertices too. In this case, the low-LOD model could be created using only the high-poly model's vertices as input, generating a new set of draw-calls and indices (which could be stored in any particular IBO object) that reference the original high-poly vertices.
So assuming I've already got the local-space high-poly model in VRAM, then drawing a crowd of LODed versions has very minimal memory impact. I've only got to load the LOD index data and the crowd transform matrix array. Each model in the crowd can be a unique variation on the model, in quite an extreme way with the specs in the OP. A model of a sphere can be tessellated and displaced into almost anything, so a T-pose of a human could be morphed into almost all your humanoid characters (by modelling them from the human base mesh).
Or back to the House example -- you could have a set of indices which draws the exterior of each house, and another set which draws the interior of each. If you put the two 'exterior' index lists next to each other in the IBO, then you can draw the exterior of either house individually, or draw both houses together in a single draw-call.
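The adjacent-index-lists trick is just range arithmetic: two contiguous ranges collapse into one draw. A minimal sketch (IndexRange is a made-up record for illustration, not a real API type):

```c
#include <assert.h>

/* A draw-call's slice of a shared IBO: starting index + index count. */
typedef struct { unsigned first; unsigned count; } IndexRange;

/* If house B's exterior indices are stored immediately after house A's,
   the two draws collapse into one contiguous range; otherwise they must
   stay as two separate draw-calls. Returns 1 on success. */
int try_merge(IndexRange a, IndexRange b, IndexRange *out)
{
    if (a.first + a.count != b.first)
        return 0;  /* not adjacent in the IBO */
    out->first = a.first;
    out->count = a.count + b.count;
    return 1;
}
```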
You can actually pre-generate multiple index lists for different viewpoints, or for likely pairs of objects (e.g. one list for when A,B,C are visible, and one for when D,B,E are visible). You can get much better results on individual models with many layers of transparency (i.e. needing back-to-front order), such as foliage or glass-heavy designs, if you pre-generate a few different index lists for different angles. At runtime, you just need to measure the viewing angle and look up the right IB offset.
I actually store 'draw-call' objects inside my model files, which may cover the same area as other draw-calls (i.e. [font=courier new,courier,monospace]foreach( model.draw as draw ) draw();[/font] would make a mess). Using a bit of meta-data or game-specific logic though, you can do things like separate the draw-calls out into LOD layers, etc...
[/quote]
I very much appreciate your helpful attitude and thoughtful posts.
I understand most of your comments about the VBO. Maybe you can critique what I do. I don't do graphics full time, and I started planning this engine circa the GTX6800 era, so I think some of my design decisions were based upon outdated assumptions. First of all, would you say my emphasis on 65536-vertex VBOs (to keep the indices at 16 bits in the IBO) is more-or-less obsolete at this point in GPU history? The vertices are typically 64 bytes.
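As a rough sanity check on the 16-bit-vs-32-bit question (my own back-of-envelope, assuming roughly 1.5 indices per vertex, a common ballpark for typical triangle meshes):

```c
#include <assert.h>

/* For 64-byte vertices, how much does widening indices from 16 to 32
   bits actually cost? Assume ~1.5 indices per vertex. */
unsigned long long mesh_bytes(unsigned long long verts,
                              unsigned index_size /* 2 or 4 bytes */)
{
    unsigned long long indices = verts * 3 / 2;
    return verts * 64 + indices * index_size;
}
```

For a 65536-vertex mesh this gives about 4.39 MB with 16-bit indices versus about 4.59 MB with 32-bit ones - under 5% extra, since the 64-byte vertices dominate. The memory argument for the 65536 cap is weak with vertices this fat; whatever argument remains would be about index-fetch bandwidth or cache behaviour.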
What I do is allocate VBOs in the GPU, then manage what is in them myself. Thus, once a game or simulation has more-or-less gotten going, my engine doesn't often create or destroy any VBO. What it does do is move objects from VBO to VBO once in a while. This is not a big overhead, since most objects don't move and thus stay in the same VBO forever. Since each of my VBOs holds the objects in one 3D volume of space (nominally a cube), when that 3D volume becomes very far away from every active camera, the VBO is not deleted; instead it will eventually be refilled with objects from some other 3D volume that is being created or coming into view of a camera.
My current approach "accidentally" has one other convenient feature. The IBO for each VBO contains multiple sets of indices, one set per LOD level. So as the 3D volume moves further from a given camera, the engine simply draws via a different set of indices to render all the objects at the appropriate LOD level. The other advantage of my approach is this: GPU memory does not fragment, at least not GPU memory allocated to VBOs and IBOs. If what I'm learning here convinces me to switch to some other scheme in which the GPU holds local-coordinates for all objects, my practice of assigning VBOs to 3D volumes probably isn't a wise choice, and I may also need to implement my LOD strategy somewhat differently.
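The per-LOD index-set selection described above might look something like this (the LOD count, thresholds, and index counts are invented for illustration):

```c
#include <assert.h>

/* One IBO per spatial chunk, holding one index list per LOD level.
   Each LodRange is a slice of that IBO: starting index + index count. */
#define NUM_LODS 3

typedef struct { unsigned first; unsigned count; } LodRange;

/* Pick which pre-built index set to draw for a chunk, given its
   distance to the camera. Same VBO, same shader - only the index
   range changes. */
const LodRange *select_lod(const LodRange lods[NUM_LODS], float distance)
{
    static const float thresholds[NUM_LODS] = { 50.0f, 200.0f, 1e30f };
    for (int i = 0; i < NUM_LODS; ++i)
        if (distance < thresholds[i])
            return &lods[i];
    return &lods[NUM_LODS - 1];
}
```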
This "clean" way of allocating VBOs might be difficult to keep if I switch over, too. I'll have to think about that. I don't want to cause a slow but never-ending loss of GPU memory due to fragmentation. The idea of having 2 or more VAOs for each VBO is interesting. I'll have to think about that idea further too. Currently the contents of all my VBOs are interleaved.
Can you point me at any reference that explicitly says nvidia GPUs can be executing multiple batches with different shaders, uniforms, and vertex specifications simultaneously? I'm surprised to learn that I could miss such an important piece of information - even though I'm not doing 3D all the time. I was just reading "Game Engine Architecture" last night, and that book implied what I thought - that all cores of the GPU were executing the same shader on the same object.

Or better yet, can you point me at some book, articles, whitepapers, or whatever that explains the capabilities of GTX680-generation GPUs, and next-generation GPUs too if possible, with an emphasis on nvidia GPUs if possible? I've always been annoyed that I was never able to find any coherent presentation of this type for GPUs. The closest I ever found was a 20- or 30-odd-page PDF file about the GT8800 series. But even that was very scant on details or explicit statements, and I'm worried the architecture has changed significantly since then, especially in these sorts of ways (becoming more flexible).