Best practices for packing multiple objects into buffers

Started by
16 comments, last by NathanRidley 9 years, 1 month ago
I'm happy to only support the cutting edge. I don't expect to release anything for at least a year or two at minimum, and given that today's cutting edge is tomorrow's standard, I think I'm relatively safe.

As said: You cannot develop for Mac with that requirement now, and to be honest, I don't expect Apple to ship OpenGL 4.4+ this year or next. Maybe they never will, who knows. They could have done it with Yosemite, but they didn't; instead they introduced Metal. I develop on Mac too, and one of my principles is to encapsulate GPU resource management as much as possible, for exactly this kind of reason.


What would be the best way to update something like terrain meshes, where you usually divide the terrain into "chunks" that can be culled individually, but which could easily sit in one big buffer? The problem is updating that buffer: if chunks change (some removed, some added, and with dynamic terrain geometry even the same chunk can be rebuilt with a different number of vertices), you need to change the buffer, and for me that means reuploading all of the terrain data that is valid at that point. Is there a way to exchange data chunks of different sizes without reallocating the buffer?

I was thinking about aligning chunk sizes to some common "fit all" size, wasting some storage on padding but gaining the ability to easily remove single chunks, because they would all occupy the same amount of space in the buffer. Maybe I'm missing something, but for now I'm using a single VBO for each chunk and wondering how this could be done better.

Reuploading the whole buffer when just one chunk changes seems far too expensive, and there are surely better ways to do it.
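The fixed-size-slot idea can be sketched with a small slot table: every chunk occupies exactly one slot in the big VBO, so an offset is just slot index times slot size, and removing or replacing a chunk never moves any other chunk. This is a minimal sketch (class and member names are hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-slot table for one big VBO: every chunk occupies
// exactly slotBytes, so removing/replacing a chunk never moves others.
class ChunkSlots {
public:
    ChunkSlots(std::size_t slotCount, std::size_t slotBytes)
        : slotBytes_(slotBytes) {
        for (std::size_t i = slotCount; i-- > 0;) freeList_.push_back(i);
    }
    static constexpr std::size_t kFull = static_cast<std::size_t>(-1);
    // Returns the byte offset into the VBO for a newly acquired slot,
    // or kFull if every slot is taken.
    std::size_t acquire() {
        if (freeList_.empty()) return kFull;
        std::size_t slot = freeList_.back();
        freeList_.pop_back();
        return slot * slotBytes_;
    }
    // Releasing a slot just returns it to the free list; the stale
    // bytes in the VBO are simply never drawn again.
    void release(std::size_t byteOffset) {
        freeList_.push_back(byteOffset / slotBytes_);
    }
private:
    std::size_t slotBytes_;
    std::vector<std::size_t> freeList_;
};
```

On the GL side you would then upload a rebuilt chunk with glBufferSubData (or a mapped range) at the returned offset, and draw it with a first-vertex of offset divided by the vertex stride.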


Where are we and when are we and who are we?
How many people in how many places at how many times?

OS X and advanced game development are a huge PITA regardless of the year; Apple's GL support is always far behind and it's buggy.

There have been no announcements about whether they plan to support more modern OpenGL versions; one can only hope...

It may be noted that you should (that's up to you) still support a fallback path using GL_MAP_UNSYNCHRONIZED_BIT instead of persistent mapping (the two paths diverge only very slightly if you design your code well). Only D3D11-level hardware supports persistent mapping, and its market share is still too low. It depends on how much you think it will grow by the time you release your game.

Cheers

Matias

It may be noted that you should (that's up to you) still support a fallback path using GL_MAP_UNSYNCHRONIZED_BIT instead of persistent mapping (the two paths diverge only very slightly if you design your code well). Only D3D11-level hardware supports persistent mapping, and its market share is still too low.

Doesn't GL's "unsynchronized" flag correspond to D3D's "no overwrite" flag? If so, shouldn't it work on all hardware, since that flag has been present since D3D9?
Or wait, did you mean the persistent flag is only supported on new hardware...

Doesn't GL's "unsynchronized" flag correspond to D3D's "no overwrite" flag? If so, shouldn't it work on all hardware, since that flag has been present since D3D9?
Or wait, did you mean the persistent flag is only supported on new hardware...

Yes and yes. I meant that GL's "unsynchronized" flag can be used on D3D10-level hardware as a fallback, while persistent mapping can be used on D3D11-level hardware.
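The reason the two paths diverge so little is that both usually sit on top of the same round-robin ring-buffer bookkeeping: the CPU's write offset marches forward and wraps, never touching a region the GPU may still be reading. With GL_MAP_UNSYNCHRONIZED_BIT you re-map the range before each write; with persistent mapping you keep one pointer for the buffer's lifetime and only fence. A minimal sketch of the shared offset logic (GL fencing omitted, names hypothetical):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical ring-buffer bookkeeping shared by the unsynchronized
// and persistent-mapping upload paths. reserve() hands out a write
// offset, wrapping to the start when a request would run past the end
// of the buffer; a real implementation would wait on a GPU fence
// before reusing wrapped space.
struct StreamRing {
    std::size_t capacity;
    std::size_t head = 0;

    std::size_t reserve(std::size_t bytes) {
        if (head + bytes > capacity) head = 0;  // wrap; fence-wait here
        std::size_t offset = head;
        head += bytes;
        return offset;
    }
};
```

The unsynchronized path would then call glMapBufferRange with GL_MAP_WRITE_BIT | GL_MAP_UNSYNCHRONIZED_BIT over the reserved range each frame, while the persistent path writes through its long-lived pointer at the same offset; the rest of the code is identical.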

Since my previous post went unanswered, I'll try once more and state my problem more clearly.

That said, IMHO handling terrain chunks and batching the meshes of multiple objects are two distinct use cases. They are so distinct that each requires its own solution. I would generate a single vertex buffer for the terrain, replacing regions of its content when necessary. Why? Because terrain rendering will routinely involve more than a single visible chunk at a time. If you used multiple buffer objects, you would be switching buffers even within a single rendering pass over the terrain, and using multiple memory blocks would mean duplicating chunks in GPU memory. On the other hand, buffer memory management is relatively easy, assuming the memory footprint of each chunk is fixed.

What about terrain chunks whose size is not fixed? I have modifiable terrain (the mesh is extracted from a 3D binary volume), so chunk size can differ greatly depending on how fragmented the terrain is: it can be a flat hill with a minimal number of vertices, but it can also be rough and fractured, with a much larger mesh surface and thus vertex count. So far I've used a single VBO per chunk, but I plan on having a really big draw distance and it starts to hurt, especially since chunks extend in three dimensions and there can be several hundred highest-LOD chunks (of course more LOD levels and merging chunks into bigger ones will also be implemented, but I still think there is a lot of VAO switching that could be avoided).

I'm trying to design a custom allocator that would manage a single VBO (or a few big ones, depending on the total size of the terrain geometry), but I can't figure out how to modify chunks that have changed while also streaming new chunks in and out as the player moves. With a fixed-size buffer of N slots it would be easy: just swap the old chunk for the new one (whether the new one exists because the player moved or dug into an existing chunk doesn't matter), but I can't see how to do it with variable-size chunks.

My initial idea was to allocate slightly more than needed: check the maximum size of any current chunk and allocate maybe 1.5x as much, so all chunks are padded, which would let them grow or shrink without affecting the other chunks in memory. Problems start when a chunk grows beyond the fixed size - what then? It also wastes some memory, of course, though I'm not sure that would be a significant amount.

Another idea is not to pad to a fixed size, but simply to allocate a big enough buffer and place chunks one after another. If an existing chunk changes its geometry, I check whether it is the same size or smaller; if so, I just overwrite that chunk's data. This leaves a portion of the buffer orphaned, and it will probably be a piece of memory so small that it is wasted forever. This can lead to buffer fragmentation over time, and it also requires a fancier allocator that keeps track of memory use and finds a new place for a chunk when it outgrows its current location.
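The "fancy allocator" in the second idea is essentially a first-fit free-list allocator over byte ranges of the VBO. Keeping free blocks keyed by offset makes it cheap to merge adjacent free blocks on release, which limits (but does not eliminate) the fragmentation described above. A minimal sketch, with all names hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>

// Hypothetical first-fit allocator over one big VBO for variable-size
// chunks. Free blocks are stored as offset -> size, so neighbours can
// be coalesced when a chunk is released.
class VboAllocator {
public:
    explicit VboAllocator(std::size_t capacity) { free_[0] = capacity; }
    static constexpr std::size_t kFail = static_cast<std::size_t>(-1);

    // First-fit: return the offset of the first free block big enough,
    // or kFail (caller must grow the buffer or evict a chunk).
    std::size_t alloc(std::size_t bytes) {
        for (auto it = free_.begin(); it != free_.end(); ++it) {
            if (it->second < bytes) continue;
            std::size_t offset = it->first;
            std::size_t remain = it->second - bytes;
            free_.erase(it);
            if (remain) free_[offset + bytes] = remain;
            return offset;
        }
        return kFail;
    }

    void release(std::size_t offset, std::size_t bytes) {
        auto it = free_.emplace(offset, bytes).first;
        // Merge with the next free block if adjacent.
        auto next = std::next(it);
        if (next != free_.end() && offset + bytes == next->first) {
            it->second += next->second;
            free_.erase(next);
        }
        // Merge with the previous free block if adjacent.
        if (it != free_.begin()) {
            auto prev = std::prev(it);
            if (prev->first + prev->second == it->first) {
                prev->second += it->second;
                free_.erase(it);
            }
        }
    }

private:
    std::map<std::size_t, std::size_t> free_;  // offset -> size
};
```

A chunk that outgrows its block is handled by release + alloc (it simply moves elsewhere in the buffer and gets reuploaded), which is exactly the "find a new place" case from the paragraph above.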



Little late to this topic but....

First off, since you mention you are learning: do yourself a favor and, if you really want to learn OpenGL, put yourself in a situation where you can limit the frustration of setup code and give yourself an environment that can help you debug. THIS IS NOT MAC OS X. Ubuntu or Windows + CLion or VS + gDEBugger, plus common support libraries like GLEW, GLFW, and SOIL to get a window up and load textures quickly (just a hypothetical example).

Also, as has been suggested before, there is very much an old way and a new way of doing things in OpenGL, and the difference has big implications for performance.

Luckily, an NVIDIA guy has a GitHub page with sample code for the new way of doing things. I highly suggest you clone that repo and study it religiously.

That GitHub page, paired with a study of the parts of the OpenGL spec that describe the features used in that code, will IMO be your best bet for learning OpenGL.

In a nutshell... the movement in the OpenGL API has been toward zero driver overhead. That is, the application takes over some of what the driver usually does, so the driver does less work and you can really slim down the runtime graphics cost.

Lastly, the repo contains a solution for packing multiple textured quads using (I might not have the exact names right) ARB_buffer_storage, ARB_shader_draw_parameters, ARB_bindless_texture, ARB_sparse_texture, and ARB_multi_draw_indirect. Again, the application is doing what the driver might otherwise do, thus reducing overhead. If you think of a draw call as packing the data of similar objects rather than drawing a single object, you begin to see the theme of the transition from old OpenGL to modern OpenGL.
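Concretely, "one draw call packs many objects" means filling a CPU-side array of indirect draw commands and submitting the whole array with a single glMultiDrawElementsIndirect call. The command struct layout below is the one specified by ARB_multi_draw_indirect; the Mesh record and buildCommands helper are hypothetical illustrations:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Command layout specified by ARB_multi_draw_indirect for
// glMultiDrawElementsIndirect.
struct DrawElementsIndirectCommand {
    std::uint32_t count;          // index count for this mesh
    std::uint32_t instanceCount;  // usually 1 for packed scene objects
    std::uint32_t firstIndex;     // offset into the shared index buffer
    std::uint32_t baseVertex;     // offset into the shared vertex buffer
    std::uint32_t baseInstance;   // per-object id, readable in the shader
                                  // via ARB_shader_draw_parameters
};

// Hypothetical record of where each object's data lives in the shared
// buffers.
struct Mesh {
    std::uint32_t indexCount, firstIndex, baseVertex;
};

// Pack one command per visible mesh; the resulting array is uploaded
// to a GL_DRAW_INDIRECT_BUFFER and drawn with one call.
std::vector<DrawElementsIndirectCommand>
buildCommands(const std::vector<Mesh>& visible) {
    std::vector<DrawElementsIndirectCommand> cmds;
    cmds.reserve(visible.size());
    std::uint32_t id = 0;
    for (const Mesh& m : visible)
        cmds.push_back({m.indexCount, 1, m.firstIndex, m.baseVertex, id++});
    return cmds;
}
```

The per-object data (transforms, material indices) lives in a big shader storage or uniform buffer indexed by baseInstance or gl_DrawIDARB, which is exactly how the driver's per-draw state changes get replaced by plain application-side data.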

------------------------------

redwoodpixel.com

Luckily, an NVIDIA guy has a GitHub page with sample code for the new way of doing things. I highly suggest you clone that repo and study it religiously.

That GitHub page, paired with a study of the parts of the OpenGL spec that describe the features used in that code, will IMO be your best bet for learning OpenGL.


Thanks, will do!

In a nutshell... the movement in the OpenGL API has been toward zero driver overhead. That is, the application takes over some of what the driver usually does, so the driver does less work and you can really slim down the runtime graphics cost.


I'm quite keen on learning Vulkan once it's out, though I know I'll be waiting a while for it to be released, become stable, and reach some minimum level of platform support.

This topic is closed to new replies.
