Need clarity on how to implement a spritebatch

Started by
3 comments, last by Darkilon 8 years, 1 month ago

I'm programming in C++ with OpenGL, but decided to post here since this is more of a technical, generic question; I'm not asking how to implement one in OpenGL but rather the idea behind it.
Essentially, I'm batching quads using instancing and a texture 2D array which contains the textures of the quads.
My problem is how to batch them effectively. Previously, to batch quads, I simply added their instance data to a VBO, uploaded it all to the GPU and rendered the quads with a single draw call, after which I would empty the VBO and start anew (this time with the updated quads' data).
However, I found that to be slow, so I'm currently doing this: when a new quad needs to be rendered, I place its instance data into the VBO, where it stays for as long as the quad needs to be rendered. Whenever its position, size or anything else changes, I update its data in the VBO, overwriting the old data.
Is this a good way? Or is there a better, more efficient way? I'd like to hear your thoughts on this.
Also, how does XNA achieve its spritebatching? I've read about it and it seems to enclose its draw calls in begin() and end() functions, which respectively set up and flush the renderer. Does that mean that it acts like my previous batching method, albeit in a better, more performant way?
Thanks to anyone who can shed some light upon this.

In any case, you need to be using double- or triple-buffering (whatever it takes to ensure that you are not waiting on resources that are in use by the GPU).

Instancing doesn’t make sense for extremely small sets of data.
You are better off either just filling the buffer in directly via the CPU or using the technique described here.

And, in any case, you don’t need to update the buffer at all unless the data has changed. It tends to be a good idea to have separate buffers for background objects and dynamic moving objects.

I've read about it and it seems to enclose its draw calls in begin() and end() functions, which respectively set up and flush the renderer. Does that mean that it acts like my previous batching method, albeit in a better, more performant way?

Beginning and ending a sprite batch at the very least sets and optionally restores render states for typical 2D drawing, but you can ignore this as you should already be in such a state and there is overhead for switching.
Other than that it begins a batch, likely by getting an unused internal vertex buffer, filling it progressively for each draw call you make, and finally finishes off and renders the buffer all at once when you finish the batch.
This is implementation-dependent, so there may not be a single best answer, but it is likely along those lines.
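
To make that flow concrete, here is a minimal CPU-side sketch of a begin/draw/end batch. It is not XNA's actual implementation; `SpriteBatch`, `Vertex`, `flushCount()` and `pendingVertices()` are hypothetical names, and the GPU upload and draw call are reduced to a counter so the control flow can be followed without a graphics context.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical vertex layout: position plus texture coordinates.
struct Vertex { float x, y, u, v; };

// Hypothetical CPU-side sprite batch. flush() is where a real renderer
// would upload the vertices and issue a single draw call; here it only
// counts flushes.
class SpriteBatch {
public:
    explicit SpriteBatch(std::size_t maxSprites) : maxSprites_(maxSprites) {
        vertices_.reserve(maxSprites * 6);
    }

    // Starts a batch with an empty vertex list.
    void begin() {
        assert(!active_ && "begin() called twice without end()");
        active_ = true;
        vertices_.clear();
    }

    // Appends one axis-aligned quad as two triangles (6 vertices).
    void draw(float x, float y, float w, float h) {
        assert(active_ && "draw() called outside begin()/end()");
        if (vertices_.size() / 6 == maxSprites_) flush();  // buffer is full
        const Vertex a{x,     y,     0.0f, 0.0f};
        const Vertex b{x + w, y,     1.0f, 0.0f};
        const Vertex c{x + w, y + h, 1.0f, 1.0f};
        const Vertex d{x,     y + h, 0.0f, 1.0f};
        vertices_.insert(vertices_.end(), {a, b, c, a, c, d});
    }

    // Finishes the batch: everything still pending is rendered at once.
    void end() {
        assert(active_ && "end() called without begin()");
        flush();
        active_ = false;
    }

    std::size_t flushCount() const { return flushCount_; }
    std::size_t pendingVertices() const { return vertices_.size(); }

private:
    void flush() {
        if (vertices_.empty()) return;
        // Assumption: a real implementation copies vertices_ into a vertex
        // buffer here and issues one draw call for vertices_.size() vertices.
        ++flushCount_;
        vertices_.clear();
    }

    std::size_t maxSprites_;
    std::size_t flushCount_ = 0;
    bool active_ = false;
    std::vector<Vertex> vertices_;
};
```

The only time this sketch draws before end() is when the internal buffer fills up, which is one common way to bound its size.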


L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

Excuse me, I forgot to mention that I AM triple-buffering.

Also, it probably wasn't clear, but I'm updating the VBO ONLY when the data changes.


Instancing doesn’t make sense for extremely small sets of data.
You are better off either just filling the buffer in directly via the CPU or using the technique described here.


Mind explaining why? I'm only ever going to render the same type of object (a quad), whose copies are going to have different textures, positions and whatnot, so I thought instancing would be ideal in this case, not to mention the ease with which all the quads can be rendered (a single draw call).

Thanks for the file, I'll make sure to read it later.

Sigh. The crappy forum software here combined with my crappy laptop keyboard just lost a _lot_ of good information (wtf a site in 2016 is still losing all your input if you accidentally navigate off the page?).

Short version:

1) stream to a single VBO instead of double or triple buffering. For each batch, write to a non-overlapping region. This reduces the number of buffers you need to bind and reduces the complicated resource barrier tracking that drivers have to do. If you have the address space to spare (hint: you do, even on 32-bit) then use persistently-mapped buffers. If your API is new enough, you can use fences to stall when you need to wrap the VBO; if your API is old and crufty, just remap with discard when that happens.

2a) write individual vertices to your VBO and avoid instancing. Instancing has an overhead per-instance, so you should only use it when that overhead is smaller than the overhead of writing individual vertices into your streaming VBO. For quads, it's very unlikely that your driver's instancing overhead will be small enough to be worthwhile.

2b) use vertex fetch to avoid both the instancing overhead and the vertex overhead. Fill in your streaming VBO just like you would for instancing but then draw using a non-instanced draw call with no attribute VBO bound. Draw 6*instance_count vertices. This is fully supported as far back as DX9. Your shader now takes the vertex id and divides that by 6, giving you the index into the transform buffer, and can use the vertex id modulo 6 to calculate the quad vertex data. The small bit of extra work in your vertex shader will be negligible.
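
For point 1, the bookkeeping behind "write each batch to a non-overlapping region and wrap" can be sketched as a small offset allocator. This is only an illustration under assumptions: `StreamAllocator` and `Region` are hypothetical names, and mapping the buffer, waiting on fences and issuing draws are left to the caller.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sub-allocator for one large streaming vertex buffer.
// It only hands out byte offsets; it never touches the GPU itself.
class StreamAllocator {
public:
    struct Region {
        std::size_t offset;  // byte offset of this batch inside the buffer
        bool wrapped;        // true if the write cursor went back to 0
    };

    explicit StreamAllocator(std::size_t capacity) : capacity_(capacity) {}

    // Reserves `size` bytes that do not overlap any region handed out
    // since the last wrap. When `wrapped` is true, the caller must make
    // sure the GPU is done with the old contents before writing
    // (for example a fence wait, or a remap with discard).
    Region allocate(std::size_t size) {
        assert(size <= capacity_ && "batch larger than the whole buffer");
        bool wrapped = false;
        if (cursor_ + size > capacity_) {
            cursor_ = 0;  // not enough room left: start over at the front
            wrapped = true;
        }
        const Region region{cursor_, wrapped};
        cursor_ += size;
        return region;
    }

private:
    std::size_t capacity_;
    std::size_t cursor_ = 0;
};
```

With a persistently-mapped buffer, the returned offset would simply be added to the mapped pointer before copying the batch data in.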
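
For point 2b, the index math the vertex shader would perform can be emulated on the CPU. This is a sketch, not shader code: `QuadData`, `Vec2`, `kCorners` and `quadVertex` are hypothetical names, and the per-quad layout (position plus size) is an assumption. In GLSL the vertex id would come from the built-in `gl_VertexID`.

```cpp
#include <cassert>
#include <cstddef>

struct Vec2 { float x, y; };

// Hypothetical per-quad record: the same data an instanced path would stream.
struct QuadData { float x, y, w, h; };

// Unit-square corners for the two triangles (0,1,2) and (0,2,3).
constexpr Vec2 kCorners[6] = {
    {0.0f, 0.0f}, {1.0f, 0.0f}, {1.0f, 1.0f},
    {0.0f, 0.0f}, {1.0f, 1.0f}, {0.0f, 1.0f},
};

// CPU emulation of the per-vertex work: vertexId / 6 selects the quad,
// vertexId % 6 selects which corner of that quad to emit.
inline Vec2 quadVertex(const QuadData* quads, std::size_t vertexId) {
    const QuadData& q = quads[vertexId / 6];
    const Vec2 corner = kCorners[vertexId % 6];
    return {q.x + corner.x * q.w, q.y + corner.y * q.h};
}
```

Drawing 6 * quadCount vertices with no attribute buffer bound would then run this computation once per vertex on the GPU.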

Sean Middleditch – Game Systems Engineer – Join my team!

Sorry that you wasted time by typing something that got lost!
I, again, forgot to mention (that's what you get when you post something half-asleep, I guess) that I'm using a persistently-mapped buffer (I had read this) as well.
Anyway, thank you both for your suggestions, I'll try to work on them and profile.

