Pros and Cons for Batching Sprites

Started by
8 comments, last by Vincent_M 10 years, 7 months ago

I thought sprite batches were the way to go for the last 3 years, but the more I use my sprite batch system, the more I wonder whether it's actually helping. I understand the benefits: fewer texture swaps, shader swaps and draw calls overall.

I use my sprite batch class to get exactly those benefits. I have a Sprite class which I create instances of and add to my SpriteBatch instance, which also holds a reference to a texture atlas. I can set up each sprite's blitting data, transform it, etc., and the SpriteBatch keeps track of every sprite's vertices in a single array.

Life is good, isn't it? Then, I started analyzing the cons I've found using them:

-Most sprites aren't using the same texture all that frequently unless you're using a tile map, but tile maps can be handled as a special-case anyway. Outside of that, it's very rare to have the same sprite show up multiple times onscreen.

-Batching requires all sprites to software-transform the vertices they own in the vertex array on OpenGL ES 2.0, which is very common right now for mobile games.

-If you wanted to make a sprite invisible because you didn't want to render it at that time, you couldn't simply skip rendering it. You'd have to give that sprite's vertices an alpha value of zero so they're completely transparent. This is a blending performance killer on mobile devices.

-Frustum culling is inefficient. If you use a sprite batch for a large area that has many copies of the same sprite, say, the same enemy, you'll be able to draw them all at once easily, but usually only a couple of them, if any, are onscreen at a time.

-Applying a color overlay to a sprite requires per-vertex color data instead of a single uniform, because its vertices are batched with the rest.

-Any time you want to draw a sprite, you need a sprite batch. This is a pain if you're making an RPG with 4 main characters: each character needs its own sprite batch for its sprite, even though it's the only instance that will ever use that batch.

-Storing a sprite in multiple lists is a pain as well. I'm writing a 2D engine where each Sprite is a scene object with its own transform matrix. Not only does it have to be included in the scene's object dictionary (an STL map), but its sprite batch also needs to keep a pointer to it in its own list. This is problematic because if the Sprite is released from memory, I need to make sure both the batch and the scene know about it.

Just my thoughts


I'm currently having similar thoughts about implementing a sprite batcher, and your post even got me thinking about some problems I hadn't considered before. So keep in mind that everything that follows is off the top of my head and may or may not work, as I have yet to implement this myself.


-Batching requires all sprites to software-transform the vertices they own in the vertex array on OpenGL ES 2.0, which is very common right now for mobile games.

If it's done at load time, this shouldn't be a problem, no?

I'm assuming you only geometry-batch static sprites.


-If you wanted to make a sprite invisible because you didn't want to render it at that time, you couldn't simply skip rendering it. You'd have to give that sprite's vertices an alpha value of zero so they're completely transparent. This is a blending performance killer on mobile devices.

Maybe using more draw calls that only draw the visible index ranges would help. Once an object has been invisible for more than N frames, or more than X objects are hidden, rebuild the index buffer so it only contains the visible instances. Keeping pools of indices would help limit the work done when rebuilding.
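A minimal sketch of that rebuild, assuming each sprite owns 4 consecutive vertices in the shared vertex buffer and the quads use 16-bit indices (as is usual on OpenGL ES 2.0, which limits one buffer to 65536 vertices, i.e. 16384 sprites). `buildVisibleIndices` and the visibility vector are hypothetical; the vertex buffer itself is left untouched and only the index list shrinks.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Rebuilds the index list so it references only visible sprites.
// Assumes sprite i owns vertices [4*i, 4*i + 3], laid out as
// top-left, top-right, bottom-right, bottom-left.
std::vector<std::uint16_t> buildVisibleIndices(const std::vector<bool>& visible) {
    std::vector<std::uint16_t> indices;
    indices.reserve(visible.size() * 6);
    const std::size_t corners[6] = {0, 1, 2, 0, 2, 3};  // two triangles per quad
    for (std::size_t i = 0; i < visible.size(); ++i) {
        if (!visible[i]) continue;  // hidden sprites contribute no triangles at all
        const std::size_t base = i * 4;
        for (std::size_t c : corners)
            indices.push_back(static_cast<std::uint16_t>(base + c));
    }
    return indices;
}
```

After uploading the list (for example with `glBufferSubData` on the element array buffer), the draw call passes `indices.size()` as the count to `glDrawElements(GL_TRIANGLES, count, GL_UNSIGNED_SHORT, 0)`, so hidden sprites cost nothing in fill rate.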


-Frustum culling is inefficient. If you use a sprite batch for a large area that has many copies of the same sprite, say, the same enemy, you'll be able to draw them all at once easily, but usually only a couple of them, if any, are onscreen at a time.

Try force-splitting the batches into chunks of a certain area. Say, a chunk only covers a 512x512 pixel area at most (only count the sprites' centers, not their dimensions). This will make even fewer sprites fit in a batch, but maybe it's still worth it... This may, however, make it harder to properly sort the faces to render front-to-back (if you need that).
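A sketch of the force-splitting idea, assuming a uniform grid of 512x512-pixel chunks where only each sprite's center decides its chunk. `SpriteInfo`, `chunkOf` and `splitIntoChunks` are hypothetical names; each resulting list would become its own batch and be culled as a unit.

```cpp
#include <cmath>
#include <map>
#include <utility>
#include <vector>

struct SpriteInfo {
    float centerX, centerY;  // sprite center in world pixels
    int id;
};

constexpr float kChunkSize = 512.0f;  // chunk edge length in pixels
using ChunkCoord = std::pair<int, int>;

// floor() keeps negative coordinates in the right chunk (-10 -> chunk -1).
ChunkCoord chunkOf(const SpriteInfo& s) {
    return {static_cast<int>(std::floor(s.centerX / kChunkSize)),
            static_cast<int>(std::floor(s.centerY / kChunkSize))};
}

// Groups sprite ids into per-chunk batches. Because only the center counts, a
// sprite may overhang its chunk, so when culling a chunk this sketch assumes you
// pad its bounds by the largest sprite half-size.
std::map<ChunkCoord, std::vector<int>> splitIntoChunks(const std::vector<SpriteInfo>& sprites) {
    std::map<ChunkCoord, std::vector<int>> chunks;
    for (const auto& s : sprites) chunks[chunkOf(s)].push_back(s.id);
    return chunks;
}
```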


-Any time you want to draw a sprite, you need a sprite batch. This is a pain if you're making an RPG with 4 main characters: each character needs its own sprite batch for its sprite, even though it's the only instance that will ever use that batch.

Assuming your engine is aware of the difference between static sprites and dynamic sprites, the performance loss of putting your character sprites through the system should be negligible.

The interface you use to feed the sprites to your engine should also be simple enough that the user will barely notice that the sprites will get batched before rendering.


-Storing a sprite in multiple lists is a pain as well. I'm writing a 2D engine where each Sprite is a scene object with its own transform matrix. Not only does it have to be included in the scene's object dictionary (an STL map), but its sprite batch also needs to keep a pointer to it in its own list. This is problematic because if the Sprite is released from memory, I need to make sure both the batch and the scene know about it.

There are probably many solutions to this problem. Maybe your Sprite class should be the one notifying the batcher of its existence and destruction in its ctor and dtor, respectively? Or maybe you could use something like boost::signals to expose some events on the sprite, something like sprite->getSignal_onDestroy().connect(...). Or implement a sprite::addListener( ISpriteListener* ).
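A minimal sketch of the first suggestion: the sprite registers itself with its batch in the constructor and unregisters in the destructor, so the batch never holds a dangling pointer. `Sprite` and `SpriteBatch` here are simplified stand-ins, not the poster's classes; copying is disabled because a copy would not be registered.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

class Sprite;

class SpriteBatch {
public:
    void add(Sprite* s) { sprites_.push_back(s); }
    void remove(Sprite* s) {
        sprites_.erase(std::remove(sprites_.begin(), sprites_.end(), s), sprites_.end());
    }
    std::size_t size() const { return sprites_.size(); }
private:
    std::vector<Sprite*> sprites_;  // non-owning; kept valid by Sprite's ctor/dtor
};

class Sprite {
public:
    explicit Sprite(SpriteBatch& batch) : batch_(batch) { batch_.add(this); }
    ~Sprite() { batch_.remove(this); }
    Sprite(const Sprite&) = delete;
    Sprite& operator=(const Sprite&) = delete;
private:
    SpriteBatch& batch_;  // the batch must outlive the sprite
};
```

If the scene owns each sprite (for example through a `std::unique_ptr` stored in its map), erasing it from the map runs the destructor, and the batch is updated without the scene having to know about it.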

The right solution also depends on the type of scene: how many static sprites there are, how many dynamic ones, whether instancing is used, etc.

-Most sprites aren't using the same texture all that frequently unless you're using a tile map, but tile maps can be handled as a special-case anyway. Outside of that, it's very rare to have the same sprite show up multiple times onscreen.

If performance is critical and your sprite usage is very dynamic (i.e. you can't create a texture atlas offline), you need a system that takes the textures being used and packs them into a single texture atlas.
It's not easy, but it isn't that hard either.
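Runtime atlas packing can start with a simple shelf (row-based) packer. The sketch below is one common technique, not necessarily what the poster uses; it only computes where each image goes, and the actual pixel copies (e.g. with `glTexSubImage2D`) are left out. `ShelfPacker` and `Rect` are hypothetical names.

```cpp
#include <optional>

struct Rect { int x, y, w, h; };

// Places rectangles left to right on horizontal "shelves"; a new shelf starts
// below the current one when it's full. Simple and fast, but it wastes space when
// image heights vary a lot (sorting inputs by height first helps).
class ShelfPacker {
public:
    ShelfPacker(int width, int height) : width_(width), height_(height) {}

    std::optional<Rect> insert(int w, int h) {
        if (w > width_) return std::nullopt;
        if (cursorX_ + w > width_) {  // current shelf is full: open a new one
            shelfY_ += shelfHeight_;
            cursorX_ = 0;
            shelfHeight_ = 0;
        }
        if (shelfY_ + h > height_) return std::nullopt;  // atlas is full
        Rect r{cursorX_, shelfY_, w, h};
        cursorX_ += w;
        if (h > shelfHeight_) shelfHeight_ = h;
        return r;
    }

private:
    int width_, height_;
    int cursorX_ = 0, shelfY_ = 0, shelfHeight_ = 0;
};
```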

If you're developing for DX11/GL3, you can use texture arrays instead, which is much simpler and solves the issue.

-Batching requires all sprites to software-transform the vertices they own in the vertex array on OpenGL ES 2.0, which is very common right now for mobile games.

And this is a problem because...?

-If you wanted to make a sprite invisible because you didn't want to render it at that time, you couldn't simply skip rendering it. You'd have to give that sprite's vertices an alpha value of zero so they're completely transparent. This is a blending performance killer on mobile devices.

If you're manipulating the sprite vertices to change the alpha value, then:
  • You can just not include the vertices when regenerating the vertex buffer.
  • If you're overwriting just a subregion, then a more clever approach than setting alpha to 0 is to move the vertices outside the viewport. The triangles will be clipped away entirely, so no pixel processing power is wasted on them; you only pay for the vertex processing.
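A sketch of the subregion approach, assuming each sprite owns 4 consecutive vertices and that the caller picks an `offscreenX`/`offscreenY` that lands just outside the viewport after the shader's transform. `Vertex` and `hideQuad` are hypothetical. Collapsing all four corners onto one point also makes both triangles zero-area, so nothing is rasterized either way.

```cpp
#include <cstddef>
#include <vector>

struct Vertex {
    float x, y, z;             // position fed to the vertex shader
    float u, v;                // texture coordinates
    unsigned char r, g, b, a;  // color
};

// Overwrites one sprite's 4 vertices so the quad is clipped away instead of
// being blended with alpha = 0. Keep the target point close to the viewport edge
// rather than far away. The sprite must rewrite its real corners to reappear.
void hideQuad(std::vector<Vertex>& vertices, std::size_t spriteIndex,
              float offscreenX, float offscreenY) {
    for (std::size_t k = 0; k < 4; ++k) {
        Vertex& v = vertices[spriteIndex * 4 + k];
        v.x = offscreenX;  // every corner goes to the same point outside the viewport
        v.y = offscreenY;
    }
}
```

Only that range then needs re-uploading, e.g. `glBufferSubData(GL_ARRAY_BUFFER, spriteIndex * 4 * sizeof(Vertex), 4 * sizeof(Vertex), &vertices[spriteIndex * 4])`.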

-Frustum culling is inefficient. If you use a sprite batch for a large area that has many copies of the same sprite, say, the same enemy, you'll be able to draw them all at once easily, but usually only a couple of them, if any, are onscreen at a time.

Yes. This is a trade-off. Partition your batch into smaller batches, but not so small that you end up with one batch per sprite.
Additionally, if your API supports instancing (or you're passing each sprite's position through a constant register), then culling is still possible. If you can draw up to 80 sprites but only 2 are visible, divide the vertex count by 40 (80 / 2), provided the visible sprites' data has been written to the front of the buffer.
The fact that your vertex buffer can hold 80 * 4 vertices (assuming 4 vertices per sprite) doesn't mean you can't pass fewer to the draw call.
If you're using instancing, just pass a lower instance count.
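A sketch of that packing step, assuming per-sprite data goes to the GPU as instance attributes or constant registers and the camera is an axis-aligned rectangle. `SpriteInstance`, `View`, `overlaps` and `packVisible` are hypothetical names. Since a draw call starts at the first vertex or instance, visible sprites are copied to the front so the count alone selects them.

```cpp
#include <cstddef>
#include <vector>

struct SpriteInstance {
    float x, y;          // sprite center in world pixels
    float halfW, halfH;  // half extents
};

struct View { float left, top, right, bottom; };  // camera rectangle, top <= bottom

bool overlaps(const SpriteInstance& s, const View& v) {
    return s.x + s.halfW >= v.left && s.x - s.halfW <= v.right &&
           s.y + s.halfH >= v.top && s.y - s.halfH <= v.bottom;
}

// Copies only the visible sprites to the front of `out` and returns how many
// there are: the instance count, or count * 4 vertices without instancing.
std::size_t packVisible(const std::vector<SpriteInstance>& all, const View& view,
                        std::vector<SpriteInstance>& out) {
    out.clear();
    for (const auto& s : all)
        if (overlaps(s, view)) out.push_back(s);
    return out.size();
}
```

With 80 sprites allocated and 2 visible, that gives 2 instances, or 8 vertices instead of 320, which is the "divide by 40" case.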

-Any time you want to draw a sprite, you need a sprite batch. This is a pain if you're making an RPG with 4 main characters: each character needs its own sprite batch for its sprite, even though it's the only instance that will ever use that batch.

I prefer a more generic batching system that accepts "any" kind of sprite, batches it together (generating the atlas on the fly) and then reuses the result. However, this only works well if the sprites can be grouped by shared properties that last long enough.

Indeed, batching isn't a silver bullet; it does come with trade-offs and problems. But generally speaking, it does its job of improving performance (with some exceptions, when all the content is too dynamic), and some of the problems you're mentioning are easily solvable.

I only transform the vertices if something has changed since the last frame, which pays off for static sprites. A downside to consider, however, is that each sprite requires a second set of position, rotation and scale data, which adds bloat, and if even one thing changes, you have to re-multiply the translation, rotation and scale matrices together to get the new transform matrix, then pre-multiply it by its parent's matrix if it's attached to another scene element, and finally multiply each vertex by that matrix. It can be memory-intensive too: my old sprite module was around 1500 bytes per sprite instance! It had all kinds of features, but imagine using thousands of those instances as static sprites in a tile map. That's megabytes of just metadata lol... (and that's not counting the actual vertex/index data in the batch).
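A sketch of the "only re-transform when something changed" approach with a dirty flag. To stay short it uses a 2D affine matrix (2x3) instead of the 4x4 matrices described above. `Affine`, `Vec2` and `SpriteNode` are hypothetical, the parent's world matrix is assumed to be up to date before its children update, and this sketch does not propagate a parent's change to its children.

```cpp
#include <array>
#include <cmath>

// Row-major 2x3 affine matrix: [a b tx; c d ty].
struct Affine {
    float a = 1, b = 0, tx = 0;
    float c = 0, d = 1, ty = 0;

    Affine operator*(const Affine& o) const {  // this * o (o is applied first)
        return {a * o.a + b * o.c, a * o.b + b * o.d, a * o.tx + b * o.ty + tx,
                c * o.a + d * o.c, c * o.b + d * o.d, c * o.tx + d * o.ty + ty};
    }
};

struct Vec2 { float x, y; };

struct SpriteNode {
    Vec2 position{0, 0}, scale{1, 1};
    float rotation = 0;                  // radians
    const SpriteNode* parent = nullptr;
    std::array<Vec2, 4> localCorners{};  // quad corners relative to the pivot
    std::array<Vec2, 4> worldCorners{};  // what gets copied into the batch
    Affine world;
    bool dirty = true;

    void setPosition(Vec2 p) { position = p; dirty = true; }

    // Recomputes world = parent * T * R * S and the 4 corners only when dirty.
    // Returns true if the batch's copy of the vertices must be refreshed.
    bool update() {
        if (!dirty) return false;
        const float cs = std::cos(rotation), sn = std::sin(rotation);
        const Affine local{cs * scale.x, -sn * scale.y, position.x,
                           sn * scale.x, cs * scale.y, position.y};  // T * R * S
        world = parent ? parent->world * local : local;
        for (int i = 0; i < 4; ++i) {
            const Vec2 p = localCorners[i];
            worldCorners[i] = {world.a * p.x + world.b * p.y + world.tx,
                               world.c * p.x + world.d * p.y + world.ty};
        }
        dirty = false;
        return true;
    }
};
```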

I've thought about index pools, since alpha blending combined with the limited fill rate of mobile devices is a huge restriction. It'd be bad to hide a 256x256 sprite with alpha blending; the texture look-ups alone would hurt.

Btw, as far as sorting goes, I use the depth buffer along with 3D coordinates in my vertex format. I thought about trimming the vertex struct down to 2D positions with a "depth" value per sprite, but that won't work with batching since OpenGL ES 2.0 doesn't support UBOs like desktop OpenGL 3.3 does lol. Because of this, per-sprite color and depth information is copied into each of the sprite's vertices until we start seeing OpenGL ES 3.0 hardware.
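A sketch of what that vertex layout looks like: with no per-sprite uniform storage inside a single draw call, the sprite's depth and color are written into all four of its vertices. `BatchVertex`, `Quad` and `writeSpriteQuad` are hypothetical names, and the 24-byte size assumes the usual 4-byte float alignment with no extra padding.

```cpp
#include <cstdint>
#include <vector>

struct BatchVertex {
    float x, y;               // 2D position
    float depth;              // per-sprite value, duplicated into every vertex
    float u, v;               // texture coordinates in the atlas
    std::uint8_t r, g, b, a;  // per-sprite color overlay, duplicated as well
};
static_assert(sizeof(BatchVertex) == 24, "5 floats + 4 bytes, no padding expected");

struct Quad { float x0, y0, x1, y1, u0, v0, u1, v1; };

// Appends one sprite's 4 vertices, replicating its depth and color per vertex.
void writeSpriteQuad(std::vector<BatchVertex>& out, const Quad& q, float depth,
                     std::uint8_t r, std::uint8_t g, std::uint8_t b, std::uint8_t a) {
    out.push_back({q.x0, q.y0, depth, q.u0, q.v0, r, g, b, a});
    out.push_back({q.x1, q.y0, depth, q.u1, q.v0, r, g, b, a});
    out.push_back({q.x1, q.y1, depth, q.u1, q.v1, r, g, b, a});
    out.push_back({q.x0, q.y1, depth, q.u0, q.v1, r, g, b, a});
}
```

The color would typically be bound as 4 normalized unsigned bytes, e.g. `glVertexAttribPointer(loc, 4, GL_UNSIGNED_BYTE, GL_TRUE, sizeof(BatchVertex), offset)`, so the duplication costs 4 bytes of color and 4 bytes of depth per vertex.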


Vincent_M, on 25 Aug 2013 - 5:30 PM, said:

-Any time you want to draw a sprite, you need a sprite batch. This is a pain if you're making an RPG with 4 main characters: each character needs its own sprite batch for its sprite, even though it's the only instance that will ever use that batch.
Assuming your engine is aware of the difference between static sprites and dynamic sprites, the performance loss of putting your character sprites through the system should be negligible.
The interface you use to feed the sprites to your engine should also be simple enough that the user will barely notice that the sprites will get batched before rendering.

By static, do you mean moving vs. non-moving? My sprite batch just contains a single STL vector of vertices for all sprites attached to it. Having only a single sprite in a sprite batch wouldn't be slower than drawing sprites individually, but it's more setup code, as you'd have to load the texture, allocate the sprite batch, link the texture to the sprite batch, allocate a sprite, link it to the sprite batch, set up the sprite's blitting params, then transform/update animations as necessary.

Then again, my SpriteBatch class does contain a list of metadata for sprite sheet animations too. You define the dimensions of the animation frame, how many frames there are, etc., in a SpriteAnimation object, then link that to the SpriteBatch instance that keeps a reference to the sprite sheet texture. Then the sprite instance just needs to call SetAnimation() and optionally set frame trimming, playback state, playback framerate, etc.
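That frame metadata is enough to compute each frame's texture coordinates. A sketch, assuming frames are laid out in a regular grid, row by row, starting at a pixel offset in the sheet; this `SpriteAnimation` is a hypothetical struct, not the poster's class.

```cpp
struct UVRect { float u0, v0, u1, v1; };

struct SpriteAnimation {
    int originX = 0, originY = 0;  // top-left of the first frame in the sheet, in pixels
    int frameW = 0, frameH = 0;    // frame dimensions in pixels
    int framesPerRow = 1;
    int frameCount = 1;
    float fps = 12.0f;             // playback framerate

    // Normalized texture coordinates of frame `index` in a sheetW x sheetH sheet.
    UVRect frameUV(int index, int sheetW, int sheetH) const {
        const int col = index % framesPerRow;
        const int row = index / framesPerRow;
        const float x = float(originX + col * frameW);
        const float y = float(originY + row * frameH);
        return {x / sheetW, y / sheetH, (x + frameW) / sheetW, (y + frameH) / sheetH};
    }

    // Frame to show after `seconds` (>= 0) of looping playback.
    int frameAt(float seconds) const {
        return static_cast<int>(seconds * fps) % frameCount;
    }
};
```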

@Matias: I tried to add onto my reply above, since your post came in after it, but the forum doesn't seem to let me, so I apologize in advance for the double post. Here are my replies to your points:

-I've seen some good things in OpenGL 3.3, but my main target is mobile (yep, I'm on the mobile bandwagon...). I'm excited for OpenGL ES 3.0, since the specification adds features that help batching (such as uniform buffers and instanced drawing) as well as MRTs for deferred rendering, but it looks like it could be a while before we start seeing devices that fully support the spec.

-Software transformation is probably a benefit for static sprites, as you transform them once and only update them when something changes, instead of every frame in a vertex shader. However, it's fairly intense when something actually does change, because there are 3, possibly 4, matrix4x4 multiplications and 4 matrix4x4 * vector3 operations. That's 48 to 64 dot products for the matrix multiplications (16 each) and 12 more dot products to transform the 4 vertices.

-As far as frustum culling is concerned, I may stop using my current Sprite class for tile maps and create a simpler sprite class that treats the sprites as static and generates sprite batches based on a quadtree. If I want anything dynamic, I'll treat it as an actual sprite with the full functionality that character sprites have.

-So, if I move the vertices out of the viewport, the GPU won't attempt to draw them? I'm still sending the data to the GPU, but if it's outside the viewport after the final transformation in the vertex shader, the pixel shader won't run? If I understood that correctly, then I feel better about my fill rate concerns. I'm still sending unnecessary data to the GPU, but then again, 3D games generally send way more vertex data to the GPU, even with frustum culling, than you'd encounter in an entire 2D scene. I would think so, anyway.

-I've thought about writing an atlas generator, but it was meant to be an offline tool that generates atlases and produces metadata telling my SpriteBatch class the coordinates of the sprites and animation frames held in each atlas. On-the-fly generation seems like it could cause long load times, as you'd be loading many separate image files instead of one larger one, and possibly generating sprite definition data on the spot.

Most of these worries involve negligible amounts of data and processing: for example, adding a colour to every one of hundreds and hundreds of sprites costs between 1 byte (an index) and 8 bytes (16-bit RGBA) per sprite, i.e. a few hundred to a few thousand bytes per frame. Are there technical reasons to use many batches instead of putting all compatible sprites in the same batch? Singleton or almost-singleton entities like the "4 main characters" can have reserved places in a shared vertex buffer rather than their own separate batches.
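A sketch of the reserved-places idea: one shared vertex array where a few long-lived entities (like the 4 main characters) own fixed quad slots at the front, and everything else is appended after them each frame, so the whole thing is still one upload and one draw call. `SharedSpriteBuffer`, the simplified vertex type `V` and the slot count are assumptions for illustration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

struct V { float x, y, u, v; };

class SharedSpriteBuffer {
public:
    explicit SharedSpriteBuffer(std::size_t reservedQuads)
        : reserved_(reservedQuads), vertices_(reservedQuads * 4) {}

    // Overwrites reserved slot `slot` (e.g. one per main character) in place.
    void setReserved(std::size_t slot, const V (&quad)[4]) {
        if (slot >= reserved_) throw std::out_of_range("no such reserved slot");
        for (std::size_t k = 0; k < 4; ++k) vertices_[slot * 4 + k] = quad[k];
    }

    // Drops last frame's transient quads but keeps the reserved ones.
    void beginFrame() { vertices_.resize(reserved_ * 4); }

    void appendTransient(const V (&quad)[4]) {
        vertices_.insert(vertices_.end(), quad, quad + 4);
    }

    const std::vector<V>& vertices() const { return vertices_; }  // upload, then draw once

private:
    std::size_t reserved_;
    std::vector<V> vertices_;
};
```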


For batching 3d meshes I have a system which allows me to:

- batch "batches" (store only the list pointer)

- batch single meshes (store the data in a local list)

I use the mesh batches for blocks which don't change very often. So in this case culling is coarse: the efficiency comes at the price of drawing some out-of-frustum objects.

Instead of copying data (except in the single-mesh case), I only add a batch list to a list of batches. Before drawing, I upload the data to a buffer object (which could be a constant buffer or a vertex buffer). This way I can have one draw call per mesh type regardless of how the meshes are batched. This can be implemented with sprites too. Of course, it involves copying data around, which may be less efficient on mobile devices.
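A sketch of that "batch of batches" scheme, assuming per-instance data lives in arrays owned elsewhere: long-lived blocks are registered by pointer (no copy), single meshes are copied, and everything is gathered into one contiguous array right before the upload. `InstanceData` and `MeshTypeBatch` are hypothetical names.

```cpp
#include <vector>

struct InstanceData { float x, y, z, scale; };

class MeshTypeBatch {
public:
    // Registers a whole list by pointer; it must stay alive until gather().
    void addList(const std::vector<InstanceData>* list) { lists_.push_back(list); }

    // Copies a single instance into the batch's own storage.
    void addSingle(const InstanceData& d) { singles_.push_back(d); }

    // Concatenates everything into one array, ready for a single buffer upload
    // (e.g. glBufferSubData) and a single draw call for this mesh type.
    const std::vector<InstanceData>& gather() {
        staging_.clear();
        for (const auto* list : lists_)
            staging_.insert(staging_.end(), list->begin(), list->end());
        staging_.insert(staging_.end(), singles_.begin(), singles_.end());
        return staging_;
    }

    void clear() { lists_.clear(); singles_.clear(); }  // call once per frame

private:
    std::vector<const std::vector<InstanceData>*> lists_;
    std::vector<InstanceData> singles_;
    std::vector<InstanceData> staging_;
};
```

Because blocks are referenced rather than copied, edits to a block's array show up at the next `gather()` without re-registering it.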

Cheers!

Most of these worries involve negligible amounts of data and processing: for example, adding a colour to every one of hundreds and hundreds of sprites costs between 1 byte (an index) and 8 bytes (16-bit RGBA) per sprite, i.e. a few hundred to a few thousand bytes per frame. Are there technical reasons to use many batches instead of putting all compatible sprites in the same batch? Singleton or almost-singleton entities like the "4 main characters" can have reserved places in a shared vertex buffer rather than their own separate batches.

The way I currently process my sprites is by texture: if I have a bunch of sprites using a texture, or a group of similar textures for animated sprite sheets, etc., then all of those sprites are listed in that sprite batch. Since the OpenGL specs I'm confined to at this time don't allow manipulating my batches on a per-object basis, I have to simulate that by copying the per-object data into my vertices.

Right now, the idea is that I have a texture atlas with a bunch of images in it that my sprites can use. I link my sprite to the appropriate batch, and that link usually never changes, since it makes sense to stick with that batch. The only way a sprite can be rendered is if it's attached to a sprite batch, because the batch, not the sprite, is what has access to the textures. I could very well have multiple batches referencing the same texture, which would simulate a quadtree as far as scene rendering goes...

possibly 4 matrix4x4 multiplications and 4 matrix4x4 * vector3 operations. That's 48 to 64 dot products for the matrix multiplication and 12 more dot products to transform the vertices.

If your sprites don't rotate (or you separate rotating sprites from non-rotating ones), you can get away with no matrix multiplication at all.
After all, the transformation is just:
outPos.xy = inVertex.xy * scale.xy + position.xy;
outPos.zw = float2( -1, 1 );
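That's about the cost of a single dot product per vertex (one multiply-add per component). Just make sure position and scale are chosen so that outPos.xy lands in the [-1, 1] range, since no projection matrix is applied.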
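Since the original poster transforms vertices on the CPU, here is the same fast path as a C++ sketch: non-rotating sprites skip the matrix entirely. It assumes `position` and `scale` are given in pixels for a `screenW` x `screenH` viewport with y pointing down, and converts the result straight to normalized device coordinates; `Float2`, `ClipPos` and `transformNoRotation` are hypothetical names.

```cpp
struct Float2 { float x, y; };
struct ClipPos { float x, y, z, w; };

// CPU equivalent of outPos.xy = inVertex.xy * scale.xy + position.xy, followed
// by a pixel -> [-1, 1] mapping so no projection matrix is needed afterward.
// z = -1 and w = 1, as in the snippet above.
ClipPos transformNoRotation(Float2 local, Float2 scale, Float2 position,
                            float screenW, float screenH) {
    const float px = local.x * scale.x + position.x;  // pixels
    const float py = local.y * scale.y + position.y;
    return {px * (2.0f / screenW) - 1.0f,  // 0..screenW -> -1..1
            1.0f - py * (2.0f / screenH),  // 0..screenH -> 1..-1 (pixel y points down)
            -1.0f, 1.0f};
}
```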

-So, if I move the vertices out of the viewport, the GPU won't attempt to draw them? I'm still sending the data to the GPU, but if it's outside the viewport after the final transformation in the vertex shader, the pixel shader won't run? If I understood that correctly, then I feel better about my fill rate concerns. I'm still sending unnecessary data to the GPU, but then again, 3D games generally send way more vertex data to the GPU, even with frustum culling, than you'd encounter in an entire 2D scene. I would think so, anyway.

Yes, that's exactly what happens. Make sure all 3 vertices of the triangle lie outside the viewport, beyond the same edge. But don't make the number too big (i.e. don't place them at 8192.0 in screen coordinates; just beyond 1.0 or -1.0, but not too far away; and W must not be 0).
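A small sketch of the "all 3 vertices outside" condition in clip space (before the divide by w, assuming w > 0). Vertices that are off-screen on opposite sides can still produce a triangle that crosses the viewport, so the safe case is all three beyond the same plane. `Clip` and `triviallyOutside` are hypothetical names, and near/far planes are ignored here.

```cpp
#include <array>

struct Clip { float x, y, z, w; };

// True if every vertex is beyond the same side plane, so the whole triangle can
// be discarded without generating any fragments.
bool triviallyOutside(const std::array<Clip, 3>& tri) {
    auto allBeyond = [&](auto test) {
        return test(tri[0]) && test(tri[1]) && test(tri[2]);
    };
    return allBeyond([](const Clip& v) { return v.x > v.w; }) ||   // right of x = +1
           allBeyond([](const Clip& v) { return v.x < -v.w; }) ||  // left of x = -1
           allBeyond([](const Clip& v) { return v.y > v.w; }) ||   // above y = +1
           allBeyond([](const Clip& v) { return v.y < -v.w; });    // below y = -1
}
```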

I like your thinking, and I do have a combiner method in my Matrix class that puts a position and a scale vector into a matrix, since their elements don't overlap. Rotation isn't all that common for many things, so that could be a nice workaround. The only problem is that many different types of objects 'can' rotate. What I could do is check whether the rotation angle has changed between frames. If it hasn't, but position or scale have, then I can possibly just use that combiner method in the class itself.
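One way to extend that combiner so it also covers rotated sprites: for a 2D sprite, T * R * S can be written directly (R's columns scaled by sx and sy, translation in the last column), so no 4x4 multiplication is needed at all. Only sin/cos depend on the angle, so they're cached and recomputed when the angle changes. The column-major layout follows OpenGL's convention; `Mat4` and `SpriteTransform` are hypothetical names.

```cpp
#include <array>
#include <cmath>

using Mat4 = std::array<float, 16>;  // column-major: element (row r, col c) at [c * 4 + r]

// Composes T * R * S for a 2D sprite without any matrix multiplications.
// With angle == 0 this reduces to the plain scale + translation combiner.
class SpriteTransform {
public:
    Mat4 compose(float tx, float ty, float sx, float sy, float angle) {
        if (angle != cachedAngle_) {  // only pay for sin/cos when the angle changed
            cachedAngle_ = angle;
            cos_ = std::cos(angle);
            sin_ = std::sin(angle);
        }
        return {cos_ * sx,  sin_ * sx, 0, 0,   // column 0: R's first column * sx
                -sin_ * sy, cos_ * sy, 0, 0,   // column 1: R's second column * sy
                0,          0,         1, 0,   // column 2
                tx,         ty,        0, 1};  // column 3: translation
    }

private:
    float cachedAngle_ = 0.0f, cos_ = 1.0f, sin_ = 0.0f;
};
```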

The code posted above does appear to be lightning fast. I'll keep that in mind for moving my triangles out of the way. When you say bigger than 1.0, is that how many pixels you suggest moving them out? My ortho matrix is based off of pixels.

This topic is closed to new replies.
