Which situation is more optimal for sprite batching?

Started by
3 comments, last by 21st Century Moose 11 years, 3 months ago

So two scenarios:

1. Store the attributes of each sprite (around 12 to 16 per sprite) in a texture buffer and compute the transform matrices in the shader.

2. Compute the transform matrix for each sprite on the CPU and store the matrices in a uniform buffer.

Both would involve instancing. Which one would give the best performance?

Usually the transformation is calculated outside of the shader, because the shader would have to repeat it for every vertex, and you have much better control over the calculation this way. You can also optimize it (e.g. with SSE) for maximum performance, so the CPU and GPU work side by side.

There is a nice article from Jari Komppa about this topic: http://sol.gfxile.net/instancing.html

Best regards
- Martin

For camera-facing sprites you don't even need a matrix per sprite. You just need your global MVP (set once, no matter how many sprites you have); then each sprite gets an up and a right (or left, depending on how your coordinate system is set up) vector that's used to do the billboarding. That reduces your per-instance data to 6 floats per sprite, and the transform is just two extra MAD operations.
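As a rough illustration of the up/right idea, here is a minimal Java sketch of the arithmetic a vertex shader would do per corner. The class and method names are made up for this example, and it assumes the sprite centre is supplied separately and that the right and up vectors are already scaled by the sprite's half-size.

```java
// Sketch only: expands one camera-facing sprite into four corners.
// BillboardSketch and corners() are hypothetical names, not from any library.
final class BillboardSketch {
    // Corner signs of a unit quad, in counter-clockwise order.
    private static final float[][] CORNERS = { {-1, -1}, {1, -1}, {1, 1}, {-1, 1} };

    /**
     * Returns 4 corners (x, y, z each) for a sprite centred at center.
     * right and up are assumed to be pre-scaled by the sprite's half-size.
     */
    static float[] corners(float[] center, float[] right, float[] up) {
        float[] out = new float[12];
        for (int i = 0; i < 4; i++) {
            float sx = CORNERS[i][0];
            float sy = CORNERS[i][1];
            for (int axis = 0; axis < 3; axis++) {
                // Two multiply-adds per component: center + right*sx + up*sy.
                out[i * 3 + axis] = center[axis] + right[axis] * sx + up[axis] * sy;
            }
        }
        return out;
    }
}
```

The resulting world-space corners would then be multiplied by the single global MVP, so no per-sprite matrix is involved.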

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

If you are using GL 3+, you can use geometry shaders and instancing.

But really there is no need to restrict your audience to GL 3+ (it's not yet very widely supported, especially not in the casual market). GL 2+ functionality will be more than enough for a 2D sprite batcher.

The simplest way to optimize your sprite batcher is to send as little as possible to GL. Update your uniforms (e.g. ortho 2D projection matrix) only as necessary, and only pass the attributes you really need. If you know some of your data is static, you can use multiple VBOs, otherwise you should just stick to interleaved data.

My sprite batcher uses the following attributes per vertex:


{ x, y, color, u, v }

The color is a packed float. That means 5 attributes per vertex, and 4 vertices per sprite using element indices. One-off transformations (like sprite rotation) are done on the CPU before passing the vertex position.
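To make that layout concrete, here is a minimal Java sketch of writing one rotated sprite into an interleaved array and generating the element indices. It is not the code from my batcher: the names are invented for this example, and the ABGR byte order and the `0xfeffffff` mask in `packColor` are assumptions about one common way to pack a color into a float.

```java
// Sketch only: interleaved { x, y, color, u, v } vertices, 4 per sprite.
// All names here are hypothetical.
final class SpriteVertexSketch {
    static final int VERTS_PER_SPRITE = 4;

    /** Packs 8-bit RGBA into the bit pattern of one float (ABGR order assumed). */
    static float packColor(int r, int g, int b, int a) {
        int bits = (a << 24) | (b << 16) | (g << 8) | r;
        // Clearing one exponent bit keeps the pattern out of the NaN range,
        // at the cost of the lowest alpha bit.
        return Float.intBitsToFloat(bits & 0xfeffffff);
    }

    /** Writes one rotated quad into out at offset; returns the next free offset. */
    static int putSprite(float[] out, int offset, float cx, float cy,
                         float halfW, float halfH, float radians, float color) {
        float cos = (float) Math.cos(radians);
        float sin = (float) Math.sin(radians);
        float[][] local = { {-halfW, -halfH}, {halfW, -halfH}, {halfW, halfH}, {-halfW, halfH} };
        float[][] uv = { {0, 0}, {1, 0}, {1, 1}, {0, 1} };
        for (int i = 0; i < VERTS_PER_SPRITE; i++) {
            float lx = local[i][0];
            float ly = local[i][1];
            out[offset++] = cx + lx * cos - ly * sin; // rotated x, done on the CPU
            out[offset++] = cy + lx * sin + ly * cos; // rotated y
            out[offset++] = color;
            out[offset++] = uv[i][0];
            out[offset++] = uv[i][1];
        }
        return offset;
    }

    /** Element indices for spriteCount quads: triangles (0,1,2) and (2,3,0) each. */
    static short[] quadIndices(int spriteCount) {
        short[] idx = new short[spriteCount * 6];
        for (int s = 0, v = 0, i = 0; s < spriteCount; s++, v += 4) {
            idx[i++] = (short) v;
            idx[i++] = (short) (v + 1);
            idx[i++] = (short) (v + 2);
            idx[i++] = (short) (v + 2);
            idx[i++] = (short) (v + 3);
            idx[i++] = (short) v;
        }
        return idx;
    }
}
```

The index array only depends on the sprite count, so it can be built once and reused every frame while the vertex array is refilled.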

Most of the time, optimizing in 2D is more about using texture atlases, improving batch rendering, and minimizing overdraw rather than worrying about small things like whether or not to interleave your VBO data. Besides, 99% of the time a 2D game will become fill-rate bound before it becomes vertex bound.
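To see why atlases and batching matter, here is a small Java sketch that counts draw calls under the assumption that a batcher flushes every time the bound texture changes. The class name and the integer texture ids are hypothetical.

```java
import java.util.Arrays;

// Sketch only: counts flushes for a sequence of sprites identified by texture id.
final class BatchFlushSketch {
    /** Draw calls needed if the batch is flushed whenever the texture changes. */
    static int countFlushes(int[] textureIds) {
        if (textureIds.length == 0) {
            return 0;
        }
        int flushes = 1; // the final flush at the end of the frame
        for (int i = 1; i < textureIds.length; i++) {
            if (textureIds[i] != textureIds[i - 1]) {
                flushes++;
            }
        }
        return flushes;
    }

    /** Same count after grouping sprites by texture (this changes draw order). */
    static int countFlushesSorted(int[] textureIds) {
        int[] sorted = textureIds.clone();
        Arrays.sort(sorted);
        return countFlushes(sorted);
    }
}
```

Six sprites alternating between two textures need six flushes as submitted and two once grouped. With both images in one atlas, a single flush is enough and the draw order is left alone, which matters when sprites are blended.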

On the subject of fill rate, you can use hexagons or some other non-rectangular primitive as an optimization (although it comes at the cost of more vertices and may be undesirable depending on your sprites).
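As a back-of-the-envelope check, here is a Java sketch comparing the area shaded by a quad and by a hexagon. It assumes a roughly circular sprite of radius r, with a regular hexagon circumscribed around that circle; real savings depend on the actual sprite shape. The names are hypothetical.

```java
// Sketch only: shaded area for a circular sprite of radius r (an assumption).
final class OverdrawSketch {
    /** Area of the axis-aligned square that bounds a circle of radius r. */
    static double quadArea(double r) {
        return 4.0 * r * r;
    }

    /** Area of a regular hexagon circumscribed about a circle of radius r. */
    static double hexagonArea(double r) {
        return 2.0 * Math.sqrt(3.0) * r * r;
    }

    /** Fraction of the quad's shaded area that the hexagon avoids. */
    static double savedFraction(double r) {
        return 1.0 - hexagonArea(r) / quadArea(r);
    }
}
```

Under these assumptions the hexagon covers about 13% less area than the quad, in exchange for 6 vertices instead of 4.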

For more reading on my sprite batcher:

https://github.com/mattdesl/lwjgl-basics/wiki/Sprite-Batching

https://github.com/mattdesl/lwjgl-basics/blob/master/src/mdesl/graphics/SpriteBatch.java

Using plain old VBOs or even vertex arrays will be plenty fast, but if you want to squeeze a bit more performance out of GL 2.0 you should look into mapped VBOs:

http://www.java-gaming.org/topics/opengl-lightning-fast-managed-vbo-mapping/28209/view.html

Worth noting that the fast path given in that post relies on GL_ARB_map_buffer_range, which is not always going to be available with GL2.0 (in practice the hardware will always be capable of this path - D3D has been doing it since at least version 7 - so it's down to whether or not the vendor exposes it in their driver).
