Which situation is more optimal for sprite batching?


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

  • You cannot reply to this topic
4 replies to this topic

#1 gemini_   Members   -  Reputation: 237


Posted 27 December 2012 - 12:08 PM

So two scenarios:

 

1. Store attributes of each sprite (around 12 to 16 for each sprite) in a texture buffer and compute the transform matrices in the shader.

 

2. Compute all transform matrices for each sprite on the CPU and store them in a uniform buffer.

 

Both would involve instancing. Which one would give the best performance?




#2 Maus   Members   -  Reputation: 700


Posted 27 December 2012 - 05:17 PM

The transformation is usually calculated outside the shader, because the shader would otherwise repeat it for every vertex, and doing it on the CPU gives you much better control over the calculations. You can also optimize it (e.g. with SSE) for maximum performance, so the CPU and GPU work side by side.
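
As a rough illustration of computing the transform once per sprite on the CPU, here is a minimal Python sketch. The function names and the (x, y, rotation, scale_x, scale_y) sprite layout are assumptions made for the example, not something from the original question.

```python
import math

def sprite_matrix(x, y, rotation, scale_x, scale_y):
    """Return a 3x3 row-major 2D affine matrix (translate * rotate * scale)."""
    c, s = math.cos(rotation), math.sin(rotation)
    return [
        [c * scale_x, -s * scale_y, x],
        [s * scale_x,  c * scale_y, y],
        [0.0,          0.0,         1.0],
    ]

def transform_point(m, px, py):
    """Apply the affine matrix to a 2D point (what the GPU would do per vertex)."""
    return (m[0][0] * px + m[0][1] * py + m[0][2],
            m[1][0] * px + m[1][1] * py + m[1][2])

def build_matrices(sprites):
    """Build one matrix per sprite, ready to upload as per-instance data."""
    return [sprite_matrix(*sprite) for sprite in sprites]
```

The trigonometry runs once per sprite here, whereas a shader that builds the matrix from raw attributes would repeat it for each of the sprite's vertices.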

There is a nice article from Jari Komppa about this topic: http://sol.gfxile.net/instancing.html

Best regards
- Martin

Edited by MausGames, 27 December 2012 - 05:23 PM.


#3 mhagain   Crossbones+   -  Reputation: 8278


Posted 27 December 2012 - 05:51 PM

For camera-facing sprites you don't even need a matrix per-sprite; you just need your global MVP (once-only no matter how many sprites you have) then each sprite gets an up and right (or left, depending on how your coordinate system is set up) vector that's used to do the billboarding.  That reduces your per-instance data to 6 floats per-sprite and the transform is just two extra MAD operations.
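
A minimal Python sketch of that billboarding step, written on the CPU for clarity (in practice it would live in the vertex shader). It assumes the sprite center and half-size are available alongside the up and right vectors; the function name and parameter layout are hypothetical.

```python
def billboard_corners(center, right, up, half_w, half_h):
    """Expand a sprite center into four corners using right/up vectors.

    Each corner is center + right * a + up * b, i.e. two multiply-adds
    per component, after which the global MVP is applied as usual.
    """
    corners = []
    for sx, sy in ((-1, -1), (1, -1), (1, 1), (-1, 1)):
        corners.append(tuple(
            c + r * (sx * half_w) + u * (sy * half_h)
            for c, r, u in zip(center, right, up)
        ))
    return corners
```

For an unrotated camera, right would be (1, 0, 0) and up would be (0, 1, 0); for a rotated camera they can be read from the view matrix.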


It appears that the gentleman thought C++ was extremely difficult and he was overjoyed that the machine was absorbing it; he understood that good C++ is difficult but the best C++ is well-nigh unintelligible.


#4 mattdesl   Members   -  Reputation: 176


Posted 29 December 2012 - 01:08 PM

If you are using GL 3+, you can use geometry shaders and instancing.

 

But really there is no need to restrict your audience to GL 3+ (it's not yet very widely supported, especially not in the casual market). GL 2+ functionality will be more than enough for a 2D sprite batcher.

 

The simplest way to optimize your sprite batcher is to send as little as possible to GL. Update your uniforms (e.g. ortho 2D projection matrix) only as necessary, and only pass the attributes you really need. If you know some of your data is static, you can use multiple VBOs, otherwise you should just stick to interleaved data.

 

My sprite batcher uses the following attributes per vertex:

 

{ x, y, color, u, v }

 

 

The color is a packed float. That means 5 floats per vertex, and 4 vertices per sprite, drawn using element indices. One-off transformations (like sprite rotation) are done on the CPU before passing the vertex position.
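
To make the layout concrete, here is a minimal Python sketch of building the interleaved data for one sprite. It is not the code from my batcher: the function names are hypothetical, and as an assumption it stores the color as four RGBA bytes in the 32-bit slot rather than reinterpreting the bits as a float, which occupies the same space in the buffer.

```python
import math
import struct

# x, y, color (4 bytes in one 32-bit slot), u, v -> 20 bytes per vertex
VERTEX_FORMAT = "<2f4B2f"
VERTEX_SIZE = struct.calcsize(VERTEX_FORMAT)

def sprite_vertices(cx, cy, w, h, rotation, rgba, uv_rect):
    """Return interleaved bytes for the 4 vertices of one sprite.

    Rotation about the sprite center is applied here, on the CPU.
    """
    c, s = math.cos(rotation), math.sin(rotation)
    u0, v0, u1, v1 = uv_rect
    corners = (
        (-w / 2, -h / 2, u0, v0),
        ( w / 2, -h / 2, u1, v0),
        ( w / 2,  h / 2, u1, v1),
        (-w / 2,  h / 2, u0, v1),
    )
    out = bytearray()
    for lx, ly, u, v in corners:
        x = cx + lx * c - ly * s
        y = cy + lx * s + ly * c
        out += struct.pack(VERTEX_FORMAT, x, y, *rgba, u, v)
    return bytes(out)

def quad_indices(sprite_count):
    """Two triangles (6 indices) per sprite, sharing its 4 vertices."""
    indices = []
    for i in range(sprite_count):
        base = 4 * i
        indices += [base, base + 1, base + 2, base + 2, base + 3, base]
    return indices
```

The index list only depends on the sprite count, so it can be generated once up front and reused for every batch.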

 

Most of the time, optimizing in 2D is more about using texture atlases, improving batch rendering, and minimizing overdraw rather than worrying about small things like whether or not to interleave your VBO data. Besides, 99% of the time a 2D game will become fill-rate bound before it becomes vertex bound.
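
A quick way to see why atlases matter is to count how many draw calls a batcher would issue for a given draw order. The sketch below is a simplified model under the assumption that a batch is flushed whenever the texture changes or the batch is full; the function name and the max_batch default are hypothetical.

```python
def count_draw_calls(sprite_textures, max_batch=1000):
    """Count flushes for sprites drawn in the given order.

    A flush happens when the bound texture changes or the batch is full.
    """
    calls = 0
    current = None
    in_batch = 0
    for tex in sprite_textures:
        if tex != current or in_batch == max_batch:
            if in_batch:
                calls += 1  # flush the batch built so far
            current = tex
            in_batch = 0
        in_batch += 1
    if in_batch:
        calls += 1  # flush whatever is left at the end of the frame
    return calls
```

With alternating textures such as ["a", "b", "a", "b"] this model issues four draw calls, while the same four sprites taken from a single atlas need only one. Sorting by texture also reduces the count, but it changes the draw order, which matters when sprites overlap and blend.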

 

On the subject of fill rate, you can use hexagons or some other non-rectangular primitive as an optimization (although it comes at the cost of more vertices and may be undesirable depending on your sprites).
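
As a back-of-the-envelope check, assume the visible part of a sprite is roughly a disc of radius r (an assumption for this example; real sprites vary). A regular n-sided polygon that just encloses the disc has area n * r^2 * tan(pi / n), which the hypothetical helper below computes:

```python
import math

def circumscribed_polygon_area(n_sides, radius):
    """Area of a regular polygon whose edges touch a circle of the given radius."""
    return n_sides * radius ** 2 * math.tan(math.pi / n_sides)

def wasted_fraction(n_sides):
    """Fraction of the polygon's area that lies outside the enclosed disc."""
    polygon = circumscribed_polygon_area(n_sides, 1.0)
    return (polygon - math.pi) / polygon
```

Under that assumption a quad wastes about 21% of its area on fully transparent pixels, while a hexagon wastes about 9%, at the cost of 6 vertices instead of 4.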

 

For more reading on my sprite batcher:

https://github.com/mattdesl/lwjgl-basics/wiki/Sprite-Batching

https://github.com/mattdesl/lwjgl-basics/blob/master/src/mdesl/graphics/SpriteBatch.java

 

Using plain old VBOs or even vertex arrays will be plenty fast, but if you want to squeeze a bit more performance out of GL 2.0 you should look into mapped VBOs:

http://www.java-gaming.org/topics/opengl-lightning-fast-managed-vbo-mapping/28209/view.html


Edited by mattdesl, 29 December 2012 - 01:13 PM.


#5 mhagain   Crossbones+   -  Reputation: 8278


Posted 02 January 2013 - 06:15 AM

Quote (mattdesl):

"Using plain old VBOs or even vertex arrays will be plenty fast, but if you want to squeeze a bit more performance out of GL 2.0 you should look into mapped VBOs:

http://www.java-gaming.org/topics/opengl-lightning-fast-managed-vbo-mapping/28209/view.html"

 

Worth noting that the fast path given in that post relies on GL_ARB_map_buffer_range, which is not always available with GL 2.0. In practice the hardware will always be capable of this path (D3D has been doing it since at least version 7), so it comes down to whether or not the vendor exposes it in their driver.
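
A minimal sketch of that check in Python, assuming you have already read the major version and the space-separated extension string from the driver (on GL 2.x, via glGetString(GL_EXTENSIONS)). The function names and the returned path labels are hypothetical.

```python
def has_extension(extensions_string, name):
    """Check for an exact extension name in a space-separated extension list.

    Splitting first avoids false positives from substring matches.
    """
    return name in extensions_string.split()

def pick_upload_path(gl_major_version, extensions_string):
    """Choose a buffer upload strategy based on what the driver exposes."""
    # Range mapping is core functionality from GL 3.0 onwards.
    if gl_major_version >= 3:
        return "map_buffer_range"
    if has_extension(extensions_string, "GL_ARB_map_buffer_range"):
        return "map_buffer_range"
    # Fallback: plain sub-data updates, available wherever VBOs are.
    return "buffer_sub_data"
```

Doing the check once at startup and storing the result keeps the per-frame path free of string handling.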






