DirectXTK SpriteBatch depth testing and instancing?

Started by trojanfoe · 9 comments, last by trojanfoe 6 years, 1 month ago

Hi there, this is my first post in what looks to be a very interesting forum.

I am using DirectXTK to put together my 2D game engine, but I would like to use the GPU depth buffer in order to avoid sorting back-to-front on the CPU, and I think I also want to use GPU instancing. Can I do that with SpriteBatch, or am I looking at implementing my own sprite rendering?

Thanks in advance!

Indie Game Dev

1 hour ago, trojanfoe said:

GPU depth buffer in order to avoid sorting back-to-front on the CPU

In a very basic setting, your sprites will contain both opaque and transparent fragments (e.g. text). The transparent fragments need to be blended correctly with the fragments behind them so that the right background shows through. This can be achieved with the depth buffer using at least two separate passes per layer of transparency (you can use more passes if you want to support multiple layers stacked on top of each other as well). Alternatively, you can sort the sprites on the CPU, which requires only a single pass when the sprites are rendered in sorted order.
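As a rough, untested sketch (plain D3D11; the helper function is mine, not DirectXTK's), the two passes only differ in their depth-stencil state:

```cpp
#include <d3d11.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

// Creates the two depth-stencil states for the two-pass approach.
void CreateDepthStates(ID3D11Device* device,
                       ComPtr<ID3D11DepthStencilState>& opaqueState,
                       ComPtr<ID3D11DepthStencilState>& transparentState)
{
    // Pass 1 (opaque fragments): depth test and depth write, the D3D11 defaults.
    CD3D11_DEPTH_STENCIL_DESC opaqueDesc(D3D11_DEFAULT);
    device->CreateDepthStencilState(&opaqueDesc,
                                    opaqueState.ReleaseAndGetAddressOf());

    // Pass 2 (transparent fragments): depth test only, writes disabled, so
    // these fragments still blend with whatever was drawn behind them.
    CD3D11_DEPTH_STENCIL_DESC transparentDesc(D3D11_DEFAULT);
    transparentDesc.DepthWriteMask = D3D11_DEPTH_WRITE_MASK_ZERO;
    device->CreateDepthStencilState(&transparentDesc,
                                    transparentState.ReleaseAndGetAddressOf());
}
```

Note that SpriteBatch::Begin accepts a custom ID3D11DepthStencilState* (and a custom blend state), so you could feed these states straight into DirectXTK.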

For transparent 3D objects, you can use the depth buffer as well, but CPU sorting is not guaranteed to work in all cases: you can have interlocking transparent triangles, for example, which cannot be sorted once for the whole image but would have to be sorted per pixel. Sprites, on the other hand, are just stacked on top of each other, so you can always sort once for the whole image instead of per pixel, which makes CPU sorting possible in all cases.

So given the above, I would carefully profile the GPU depth buffer approach for sprites, because I expect CPU sorting to be faster for a typical sprite load. Even with an extreme number of sprites, you can always fall back on insertion-sort-based algorithms that exploit coherence between frames.
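Something along these lines (a minimal sketch; the Sprite struct is hypothetical) — insertion sort runs in roughly linear time on a nearly sorted array, which is exactly what you get when sprite depths barely change from one frame to the next:

```cpp
#include <cstddef>
#include <vector>

struct Sprite { float depth; /* texture, position, colour, ... */ };

// Back-to-front: larger depth = farther away = drawn first.
void SortBackToFront(std::vector<Sprite>& sprites)
{
    for (std::size_t i = 1; i < sprites.size(); ++i) {
        Sprite key = sprites[i];
        std::size_t j = i;
        while (j > 0 && sprites[j - 1].depth < key.depth) {
            sprites[j] = sprites[j - 1];  // shift nearer sprites to the right
            --j;
        }
        sprites[j] = key;
    }
}
```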

🧙

2 hours ago, trojanfoe said:

GPU instancing

This can be really advantageous when you have quite a lot of data stored at the vertices. Currently, DirectXTK only uses a position, a color and a pair of texture coordinates per vertex for sprites, which is pretty much the equivalent of a single transformation matrix.
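For comparison, a hypothetical per-instance record (my own sketch, not DirectXTK code) would carry one matrix per sprite instead of per-vertex data on four corners:

```cpp
#include <DirectXMath.h>

// One record per sprite; the quad corners can be expanded in the vertex
// shader from SV_VertexID, so no per-vertex position/color/texcoord remain.
struct SpriteInstance
{
    DirectX::XMFLOAT4X4 world;    // full transform, no CPU decomposition needed
    DirectX::XMFLOAT4   color;
    DirectX::XMFLOAT4   texRect;  // u0, v0, u1, v1 into the texture/atlas
};
```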

🧙

Hey, thanks for the replies. I think I was chasing down performance issues I don't actually have. Premature optimisation :)

I already have a solution for depth sorting on the CPU by allowing only a small, finite number of layers, and I will ignore GPU instancing and stick with SpriteBatch until I have more experience with DirectX.


Indie Game Dev

12 minutes ago, trojanfoe said:

I think I was chasing down performance issues I don't actually have. Premature optimisation

Nah, I wouldn't call it premature optimisation, since in both cases the actual decision can have a large impact on the code base. One should think things through to a certain level before coding, instead of rushing off and stubbing a toe on every stone along the way. Of course, if you try to foresee everything in advance, you won't have written a single line of code by the end of the day. On the other hand, continuously refactoring once the problems start to appear is wasteful as well, since refactoring by itself does not add value. So just find a balance between designing (standing still) and coding (moving forward) :)

🧙

But there is also the need to actually write a replacement SpriteBatch in order to support instancing... I'll go head-in-the-sand for the time being :)


Indie Game Dev

You might also want to look into sprite atlases for improving performance CPU-wise, although that addresses a different area of the CPU overhead (a sprite atlas allows SpriteBatch to fit more sprites into a single draw call), and sorting by texture to maximise batching would likely break back-to-front transparency ordering. But as always, if you want to improve performance, don't forget to profile first :)
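With an atlas, every Draw() references the same texture and just selects a sub-rectangle. A minimal sketch (the atlas SRV and frame rectangle are assumptions):

```cpp
#include <DirectXColors.h>
#include <SpriteBatch.h>  // DirectXTK

// Draw one frame out of a shared atlas texture; since every call uses the
// same SRV, SpriteBatch keeps packing them into a single draw call.
void DrawAtlasFrame(DirectX::SpriteBatch& batch,
                    ID3D11ShaderResourceView* atlasSRV,  // assumed atlas texture
                    const RECT& frame,                   // frame within the atlas
                    DirectX::XMFLOAT2 position)
{
    batch.Draw(atlasSRV, position, &frame, DirectX::Colors::White);
}
```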

Hey thanks - already on that one.

The major issue I currently have with SpriteBatch is that I want to pass it a world transform rather than position, origin, scale and rotation, as the transform is stored in my scene graph entities and accumulated as the scene graph is traversed (i.e. multiplied with the parent transform). At the moment I have to decompose the transform back into position, origin, scale and rotation, but I cannot get the maths right.
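For reference, this is the direction I'm attempting, using DirectXMath's XMMatrixDecompose (assuming the accumulated matrix has no shear; the helper and its names are just my sketch):

```cpp
#include <DirectXMath.h>
#include <cmath>

// Split an accumulated 2D world matrix back into the scale, rotation angle
// and translation that SpriteBatch::Draw expects.
bool DecomposeWorld2D(DirectX::FXMMATRIX world,
                      DirectX::XMFLOAT2& outScale,
                      float& outRotation,
                      DirectX::XMFLOAT2& outPosition)
{
    DirectX::XMVECTOR scale, rotQuat, translation;
    if (!DirectX::XMMatrixDecompose(&scale, &rotQuat, &translation, world))
        return false;  // matrix was singular

    DirectX::XMStoreFloat2(&outScale, scale);
    DirectX::XMStoreFloat2(&outPosition, translation);

    // A pure 2D rotation is a quaternion about Z: (0, 0, sin(a/2), cos(a/2)).
    outRotation = 2.0f * std::atan2(DirectX::XMVectorGetZ(rotQuat),
                                    DirectX::XMVectorGetW(rotQuat));
    return true;
}
```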

Passing transforms to the vertex shader on a per-object basis looks tricky anyway, unless you want to use structured buffers (SM5+?), so I think I am looking at a somewhat high-end sprite renderer. Oh well, it's fun learning as I go.

Indie Game Dev

5 minutes ago, trojanfoe said:

At the moment I have to decompose the transform back into position, origin, scale and rotation, but I cannot get the maths right.

But why don't you work with the components themselves in your transform? If the matrix calculation takes too long, just cache the matrix with a dirty flag.
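I.e. something like this (a sketch; the names are mine):

```cpp
#include <DirectXMath.h>

// Keep the components authoritative; rebuild the matrix lazily on demand.
struct Transform2D
{
    DirectX::XMFLOAT2 position{ 0.f, 0.f };
    DirectX::XMFLOAT2 origin  { 0.f, 0.f };
    DirectX::XMFLOAT2 scale   { 1.f, 1.f };
    float             rotation = 0.f;

    void SetRotation(float r) { rotation = r; m_dirty = true; }
    // ... setters for the other components raise m_dirty the same way ...

    DirectX::XMMATRIX World() const
    {
        if (m_dirty) {
            DirectX::XMStoreFloat4x4(&m_cached,
                DirectX::XMMatrixAffineTransformation2D(
                    DirectX::XMLoadFloat2(&scale),
                    DirectX::XMLoadFloat2(&origin),   // rotation origin
                    rotation,
                    DirectX::XMLoadFloat2(&position)));
            m_dirty = false;
        }
        return DirectX::XMLoadFloat4x4(&m_cached);
    }

private:
    mutable DirectX::XMFLOAT4X4 m_cached;
    mutable bool m_dirty = true;
};
```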

🧙

Yeah, I already do that - I keep position, origin, scale and rotation, plus a transform and an inverse transform, the latter two with dirty flags. The matrices are recalculated in their "get" methods if their dirty flags are set (I think we are on the same page).

At the moment I can use SpriteBatch in a "flat" scene graph simply by providing the position, origin, etc. to its Draw() method, but to get a hierarchical scene graph working I would need to decompose the world transform back into position, origin, etc., and that's where my maths breaks down. Decomposing seems like a waste anyway, given that the vertex shader loves matrices so much, so I am thinking:

1) For SM5+ hardware, I will pass the transform and other per-sprite data in a structured buffer (rough sketch after this list).

2) For < SM5 I think I am looking at passing the transform in a per-object constant buffer.
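Rough sketch of option 1 (my guess at the setup; kMaxSprites and the SpriteInstance layout are placeholders):

```cpp
#include <d3d11.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

struct SpriteInstance { float world[16], color[4], texRect[4]; };  // placeholder layout

constexpr UINT kMaxSprites = 4096;  // arbitrary cap

// Structured buffer of per-sprite data, read in the vertex shader as a
// StructuredBuffer<SpriteInstance> indexed by SV_InstanceID.
HRESULT CreateInstanceBuffer(ID3D11Device* device,
                             ComPtr<ID3D11Buffer>& buffer,
                             ComPtr<ID3D11ShaderResourceView>& srv)
{
    CD3D11_BUFFER_DESC desc(sizeof(SpriteInstance) * kMaxSprites,
                            D3D11_BIND_SHADER_RESOURCE,
                            D3D11_USAGE_DYNAMIC,          // rewritten every frame
                            D3D11_CPU_ACCESS_WRITE,
                            D3D11_RESOURCE_MISC_BUFFER_STRUCTURED,
                            sizeof(SpriteInstance));
    HRESULT hr = device->CreateBuffer(&desc, nullptr, buffer.ReleaseAndGetAddressOf());
    if (FAILED(hr)) return hr;

    // Structured buffers use DXGI_FORMAT_UNKNOWN; view covers all kMaxSprites elements.
    CD3D11_SHADER_RESOURCE_VIEW_DESC srvDesc(D3D11_SRV_DIMENSION_BUFFER,
                                             DXGI_FORMAT_UNKNOWN, 0, kMaxSprites);
    return device->CreateShaderResourceView(buffer.Get(), &srvDesc,
                                            srv.ReleaseAndGetAddressOf());
}
```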

I am currently trying to learn how to do that :)

Indie Game Dev

This topic is closed to new replies.
