Efficient 2D sprite designs

4 comments, last by Michael Anthony Wion 12 years ago
Okay, I know this has been rehashed several times in the past (with unknown accuracy), but I can't help but ask for a more updated answer...

Options:
1.) ID3DXSprite. Supposedly fast, and definitely easy to use (but scary, since I have no idea how it handles everything behind the scenes, and it has a few limitations).
2.) Textured quads using separate triangle strip vertex buffers (this is what I'm currently using, and works well, at least on small tests).
3.) Textured quads using a single triangle list vertex buffer (uses more vertices per quad, but can hold several separate quads without "connecting" them).
4.) ???

... Which of these performs best?
My goal is to keep the frame rate >= 60 FPS whilst rendering up to 1,000 (possibly more) sprites per frame, so I want whichever option is fastest rather than settling for a slower one. Also, if you have any benchmark results for each of these designs, I'd very much like to see them.
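For reference, option 3 could be filled along the lines below. This is only a sketch of the CPU side: `SpriteVertex`, `SpriteRect` and `buildTriangleList` are made-up names, the vertex layout is assumed, and no device calls are shown.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical vertex layout: position plus one texture coordinate.
struct SpriteVertex {
    float x, y, z;
    float u, v;
};

// Hypothetical sprite description: screen rectangle plus UV rectangle.
struct SpriteRect {
    float x, y, w, h;      // top-left corner and size
    float u0, v0, u1, v1;  // texture coordinates of the frame
};

// Appends two triangles (six vertices) per sprite, so separate quads
// are never "connected" the way consecutive strip vertices would be.
std::vector<SpriteVertex> buildTriangleList(const std::vector<SpriteRect>& sprites) {
    std::vector<SpriteVertex> out;
    out.reserve(sprites.size() * 6);
    for (const SpriteRect& s : sprites) {
        const SpriteVertex tl{s.x,       s.y,       0.0f, s.u0, s.v0};
        const SpriteVertex tr{s.x + s.w, s.y,       0.0f, s.u1, s.v0};
        const SpriteVertex bl{s.x,       s.y + s.h, 0.0f, s.u0, s.v1};
        const SpriteVertex br{s.x + s.w, s.y + s.h, 0.0f, s.u1, s.v1};
        // Both triangles use the same winding (clockwise when y points down).
        out.insert(out.end(), {tl, tr, bl, bl, tr, br});
    }
    return out;
}
```

The whole batch can then go out in one draw call with a primitive count of `2 * sprites.size()`, at the cost of six vertices per quad instead of four.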
If the hardware supports it, instancing might be better, but I doubt it matters for such low geometry counts. ID3DXSprite is already made for you, but I've heard it's limited and sub-optimal.

I don't think there's any big benefit to using triangle strips over triangle lists, given the likely use cases here. I'd imagine that any shared vertices will be sitting in the cache regardless of tri-strip or tri-list, and since lists are more general they should be less fuss to implement and use.


Do what's simple and effective now, and re-factor if and when it becomes a real (i.e. measurable) problem.


I haven't tampered with instancing yet, but another member once suggested the same.
I understand that it allows a small number of changes to each copy of the original geometry?
If that's the case, does it allow for a change in position, as well as texture coordinates?
I ask because each sprite needs its own location and its own animation frame within a given texture.
Okay so I've figured out the concepts of instancing and have it set up nicely to allow for changes to position and texture coordinates per instance.
This is nice because the geometry of a quad never changes, but the location of the quad and the animation frame of the texture does.
After a quick stress test, this design seems to beat the rest hands down (I can render over 10,000 sprites at 100-300 fps, windowed!!)
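A rough sketch of what the per-instance data described above might look like on the CPU side. The names `SpriteInstance`, `SpriteSheet` and `makeInstance` are hypothetical, and the sprite sheet is assumed to be a regular grid of equally sized frames; the actual buffer setup and shader are not shown.

```cpp
// Hypothetical per-instance record: one of these per sprite goes in the
// instance stream, while the quad's four vertices are shared by all sprites.
struct SpriteInstance {
    float x, y;  // sprite position
    float u, v;  // top-left texture coordinate of the current frame
};

// Hypothetical sprite sheet laid out as a regular grid of equal frames.
struct SpriteSheet {
    int columns;
    int rows;
};

// Converts a frame index into the UV offset of that frame's top-left corner.
// A vertex shader could add this offset to the shared quad's scaled UVs.
SpriteInstance makeInstance(const SpriteSheet& sheet, int frame, float x, float y) {
    const int col = frame % sheet.columns;
    const int row = (frame / sheet.columns) % sheet.rows;
    return SpriteInstance{
        x, y,
        static_cast<float>(col) / static_cast<float>(sheet.columns),
        static_cast<float>(row) / static_cast<float>(sheet.rows)};
}
```

With a 4x2 sheet, frame 5 lands in column 1, row 1, so its offset is (0.25, 0.5). Only these four floats change per sprite per frame; the quad geometry stays untouched.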
Now, can anybody tell me what the best way to manage the instance buffer would be, in the event of adding/removing elements at runtime?

Example: Sprite #3 (out of 6 total) dies, and no longer needs rendering. So should we:
a.) Move elements #4, #5 and #6 to become #3, #4 and #5, and reallocate the buffer size to 5? (Sounds expensive if the buffer size is huge)
b.) Create a flag, telling the shader whether or not to render the specified instance. Newly created sprites will overwrite any slot with this flag set.
c.) ???

I'd really appreciate people's thoughts on this!
I'm not 100% aware of the details, but I assume that there's a way to say how many instances are to be drawn during each frame -- in that case, you could just copy element #6 into the space formerly occupied by element #3, and then only render the first 5 elements. Basically, just front-load the buffer with the "active" elements, and there may be unused elements at the end. This is similar in spirit to what std::remove_if does -- it moves the kept elements to the front of the range; it doesn't destroy the elements or resize the underlying storage. (Unlike std::remove_if, copying the last element into the hole does not preserve the order of the remaining elements.)
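Something like the following is what I have in mind, as a sketch only. `InstancePool` and its members are invented names, and it models just the CPU-side copy of the instance data, not the GPU buffer itself.

```cpp
#include <cstddef>
#include <vector>

struct SpriteInstance { float x, y, u, v; };

// Hypothetical CPU-side mirror of the instance buffer. Capacity is fixed;
// only the first activeCount entries are meant to be drawn.
struct InstancePool {
    std::vector<SpriteInstance> slots;
    std::size_t activeCount = 0;

    explicit InstancePool(std::size_t capacity) : slots(capacity) {}

    // Adds a sprite in the first unused slot; returns false when full.
    bool add(const SpriteInstance& s) {
        if (activeCount == slots.size()) return false;
        slots[activeCount++] = s;
        return true;
    }

    // Removes slot i by copying the last active element over it.
    // The order of the remaining elements is not preserved.
    void remove(std::size_t i) {
        if (i >= activeCount) return;
        slots[i] = slots[activeCount - 1];
        --activeCount;
    }
};
```

Removal is a single copy regardless of buffer size, and the storage is never reallocated. After removing index 2 from six sprites, the old sixth sprite sits in slot 2 and `activeCount` is 5.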

If you were relying on the order of elements for draw ordering (not even sure the API makes any guarantees here) you might have to add some additional instance values/shader logic (to handle a depth value), but you might well have that already.

Then again, it may not even be that big of a deal to create this buffer anew once per frame -- or, maybe you allocate it once on the GPU and re-fill it every frame from system memory.


I just keep a private variable in the class to store the number of instances.
What about a combination of options A and B, where each instance holds a "state" variable indicating its current status?
Then during each update call, I would iterate through the buffer searching for dead elements, replacing them with topmost elements?
As for the draw ordering, I think I could just set each Z value to be slotNumber / numberOfSlots and call it a day.
Could this end up being overkill for large buffers?
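To make the idea concrete, here is a minimal sketch of that sweep. `TrackedSprite`, `compact` and `assignDepthBySlot` are hypothetical names, and the `alive` flag stands in for the "state" variable.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU-side record with a state flag next to the instance data.
struct TrackedSprite {
    float x, y, u, v;
    float z;     // depth handed to the shader
    bool alive;  // cleared when the sprite dies
};

// Replaces each dead element in [0, activeCount) with the topmost active one
// and returns the new active count. Assumes activeCount <= sprites.size().
std::size_t compact(std::vector<TrackedSprite>& sprites, std::size_t activeCount) {
    std::size_t i = 0;
    while (i < activeCount) {
        if (sprites[i].alive) {
            ++i;
        } else {
            // Pull the last active element into the hole; it is re-checked on
            // the next iteration in case it is dead as well.
            sprites[i] = sprites[--activeCount];
        }
    }
    return activeCount;
}

// Assigns z = slot / activeCount, so depth follows slot order after compaction.
void assignDepthBySlot(std::vector<TrackedSprite>& sprites, std::size_t activeCount) {
    for (std::size_t i = 0; i < activeCount; ++i) {
        sprites[i].z = static_cast<float>(i) / static_cast<float>(activeCount);
    }
}
```

Each loop iteration either advances `i` or shrinks `activeCount`, so the sweep is linear in the number of active sprites. One thing to watch: a sprite moved into a lower slot gets a different slot-derived Z, so its draw order changes when something else dies. Storing a depth per sprite instead of deriving it from the slot would avoid that.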

BTW, I'm not creating the buffer every frame. I create it once during the initialization phase, update it only when necessary, and memcpy it every frame.
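Roughly, the per-frame copy looks like the sketch below. `uploadActive` is a made-up helper, and `dst` stands in for whatever pointer the API hands back when the instance buffer is locked; the locking and unlocking calls themselves are left out.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

struct SpriteInstance { float x, y, u, v; };

// Copies only the active prefix of the CPU-side array into dst.
// Returns the number of bytes written, or 0 if they would not fit.
std::size_t uploadActive(void* dst, std::size_t dstBytes,
                         const std::vector<SpriteInstance>& slots,
                         std::size_t activeCount) {
    if (activeCount > slots.size()) return 0;
    const std::size_t bytes = activeCount * sizeof(SpriteInstance);
    if (bytes > dstBytes) return 0;
    if (bytes != 0) std::memcpy(dst, slots.data(), bytes);
    return bytes;
}
```

Copying only the first `activeCount` elements keeps the per-frame cost proportional to the live sprites rather than to the buffer's capacity.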

