Ways to render a massive amount of sprites.

11 comments, last by Icebone1000 9 years, 2 months ago

Greetings,

I'm thinking about rendering 2D quads. I'll try to list the methods I know, with pros and cons, but I hope more experienced people will correct me if I'm wrong and add other methods. Important note: I need to calculate the data on the CPU, so I don't think GPGPU will help me.

1. Every quad is a separate object with its own vertex buffer. Rendering uses DrawIndexed(6, 0, 0). For position/size/rotation changes we need to recreate the buffer from scratch. Worst method in my opinion.

2. Every quad is a separate object with its own dynamic vertex buffer. Rendering also uses DrawIndexed(6, 0, 0). For position/size/rotation changes we need to update the buffer. If there are a lot of sprites, updating that many dynamic buffers will kill performance (will it? I never tried).

3. Use one big vertex buffer for n quads. Since recreating such a big buffer from scratch every frame is not a good idea, let's use a dynamic vertex buffer. The huge win is that we call DrawIndexed() only once. But again, updating such a big buffer will be slow, right?

4. Use one big vertex buffer for n quads. Render all quads in one draw call. For position/size/rotation we need to provide n matrices via constant buffers. The number of quads per draw call is limited by the number of matrices we can pass. I don't know how efficient this is; as far as I know, updating constant buffers is not cheap. We could also pass just a position point and expand it to a quad in a geometry shader.

5. Use instancing. I have no idea about this one; never tried it.

Can you correct me or add something different?


Ok, use one or a handful of buffers (depending on how dynamic the number of sprites is), then just fill the buffer each frame and use one or a handful of draw calls to render them. When filling, you can:

1. Recalculate the sprite positions on the CPU each frame and save only the final positions in the buffer (simple and quite fast).

2. Store the sprite transform as vertex attributes (e.g. a vec4 for position plus a rotation angle; use a 4-float quaternion for rotation instead if you need more degrees of freedom)

2a. You need to clone the position/rotation onto each vertex

2b. You can utilize a geometry shader if you need to save space/bandwidth

If you use a double/triple buffering approach, you can even fill the buffer concurrently.

#1/2 are terrible -- draw calls have a high CPU cost, so you want to draw thousands of particles with each draw call. You'd probably only be able to manage a few thousand particles with this method before you become completely CPU bottlenecked (with the GPU sitting around mostly idle).

But again, updating such a big buffer will be slow, right?

Updating 1 buffer of size 1000 will be a lot faster than updating 1000 buffers of size 1!

Modern PCs have a memory bandwidth of somewhere in the range of ~20GB/s, or 341MB/frame at 60fps.

100k vertex positions is ~1MB -- or 0.3% of your memory bandwidth budget, so this should not be a problem.

For position/size/rotation we need to provide n matrices via constant buffers.

Constant buffers are optimal for situations where every vertex/pixel needs to use every bit of information in the buffer. In your case, each vertex only needs one of the constants (one position), so this is non-ideal. Just use a regular vertex buffer instead.

#5 - instancing lets you specify each position only once, instead of 4 times per quad, and then re-use the same 4 corner-offsets for each quad in order to generate the correct positions.

I think you are over-thinking it!

What exactly are you trying to do? Is it a 2D sprite-based game? There is absolutely no way you will have performance problems. The more old school it is, the less you will have. Just don't render the whole level; do some frustum culling. Have a good map/level representation, sit down and code, and you should see more than 200 FPS and dynamic 2D lighting.

But for a more scalable solution, you should divide the level into chunks, like 32x32 chunks of sprites. When you scroll, create new chunks as needed and just cull on a chunk basis.

On the other hand, if you are incorporating sprites in a full 3D game, like for particles, the situation changes.

Thank you guys.

I'm working on a GUI, and I want it to consume as few resources as possible.

About constant buffers - I thought about storing the width and height of the quad (maybe UVs also) and a 3x3 matrix. There would be a lot of constant buffers, or one with a huge array (I don't know if that's possible). The vertex buffer would contain only the index of the entry to use. Then I just create the quad in a geometry shader and apply the corresponding matrix. That way I don't need to update the vertex buffer at all. But I can't find information about the cost of constant buffer updates.

Btw,

Modern PCs have a memory bandwidth of somewhere in the range of ~20GB/s, or 341MB/frame at 60fps.

100k vertex positions is ~1MB -- or 0.3% of your memory bandwidth budget, so this should not be a problem.

I'm more than satisfied with this. Thanks a lot!

It's simple: try to minimize draw calls, constant buffer updates, and vertex/index buffer submissions. I feel like most 2D games are CPU-bound these days, since almost every frame the GPU has to wait for the CPU to order some sprites for it to draw.

I also feel like for a basic 2D game there only needs to be one constant buffer with one item inside it - the matrix transformation used to render the quads (the quad geometry itself lives in the vertex buffer). The pixel shader needs the Texture2D shader resource used for texturing the quads.

Edit: Also try to minimize how often you change the pixel shader's Texture2D shader resource - i.e. create a texture atlas and/or try to draw quads that use the same texture together.


#5 - instancing lets you specify each position only once, instead of 4 times per quad, and then re-use the same 4 corner-offsets for each quad in order to generate the correct positions.

I'm curious, what's the difference between using instancing like this vs using a vertex buffer with one point per quad and expanding that in a geometry shader? Is there a benefit to one approach over the other?

Eric Richards

SlimDX tutorials - http://www.richardssoftware.net/

Twitter - @EricRichards22

#5 - instancing lets you specify each position only once, instead of 4 times per quad, and then re-use the same 4 corner-offsets for each quad in order to generate the correct positions.


I'm curious, what's the difference between using instancing like this vs using a vertex buffer with one point per quad and expanding that in a geometry shader? Is there a benefit to one approach over the other?
I honestly don't know, and would love to see some benchmarks ;)

The pros/cons are minor -
Instancing lets you avoid writing another shader kernel, and the GS approach lets you use a regular VS designed for non-instanced rendering.
Instancing works on SM3 hardware, while a GS requires SM4.
In my game I have to draw many quads at the same time in different positions (these are particles which can collide with environmental objects). I use instancing to send the points to the GPU, then a geometry shader to expand the points into textured quads.

This works OK, however I have been told there are more efficient ways to do it... YMMV.

4096 sprites rendered using instancing. I found out that H.264 really hates this, as it seems to be about as compressible as white noise.

This one came out better.

I was using instancing with programmable vertex pulling. The limiting factor is definitely fill rate.

This topic is closed to new replies.
