Volgogradetzzz

Ways to render a massive amount of sprites.


Greetings,

 

I'm thinking about how to render 2D quads. I'll list the methods I know, with pros and cons, but I hope more experienced people will correct me if I'm wrong and add other methods. One important note: I need to calculate the data on the CPU, so I don't think GPGPU will help me.

 

1. Every quad is a separate object with its own vertex buffer. Rendering uses DrawIndexed(6, 0, 0). Any position/size/rotation change requires recreating the buffer from scratch. The worst method, in my opinion.

 

2. Every quad is a separate object with its own dynamic vertex buffer. Rendering also uses DrawIndexed(6, 0, 0). For position/size/rotation changes we need to update the buffer. If there are a lot of sprites, updating that many dynamic buffers will kill performance (will it? I've never tried).

 

3. Use one big vertex buffer for n quads. Since recreating such a big buffer from scratch every frame is not a good idea, let's make it a dynamic vertex buffer. The huge win is that we call DrawIndexed() only once. But again, updating such a big buffer will be slow, right?

 

4. Use one big vertex buffer for n quads and render them all in one draw call. For position/size/rotation we need to provide n matrices via constant buffers, so the number of quads per draw call is limited by the number of matrices we can pass. I don't know how efficient this is; as far as I know, updating constant buffers is not cheap. Alternatively, we could pass only a position point per quad and expand it into a quad in a geometry shader.

 

5. Use instancing. I have no idea about this; I've never tried it.
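To make option 3 concrete, here is a minimal CPU-side sketch (the struct names and layout are my own assumptions, not from any particular engine). Each frame, every sprite's transformed corners are written into one big CPU array, which would then be copied into the dynamic vertex buffer in a single Map/Unmap with D3D11_MAP_WRITE_DISCARD:

```cpp
#include <cmath>
#include <vector>

struct Vertex { float x, y, u, v; };

struct Sprite {
    float cx, cy;  // center position
    float w, h;    // size
    float angle;   // rotation in radians
};

// Append the 6 vertices (two triangles) of one sprite to the CPU-side array.
// This array is what you would memcpy into the mapped dynamic vertex buffer.
void appendQuad(std::vector<Vertex>& out, const Sprite& s) {
    const float c = std::cos(s.angle), sn = std::sin(s.angle);
    const float hw = s.w * 0.5f, hh = s.h * 0.5f;
    // Unit-quad corners, scaled, rotated, then translated.
    const float cornersX[4] = { -hw,  hw,  hw, -hw };
    const float cornersY[4] = { -hh, -hh,  hh,  hh };
    const float us[4] = { 0, 1, 1, 0 };
    const float vs[4] = { 0, 0, 1, 1 };
    Vertex v[4];
    for (int i = 0; i < 4; ++i) {
        v[i].x = s.cx + cornersX[i] * c - cornersY[i] * sn;
        v[i].y = s.cy + cornersX[i] * sn + cornersY[i] * c;
        v[i].u = us[i];
        v[i].v = vs[i];
    }
    // Two triangles: 0-1-2 and 0-2-3.
    const int idx[6] = { 0, 1, 2, 0, 2, 3 };
    for (int i : idx) out.push_back(v[i]);
}
```

With an index buffer you would store only the 4 unique vertices per quad and reuse a repeating 0,1,2,0,2,3 index pattern, then draw everything with one DrawIndexed(6 * n, 0, 0).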

 

Can you correct me or add something different?


OK: use one or a handful of buffers (depending on how dynamic the number of sprites is), then just fill the buffer(s) each frame and use one or a handful of draw calls to render them. When filling, you can:

1. Recalculate the sprite positions on the CPU each frame and write only the final positions into the buffer (simple and quite fast).

2. Save the sprite position as vertex attributes (use a vec4 for position and rotation; use a vec4 quaternion for the rotation if you need more degrees of freedom).

2a. You need to clone the position/rotation onto each vertex.

2b. You can use a geometry shader if you need to save space/bandwidth.

 

If you use a double/triple-buffering approach, you can even fill the buffers concurrently.
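The double/triple-buffering idea can be sketched like this (a generic ring of my own invention; in D3D11 the driver's buffer renaming with MAP_WRITE_DISCARD gives you a similar effect for free, but an explicit ring is handy when worker threads fill the buffers):

```cpp
#include <array>
#include <cstdint>

// Rotate through N CPU-visible buffers so the CPU can fill frame N+1
// while the GPU is still reading frame N.
template <typename Buffer, std::size_t N>
struct BufferRing {
    std::array<Buffer, N> buffers{};
    std::uint64_t frame = 0;

    Buffer& current() { return buffers[frame % N]; }  // buffer for this frame
    void advance() { ++frame; }                       // call once per frame
};
```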

Edited by Ashaman73


I think you are over-thinking it!

 

What exactly are you trying to do? Is it a 2D sprite-based game? Then there is absolutely no way you will have performance problems; the more old-school it is, the fewer you will have. Just don't render the whole level - do some frustum culling. Have a good map/level representation, sit down and code, and you should see more than 200 FPS with dynamic 2D lighting.

 

But for a more scalable solution, you should divide the level into chunks, e.g. 32x32 sprites per chunk. When you scroll, create new chunks as needed and cull on a per-chunk basis.
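For illustration, the chunk culling could be as simple as this (assuming an axis-aligned view rectangle in tile coordinates; the names are my own):

```cpp
#include <algorithm>

// Each chunk covers kChunkSize x kChunkSize tiles.
constexpr int kChunkSize = 32;

struct ChunkRange { int x0, y0, x1, y1; };  // inclusive chunk indices

// Compute which chunks overlap the view rectangle; only those get drawn.
ChunkRange visibleChunks(int viewX, int viewY, int viewW, int viewH,
                         int worldChunksX, int worldChunksY) {
    ChunkRange r;
    r.x0 = std::max(0, viewX / kChunkSize);
    r.y0 = std::max(0, viewY / kChunkSize);
    r.x1 = std::min(worldChunksX - 1, (viewX + viewW - 1) / kChunkSize);
    r.y1 = std::min(worldChunksY - 1, (viewY + viewH - 1) / kChunkSize);
    return r;
}
```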

 

On the other hand, if you are incorporating sprites in a full 3D game, like for particles, the situation changes.


Thank you guys.

 

I'm working on a GUI, and I want it to consume as little as possible.

 

About constant buffers - I thought about storing the width and height of the quad (maybe the UVs too) and a 3x3 matrix. There would either be a lot of constant buffers, or one with a huge array (I don't know if that's possible). The vertex buffer would hold only the index of the buffer to use, so I could just create the quad in the geometry shader and apply the corresponding matrix. That way I wouldn't need to update the vertex buffer at all. But I can't find information about the cost of constant buffer updates.

 

Btw,

Modern PCs have a memory bandwidth somewhere in the range of ~20 GB/s, or 341 MB/frame at 60 fps.

100k vertex positions is ~1MB -- or 0.3% of your memory bandwidth budget, so this should not be a problem.

 

 

I'm more than satisfied with this. Thanks a lot!

Edited by nikitablack


It's simple: try to minimize draw calls, constant buffer updates, and vertex/index buffer submissions. I feel like most 2D games are CPU-bound these days, though, since almost every frame the GPU has to wait for the CPU to order the sprites it will draw.

 

I also feel that for a basic 2D game there only needs to be one constant buffer with one item inside it - the matrix transformation used to render the quads (applied in the vertex shader). The pixel shader needs the Texture2D shader resource used for texturing the quads.

 

Edit: Also try to minimize how often you change the pixel shader's Texture2D shader resource - i.e. create a texture atlas and/or try to draw quads that use the same texture together.

Edited by Xanather



#5 - instancing lets you specify each position only once, instead of 4 times per quad, and then re-use the same 4 corner-offsets for each quad in order to generate the correct positions.
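For illustration, this is roughly what the hardware computes per (vertex, instance) pair when drawing quads with instancing (a CPU-side simulation; the names are made up):

```cpp
struct Float2 { float x, y; };

// Per-vertex stream: the 4 corner offsets shared by every instance.
static const Float2 kCorners[4] = {
    {-0.5f, -0.5f}, {0.5f, -0.5f}, {0.5f, 0.5f}, {-0.5f, 0.5f}
};

// What DrawIndexedInstanced effectively feeds the vertex shader: one shared
// corner offset combined with one per-instance position and size.
Float2 expand(const Float2& instancePos, float size, int cornerIndex) {
    return { instancePos.x + kCorners[cornerIndex].x * size,
             instancePos.y + kCorners[cornerIndex].y * size };
}
```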

 

I'm curious, what's the difference between using instancing like this vs using a vertex buffer with one point per quad and expanding that in a geometry shader? Is there a benefit to one approach over the other?


#5 - instancing lets you specify each position only once, instead of 4 times per quad, and then re-use the same 4 corner-offsets for each quad in order to generate the correct positions.

 
I'm curious, what's the difference between using instancing like this vs using a vertex buffer with one point per quad and expanding that in a geometry shader? Is there a benefit to one approach over the other?
I honestly don't know, and would love to see some benchmarks ;)

The pros/cons are minor -
Instancing lets you avoid writing another shader kernel, and the GS approach lets you use a regular VS designed for non-instanced rendering.
Instancing works on SM3 hardware, while GS requires SM4.

In my game I have to draw many quads at the same time in different positions (these are particles which can collide with environmental objects). I use instancing to send the points to the GPU, then a geometry shader to expand the points into textured quads.

This works fine; however, I have been told there are more efficient ways to do it... YMMV.


4096 sprites rendered using instancing. I found out that H.264 really hates this, as it seems to be about as compressible as white noise.

 

https://www.youtube.com/watch?v=2L5EDYT5jo4

 

This one came out better.

 

https://www.youtube.com/watch?v=V3EGwsafxrk

 

I was using instancing with programmable vertex pulling. The limiting factor is definitely fill rate.


Another potential method is to store the sprite info in a buffer resource and then use an SRV in the vertex shader to grab the data out of it.  The interesting thing is that you can create the vertices completely in the vertex shader, with no vertex buffers required at all!  The general idea is to use your draw call to specify how many vertices are generated, and then use the SV_VertexID semantic value to grab the appropriate data from the SRV.  You would use four vertex shader invocations for each quad, and the method is relatively efficient.  If you check out the particle storm demo from Hieroglyph 3, there is an example of creating vertices without vertex buffers (although I expand the vertices in the GS as mentioned above).
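The SV_VertexID bookkeeping behind that technique is just integer division; assuming four vertex shader invocations per quad as described, each vertex ID decodes into a (quad index, corner index) pair used to fetch from the SRV. A CPU-side sketch of the mapping (my own helper, not from Hieroglyph 3):

```cpp
struct QuadVertexId { int quad; int corner; };

// With a draw call of 4 * quadCount vertices and no vertex buffers bound,
// the vertex shader reconstructs which quad and which corner it is handling
// purely from SV_VertexID, then reads that quad's data from the SRV.
QuadVertexId decode(int vertexId) {
    return { vertexId / 4, vertexId % 4 };
}
```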

 

Regarding instancing, you are probably not going to see an improvement when using only 4 vertices as your instanced geometry.  In general, unless your instanced geometry has 100 or more vertices, you won't see much improvement (at least that is the general advice; your particular situation may or may not reflect the rule of thumb...).


3. Use one big vertex buffer for n quads. Since recreating such a big buffer from scratch every frame is not a good idea, let's make it a dynamic vertex buffer. The huge win is that we call DrawIndexed() only once. But again, updating such a big buffer will be slow, right?

 

 

I'm working on a GUI, and I want it to consume as little as possible.

 

I have used #3 for a GUI and did not find it slow.

 

Since you are working on a GUI you might want to consider what you specifically need for that use case.  For example, specifying transforms for each quad sounds a bit like overkill to me.  You can certainly do it, but do you need to for a GUI?

 

I think in many cases the quads of a GUI are relatively static.  You might want some animation for hovering the cursor or sliding elements into view or whatever, but most things are just going to sit where you stick them.  You could transform the verts of the quad on the CPU and then keep them around until there is a change.  

 

Consider something like a text prompt/label, you are probably going to plop it on the screen somewhere, compute the quads for the letters, and from that point on it probably isn't going to move or rotate or scale.  If you were to create the array of verts and hold onto it then all you have to do is copy the verts into the dynamic buffer every frame.
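That caching scheme might look like this (hypothetical names; the rebuild body depends on your GUI):

```cpp
#include <vector>

struct Vertex { float x, y, u, v; };

// A GUI element keeps its pre-transformed vertices around; they are only
// rebuilt when the element actually changes, but copied into the shared
// dynamic buffer every frame.
struct GuiElement {
    std::vector<Vertex> cachedVerts;
    bool dirty = true;

    void rebuildIfNeeded() {
        if (!dirty) return;
        // ... recompute cachedVerts from position/size/text here ...
        dirty = false;
    }

    // Append into the frame's big vertex array, which is later copied into
    // the mapped dynamic vertex buffer in one go.
    void appendTo(std::vector<Vertex>& frameVerts) const {
        frameVerts.insert(frameVerts.end(), cachedVerts.begin(), cachedVerts.end());
    }
};
```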

 

Assuming your GUI doesn't have an enormous amount of constantly changing elements, computing the new quads when the data changes is probably not going to be a big deal -- FPS counter, score, stats...  Even then, those things are probably not changing values every frame.

 

You might want to take a look at the DirectX Tool Kit, which has a sprite batcher similar to what I'm describing.


So... am I the only one who has a single vertex buffer, with a normalized size (-0.5 to 0.5), that never leaves its input slot?

I just scale it in the vertex shader; same thing with the UVs.

Since size and UVs are commonly dynamic anyway, why bother with vertices?

 

Is that bad? It works fine for me. Using constant buffers for this kind of data makes more sense to me, and the code is also clearer.

//xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
//Sprite HLSL V9
//xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

//================================================================
//Shader Constant Buffers
//================================================================

cbuffer drawable : register(b0){
	float2 res	: packoffset(c0.x);
	float2 padding	: packoffset(c0.z);
	float4 uvRect	: packoffset(c1);
	matrix mWorld	: packoffset(c2);
	float4 color	: packoffset(c6);
}
cbuffer camera : register(b1){
	matrix mViewProjection;
}
//================================================================


//=================================================================
//Shader Types
//=================================================================
struct vs_out{
	float4 pos	: SV_POSITION;
	float4 color	: COLOR;
	float2 uv	: TEXCOORD;
};
//=================================================================


//==================================================================
//VS
//==================================================================

vs_out vs_Sprite( float3 pos_p : POSITION, float2 uv_p : TEXCOORD ){

	vs_out output = (vs_out)0;

	//vertex scaling
	pos_p.xy *= res;
	pos_p.xy += padding.xy; // offset

	output.pos = mul( float4( pos_p, 1.0f ), mWorld );

	//camera(view) trafo:
	output.pos = mul( output.pos, mViewProjection ); 

	//uv offset:
	output.uv = uvRect.xy + (uv_p * uvRect.zw); //uv_p + uvOffset;

	// just pass color on:
	output.color = color;

	return output;	
}
//===================================================================

//=====================================================================
//Shader Resources
//=====================================================================
Texture2D tex2D : register(t0);
SamplerState samplerMode : register(s0);
//=====================================================================

//===================================================================
//PS
//===================================================================

float4 ps_Sprite( vs_out input_p ) : SV_TARGET{

	return tex2D.Sample( samplerMode, input_p.uv ) * input_p.color;
}
//==================================================================


And for Instancing:

...
//=================================================================
//Shader Types
//=================================================================
struct vs_out{
	...
};
struct vsInstance_In{
	float2 res	: INST_RES;
	float2 padding	: INST_PADD;
	float4 uvRect	: INST_TEXTCOORD;
	matrix mWorld	: INST_WORLD;
	float4 color	: INST_COLOR;
};
//=================================================================

//==================================================================
//VS
//==================================================================

vs_out vs_instancedSprite( float3 pos_p : POSITION, float2 uv_p : TEXCOORD, vsInstance_In instance_p ){

	vs_out output = (vs_out)0;
	
	//vertex scaling
	pos_p.xy *= instance_p.res;
	pos_p.xy += instance_p.padding.xy;
	
	//instance trafo
	output.pos = mul( float4( pos_p, 1.0f ), instance_p.mWorld );
	
	//camera(view) trafo:
	output.pos = mul( output.pos, mViewProjection );
	
	//uv offset:
	output.uv = instance_p.uvRect.xy + (uv_p * instance_p.uvRect.zw);
	
	// just pass stuff on:
	output.color = instance_p.color;
	
	return output;
}
//===================================================================
...
Edited by Icebone1000

