Particles: Batching VS instancing

13 comments, last by noodleBowl 9 years, 4 months ago

I was wondering what you guys think the best course of action would be.

I'm looking to make a particle system. Currently I have a sprite batcher that can handle rotation, scaling, etc. It works by mapping a VBO and filling it with the vertex data for the sprites I want to draw. Now I'm wondering: should I just build my particle system on top of this, or should I create a whole new system that uses instancing? Would an instanced system even be better? Would it even help? I ask because I really don't know much about instancing.

Does any of this change if I want to be able to do effects such as rain, snow, fire, or smoke?

What if I want the particles to be textured, e.g. each particle is a different snowflake image?

I'm targeting OpenGL 3.3+, just FYI.

Instancing and batching work together in most cases.

With instancing, you'd still need to fill a buffer with any per-instance particle data. Instancing just removes the need for you to compute all the vertices of each particle on the CPU.
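
To make the difference concrete, here is a minimal CPU-side sketch of what each approach has to fill every frame. The `Particle`, `BatchVertex`, and `InstanceData` structs and both function names are hypothetical, invented for illustration; they are not taken from the original poster's batcher.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-particle state; field names are illustrative only.
struct Particle { float x, y, size; };

// Batched path: the CPU writes all four corners of every quad.
struct BatchVertex { float x, y, u, v; };

std::vector<BatchVertex> buildBatchVertices(const std::vector<Particle>& ps) {
    std::vector<BatchVertex> out;
    out.reserve(ps.size() * 4);
    for (const Particle& p : ps) {
        out.push_back({p.x,          p.y,          0.0f, 0.0f});
        out.push_back({p.x,          p.y + p.size, 0.0f, 1.0f});
        out.push_back({p.x + p.size, p.y + p.size, 1.0f, 1.0f});
        out.push_back({p.x + p.size, p.y,          1.0f, 0.0f});
    }
    return out;
}

// Instanced path: one record per particle. The corner offsets would be
// added on the GPU, so the CPU never touches individual vertices.
struct InstanceData { float x, y, size; };

std::vector<InstanceData> buildInstanceData(const std::vector<Particle>& ps) {
    std::vector<InstanceData> out;
    out.reserve(ps.size());
    for (const Particle& p : ps) {
        out.push_back({p.x, p.y, p.size});
    }
    return out;
}
```

For N particles the batched path writes 4N vertices, while the instanced path writes N records. Either result still has to be uploaded to a buffer each frame, which is the sense in which the two techniques work together.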

You can combine instancing with a geometry shader in order to further cut down on processing. In such a setup, you'd still just generate and send the instance data to the GPU, but then you'd run the vertex shader once per particle on that instance data and use the geometry shader to generate the rest of the vertices for each particle.

Either with or without instancing, you can also use transform feedback or a compute shader to do all of the per-instance updates completely on the GPU.

Sean Middleditch – Game Systems Engineer – Join my team!

See Vertex Shader Tricks by Bill Bilodeau.

The techniques apply to both D3D10/11 and OpenGL 3+.

So if I keep them separate (my batcher and the particle system), then I'm looking at the following (I hope this makes sense):

1 static buffer for my vertex coords [5 floats (x, y, z, uText, vText) * 4 (since it is a quad)] that's filled at init

1 static buffer for my indices [6 indices]

1 dynamic buffer for my particle positions [2 floats (x, y) * the max number of particles] that is filled/changes every frame

Then I can use the glDrawElementsInstanced call to draw the number of particles to the screen

That makes sense, right?

If you read or re-read the slides I posted, you'll see that is the least recommended approach, as their benchmark showed it's the slowest version. You should benchmark on your own to verify their results.

Their recommendation was to set:

No vertex buffer.

1 static buffer for your indices [6 * MaxNumOfParticlesPerDraw uint16; filled with zeroes]

1 dynamic UBO or TBO for your particle position [2 floats(x, y) * max number of particles].

Use gl_VertexID in the vertex shader to construct the quad.

Use glDrawElements. If you use glDrawArrays, you can avoid the index buffer (but that usually comes with the overhead of having to switch between arrays and elements inside the GPU or the driver, since most geometry is indexed).
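
As a rough sketch of the static index buffer, the C++ below generates 6 uint16 indices per particle. Note one assumption: the list above describes the buffer as filled with zeroes, but this sketch uses the common two-triangle pattern instead, because under glDrawElements gl_VertexID takes the value read from the index buffer, and deriving a corner with `% 4` needs distinct values. The function name `buildQuadIndices` and the corner ordering are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Builds indices for maxParticles quads, 4 virtual vertices per quad.
// No vertex buffer is implied: the values only feed gl_VertexID.
std::vector<std::uint16_t> buildQuadIndices(std::size_t maxParticles) {
    // 16-bit indices can address 65536 vertices, i.e. 16384 quads.
    if (maxParticles > 16384) {
        throw std::length_error("too many particles for uint16 indices");
    }
    // Two triangles per quad: corners 0,1,2 and 2,3,0.
    static const std::uint16_t pattern[6] = {0, 1, 2, 2, 3, 0};

    std::vector<std::uint16_t> indices;
    indices.reserve(maxParticles * 6);
    for (std::size_t i = 0; i < maxParticles; ++i) {
        for (std::uint16_t corner : pattern) {
            indices.push_back(static_cast<std::uint16_t>(i * 4 + corner));
        }
    }
    return indices;
}
```

The result would be uploaded once at init. With this layout, `gl_VertexID / 4` identifies the particle and `gl_VertexID % 4` identifies the corner.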

So my question is: should I even bother with a new system?

My current sprite batcher system is already dynamic and uses glDrawElementsBaseVertex to get things done.

It made sense to make a new system if I was going to use instancing, but since you and the slides say not to use DrawInstanced, making a particle system based on instancing does not make any sense.

Instancing is likely more performant than batching (sending 1 vertex instead of 4 per particle). It's also a good exercise; that's how I learned instancing with DX9.

Edit: You can also delegate more to the GPU. In particular, give each particle a scale and rotation and apply them in the vertex shader.

I normally do scale and rotation on the CPU side, which is the case with my batcher. But for particles maybe it makes more sense to be more GPU heavy?

Also how does one even construct a quad in the vertex shader? I'm pretty bad when it goes beyond the point of default or simple shaders.
I feel like what I'm currently thinking might be a "bad" way to do it:
#version 330 core

in vec2 xyPosition; //particle position
out vec2 UV;
uniform mat4 MVP; //orthographic projection matrix passed in

void main()
{
	//For 1x1 sized particle: pick the quad corner from the vertex ID
	vec4 vertexPosition;
	int vertex = gl_VertexID % 4;
	if(vertex == 0)
	{
		vertexPosition.x = 0.0;
		vertexPosition.y = 0.0;
	}
	else if(vertex == 1)
	{
		vertexPosition.x = 0.0;
		vertexPosition.y = 1.0;
	}
	else if(vertex == 2)
	{
		vertexPosition.x = 1.0;
		vertexPosition.y = 1.0;
	}
	else
	{
		vertexPosition.x = 1.0;
		vertexPosition.y = 0.0;
	}
	vertexPosition.z = 0.0;
	vertexPosition.w = 1.0;

	//The unit-quad corner doubles as the texture coordinate
	//(the original assigned an undeclared uvPosition here)
	UV = vertexPosition.xy;

	//Offset the corner by the particle position
	vertexPosition.xy += xyPosition;

	gl_Position = MVP * vertexPosition;
}



Another question: I really don't understand why you suggest a UBO for the XY positions of the particles. Why not just another VBO?
Is it because of a GPU memory placement thing? Is it because I would need a VAO for a VBO, whereas I can get away without one if I use a UBO? Something like that?

I'm asking because I don't know the true difference between a VBO and a UBO, other than that a VBO is traditionally for vertex data and a UBO is for
sending multiple uniforms to a shader in one go.

And I don't consider dynamic XY positions to be a uniform item.


Because if you use a VBO, you would need 4 vertices per particle, but the UBO can hold just one entry (the position) per particle.
Your understanding of VBOs and UBOs is correct. However, this is a hack that works really well. It works well because GPUs are becoming more and more similar to CPUs: VBOs and UBOs are both just memory being fetched, and the API imposes arbitrary restrictions that are no longer necessary.

By setting a null VBO and drawing 4 vertices (1 particle), you're basically telling the API "draw 4 vertices of nothing" (i.e. run the vertex shader 4 times with no vertex data), but the vertex shader can fetch data from a different location (the UBO), using gl_VertexID to determine both the index into the UBO and which vertex position to generate.
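
The fetch described above can be modelled on the CPU. The following C++ is only a sketch of the arithmetic a vertex shader would perform; `Vec2`, `quadVertex`, the corner ordering, and the `size` parameter are hypothetical names chosen for illustration, and the `std::vector` stands in for the UBO contents.

```cpp
#include <cstddef>
#include <vector>

struct Vec2 { float x, y; };

// Models one vertex shader invocation: vertexId plays the role of
// gl_VertexID, particlePositions plays the role of the UBO array.
Vec2 quadVertex(const std::vector<Vec2>& particlePositions,
                int vertexId, float size) {
    // Unit-quad corners, in the same order as the shader posted earlier.
    static const Vec2 corners[4] = {
        {0.0f, 0.0f}, {0.0f, 1.0f}, {1.0f, 1.0f}, {1.0f, 0.0f}
    };
    const int particle = vertexId / 4;  // which UBO entry to read
    const int corner   = vertexId % 4;  // which corner to generate

    const Vec2& p = particlePositions[static_cast<std::size_t>(particle)];
    return { p.x + corners[corner].x * size,
             p.y + corners[corner].y * size };
}
```

Vertex IDs 0 to 3 all read entry 0, IDs 4 to 7 all read entry 1, and so on, so each position is stored once even though the shader runs four times per quad. One caveat if this is ported to GLSL: under the std140 layout each array element is padded to 16 bytes, so an array of vec2 occupies as much space as an array of vec4.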



I hope I can explain this well enough, so please bear with me and let me know if this doesn't make sense.
Now, regardless of whether I use a VBO or a UBO, does that mean I need to "double up" on my XY positions, since the
vertex shader is run 4 times per quad? E.g.:

//Example contents of my buffer for 2 quads

const int FLOATS_PER_QUAD = 8; // 4 vertex * 2 (x, y positions)
int maxParticles = 2;
GLfloat *vertex = new GLfloat[maxParticles * FLOATS_PER_QUAD];

//Quad 1 at position 50x50
vertex[0] = 50.0f; //X pos
vertex[1] = 50.0f; //Y pos
vertex[2] = 50.0f; 
vertex[3] = 50.0f; 
vertex[4] = 50.0f; 
vertex[5] = 50.0f; 
vertex[6] = 50.0f; 
vertex[7] = 50.0f; 

//Quad 2 at position 25x40
vertex[8] = 25.0f; //X pos
vertex[9] = 40.0f; //Y pos
vertex[10] = 25.0f; 
vertex[11] = 40.0f; 
vertex[12] = 25.0f; 
vertex[13] = 40.0f; 
vertex[14] = 25.0f; 
vertex[15] = 40.0f; 


I hope the above is not the case, because it seems silly to have to repeat the same data over and over,
when really this should be enough:

//The contents of my buffer for 2 quads

const int POSITION_COMP_COUNT = 2; // For XY position
int maxParticles = 2;
GLfloat *vertex = new GLfloat[maxParticles * POSITION_COMP_COUNT];

//Quad 1 at position 50x50
vertex[0] = 50.0f; //X pos
vertex[1] = 50.0f; //Y pos

//Quad 2 at position 25x40
vertex[2] = 25.0f; //X pos
vertex[3] = 40.0f; //Y pos


But I am unsure. How do you tell a shader "do not move on to the next position until you have run 4 times"?
Is this solved by using a UBO (I ask because I have never used one)?
I know there is glVertexAttribDivisor for VBOs, which makes an attribute advance once per N instances instead of once per vertex.
But I believe that only works with DrawInstanced calls, which is what we are trying to avoid.
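
For reference, the divisor behaviour can be sketched as a small CPU-side model. This is only an illustration of the indexing rule (ignoring base-instance offsets); the function name `attributeElement` is hypothetical and is not an OpenGL call.

```cpp
#include <cstddef>

// Returns which element of an attribute array a shader invocation reads.
// divisor == 0: the attribute advances per vertex (the default).
// divisor  > 0: the attribute advances once every `divisor` instances.
std::size_t attributeElement(std::size_t vertexIndex,
                             std::size_t instanceIndex,
                             unsigned divisor) {
    return divisor == 0 ? vertexIndex : instanceIndex / divisor;
}
```

In a non-instanced draw the instance index is always 0, so an attribute with a non-zero divisor would read element 0 for every vertex. That is why the divisor only helps with instanced calls, and why the gl_VertexID approach derives the particle index inside the shader instead.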

